  1. Aug 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1:

      Comment 1:

      Summary:

      The authors sought to investigate the associations of age at breast cancer onset with the incidence of myocardial infarction (MI) and heart failure (HF). They performed a secondary analysis of UK Biobank data, using descriptive and inferential analyses, including Cox proportional hazards models, to investigate the associations. Propensity score matching was also used. They found that, among participants with breast cancer, younger onset age was significantly associated with elevated risks of MI (HR=1.36, 95% CI: 1.19 to 1.56, P<0.001) and HF (HR=1.31, 95% CI: 1.18 to 1.46, P<0.001). They reported similar findings after propensity score matching.
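      The hazard ratios quoted above come from Cox proportional hazards models. As a quick consistency check on such summaries, the standard error of log(HR) can be back-calculated from the 95% CI, assuming a symmetric Wald interval on the log-hazard scale (the usual Cox model output). The sketch below does this for the two reported estimates; it only re-derives numbers from the rounded summary values and is not a re-analysis of the UK Biobank data. The helper name `check_hr` is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def check_hr(hr, lower, upper):
    """Back-calculate SE(log HR), z and a two-sided Wald p-value from a
    reported hazard ratio and its 95% CI (assumes a log-scale Wald CI)."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * Z_975)        # log-CI width = 2 * 1.96 * SE
    midpoint = math.exp((log_lo + log_hi) / 2)  # geometric midpoint, should be close to HR
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p-value
    return {"se": se, "midpoint": midpoint, "z": z, "p": p}


reported = {"MI": (1.36, 1.19, 1.56), "HF": (1.31, 1.18, 1.46)}
for outcome, (hr, lo, hi) in reported.items():
    r = check_hr(hr, lo, hi)
    print(f"{outcome}: SE(log HR)={r['se']:.3f}, CI midpoint={r['midpoint']:.2f}, "
          f"z={r['z']:.2f}, p={r['p']:.1e}")
```

      A CI whose geometric midpoint sits far from the reported HR, or a back-calculated p-value that disagrees with the reported one, would usually point to a transcription error or a non-Wald interval.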

      Strengths:

      The use of a large dataset is a strength of the study as the study is well-powered to detect differences. Reporting both the unmatched and the propensity-matched estimates was also important for statistical inference.

      Weaknesses:

      Despite the merits of the paper, readers may be confused as to whether the authors are referring to “age at breast cancer onset” or “age at breast cancer diagnosis”. I suppose the title refers to the latter, in which case it would be best to use “age at breast cancer diagnosis” consistently throughout the manuscript. I would recommend revising the title to make it explicit that the authors are referring to “age at breast cancer diagnosis”.

      Thank you for your kind comments and suggestions. Yes, as you mentioned, in this study we focused on age at breast cancer diagnosis, which was obtained from the cancer registry data in the UK Biobank and was used in all analyses. We agree that it would be clearer to use “age at diagnosis of breast cancer” consistently throughout the manuscript; therefore, we have replaced “age at breast cancer onset” with “age at diagnosis of breast cancer”.

      Change in the manuscript:

      “Age at breast cancer onset” was replaced with “age at diagnosis of breast cancer” in the title and throughout the manuscript.

      Recommendations For The Authors:

      Kindly review the placement of the full stop relative to the references. Placing the full stop after the closing parenthesis makes reading smoother than the current form, in which it is difficult to tell where a new sentence begins.

      Thank you for your suggestion. We have revised the placement of the full stop next to each reference.

      Change in the manuscript:

      The full stop was placed after the closing parenthesis of each reference throughout the manuscript.

      Response to Reviewer #2:

      Comment 1:

      This is a well-presented large analysis from the UK Biobank of nearly 250,000 adult women. The authors examined the associations of breast cancer diagnosis with incident myocardial infarction and heart failure across onset age groups. Based on results from a series of statistical analyses, the authors concluded that younger onset age of breast cancer was associated with myocardial infarction and heart failure, highlighting the need for careful monitoring of cardiovascular status in women diagnosed with breast cancer, especially younger women.

      Comments to consider:

      The authors thoughtfully included and adjusted for menopausal status, breast cancer surgery, and hormone replacement therapy in their sensitivity analysis. It would be informative if the authors presented the numbers and percentages of participants by menopausal status and cancer treatment.

      Thank you for your comments. As suggested, we have provided more detailed information on the numbers and percentages of participants by menopausal status and breast cancer treatment.

      Change in the manuscript:

      Page 11, Lines 208 to 211: added “Among participants with breast cancer, 11 460 (70.6%) participants were postmenopausal, 14 255 (87.6%) participants had undergone breast cancer surgery, and 6 784 (41.8%) participants had received hormone replacement therapy.”

      Change in the supplementary material:

      The number and percentage of menopausal status, breast cancer surgery, and hormone replacement therapy were added to Table S13.

      ᵃ Adjusted for age, ethnicity, education, current smoking, current drinking, obesity, exercise, low-density lipoprotein cholesterol, depressed mood, hypertension, diabetes, antihypertensive drug use, antidiabetic drug use, statin use, menopausal status, breast cancer surgery, and hormone replacement therapy.

      HR, hazard ratio; CI, confidence interval.

      Comment 2:

      The analytical baseline used for follow-up should be pointed out in the methods section. It’s confusing whether the analytic baseline was defined as the study baseline or the time at breast cancer diagnosis.

      We apologize for the confusion. In this study, the analytical baseline used for follow-up was defined as the baseline of UK Biobank (2006-2010) and we have pointed it out in the methods section as suggested.

      Change in the manuscript:

      Page 9, Lines 165 to 166: added: “The analytical baseline used for follow-up was defined as the baseline of UK Biobank (2006-2010).”

      Comment 3:

      Did the older onset age group have a longer follow-up duration? Could the authors provide information on the length of follow-up by age of onset in Supplementary Table S4? It would give the readers more information regarding different age groups.

      Thank you for your question. We compared the follow-up time among the three diagnosis age groups and found that, although the follow-up durations of the three groups were quite similar (Table S4), statistical analysis revealed a significant difference, with the older diagnosis age group having a longer follow-up (P for Kruskal-Wallis test <0.001). This is understandable because, with large sample sizes, even a slight difference can reach statistical significance. As suggested, we have added information on the length of follow-up by age at diagnosis to Supplementary Table S4.
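      The point that a very small difference becomes "significant" once the sample is large can be illustrated with a minimal, dependency-free Kruskal-Wallis sketch. It assumes exactly three groups, so the chi-square reference has 2 degrees of freedom and the p-value has the closed form exp(-H/2). The data are hypothetical, evenly spaced follow-up times (not UK Biobank data) in which one group is shifted by 0.1 years; the helper names are also hypothetical.

```python
import math


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H test for exactly three groups, with tie correction.
    With 2 degrees of freedom the chi-square survival function is exp(-H/2)."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n_total = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    h = h / correction if correction > 0 else 0.0
    return h, math.exp(-h / 2)


def synthetic_followup(n, shift):
    """Hypothetical follow-up times (years) spread evenly from 12 to 14;
    the third group is shifted by `shift` years."""
    base = [12 + 2 * k / n for k in range(n)]
    return [base, list(base), [x + shift for x in base]]


for n in (30, 3000):
    h, p = kruskal_wallis_3(synthetic_followup(n, shift=0.1))
    print(f"n per group = {n}: H = {h:.2f}, p = {p:.2g}")
```

      The same 0.1-year shift is far from significant with 30 participants per group but highly significant with 3,000, which is why reporting medians and interquartile ranges alongside the P value is informative.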

      Change in the supplementary material:

      Added the median and interquartile range of follow-up in Supplementary Table S4.

      The results are presented as the mean ± standard deviation, or No. (%).

      ᵃ The effect sizes are standardized mean differences for continuous outcomes and the phi coefficient for dichotomous outcomes.

      LDL-C, low-density lipoprotein cholesterol.
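      The two effect sizes named in the footnote above can be computed from raw data as follows. This is a minimal sketch for a pairwise (two-group) comparison; it assumes the standardized mean difference uses the pooled SD sqrt((s1² + s2²)/2), a common choice in propensity-score balance tables, although the response does not state which variant was used. Function names and the example counts are hypothetical.

```python
import math
from statistics import mean, stdev


def standardized_mean_difference(x, y):
    """(mean_x - mean_y) / sqrt((s_x^2 + s_y^2) / 2), using sample SDs."""
    pooled_sd = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled_sd


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table [[a, b], [c, d]]
    (rows: the two groups; columns: characteristic present / absent)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


# Hypothetical example: a binary covariate in two groups of 1,000 participants.
print(round(phi_coefficient(120, 880, 95, 905), 3))
print(round(standardized_mean_difference([2.9, 3.4, 3.1, 3.6], [3.0, 3.2, 2.8, 3.1]), 3))
```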

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study addresses the temporal patterning of a specific Drosophila CNS neuroblast lineage, focusing on its larval development. They find that a temporal cascade involving the Imp and Syp genes changes the fate of one daughter cell/branch from glioblast (GB) to programmed cell death (PCD), and also gates the decommissioning of the NB at the end of neurogenesis.

      I believe there are some inaccuracies in this summary. We address temporal patterning during the larval and pupal stages through to the adult stage. The Imp and Syp genes change the fate of one daughter cell/branch from survival to programmed cell death (PCD). The change from glioblast (GB) to PCD, which occurs at an early time point, is not addressed here. The main point of the paper is missing:

      • Last-born MNs undergo apoptosis due to their failure to express a functional TF code, and this code is post-transcriptionally regulated by the opposite expression of Imp and Syp in immature MNs.

      Reviewer #2 (Public Review):

      Summary:

      Guan and colleagues address the question of how a single neuroblast produces a defined number of progeny, and what influences its decommissioning. The focus of the experiments is two well-studied RNA-binding proteins: Imp and Syp. The authors find that these factors play an important role in determining the number of neurons in their preferred model system, the VNC motor neurons derived from a single lineage (Lin A/15), through separate functions at specific stages of the lineage's development: they influence the life span of the Lin A neuroblast to control its timely decommissioning, and they act in late-born post-mitotic neurons to influence cell death once the appropriate number of progeny has been generated. The post-mitotic role of Imp/Syp in regulating programmed cell death (PCD) is also correlated with a specific code of key transcription factors that are suspected to influence neuronal identity, linking neuronal survival with neuronal specification. This paper addresses a wide range of phenotypes related to the same factors, thus providing an intriguing demonstration of how the nervous system is constructed through context-specific changes in key developmental regulators.

      The bulk of conclusions drawn by the authors are supported by careful experimental evidence, and the findings are a useful addition to an important topic in developmental neuroscience.

      I could not summarize the paper better.

      Strengths:

      A major strength is the use of a genetic labeling tool that allows the authors to specifically analyze and manipulate one neuronal lineage. This allows simultaneous study of both the progenitors and their post-mitotic progeny. As a result, the paper conveys a lot of useful information about this particular neuronal lineage. Furthermore, addressing the association between cell fate specification (taking advantage of this lab's extensive prior work in the system) and developmentally regulated programmed cell death is an important contribution to the field.

      Beyond Imp/Syp, additional characterization of this model system is provided in characterizing a previously unrecognized death of a hemilineage in early-born neurons.

      Thanks!

      Weaknesses:

      The main observation that distinguishes this study from others that have investigated Imp/Syp in the fly nervous system is the role these proteins play in late-born post-mitotic neurons to regulate programmed cell death. This is an important and plausible (based on the presented findings) newly discovered role for these proteins. However, the experiments lack precision, which limits the authors' claims. The genetic strategy used to manipulate Imp/Syp or the TF code appears to act throughout the entire lineage, or in all neuronal progeny, and is not restricted to the late-born cells. Can the authors rule out survival of the early-born hemilineage that is normally fated to die? Therefore, statements such as this:

      “To further investigate this possibility, we used the MARCM technique to change the TF code of last-born MNs without affecting the expression of Imp and Syp” should be qualified to specify that the result was obtained by misexpressing these factors throughout the entire lineage.

      We agree that our genetic manipulations affect the entire lineage or all neuronal progeny; we do not have genetic tools that would provide such precision. We have revised our descriptions to specify the entire lineage or all neuronal progeny. As the reviewer raised, we were also concerned that overexpression of Imp or knockdown of Syp could induce survival of the early-born hemilineage. Two experiments rule out this possibility:

      (1) In late LL3 larvae, Imp overexpression (OE) or syp MARCM clones do not change the number of cells (see Guan et al., 2022), indicating that the hemilineage that dies by PCD is not affected. If Imp or Syp played a role in the survival of this hemilineage, we would see at least a 50% increase in the number of MNs at this stage.

      (2) The MARCM experiment using the VGlut driver to overexpress P35 or Imp allows us to manipulate only elav+ VGlut+ neurons. The hemilineage removed by PCD is elav- VGlut- and is not affected by this experiment. Consequently, the increase in MNs in adults with genetic manipulation can only be the result of the survival of the other hemilineage (elav+, VGlut+). Moreover, this experiment shows an increase in the number of neurons in the adult but not in LL3, demonstrating that the hemilineage (elav- VGlut-) is still removed by PCD with this genetic manipulation.

      The authors make an observation that differs from other systems in which Imp/Syp have been studied: the expression of the two proteins appears to be independent and not influenced by cross-regulation. However, there is no investigation of what effect this may have on how Imp/Syp regulate temporal identity. A key implication of the cross-regulation previously observed in the fly mushroom body is that the Imp/Syp ratio could change over the life of the NB, which would permit different neuronal identities. Without cross-regulation, do the authors still observe a gradient in the expression pattern over time? Because Imp and Syp were stained in different brain samples and were not quantified across stages, this is unclear. The authors use the term 'gradient', but changes in the levels of these factors are not evident from the presented data.

      We have now quantified the transcriptional activity of Imp and Syp in the NB over time using smFISH. We have also quantified the relative expression of Imp and Syp protein in the NB over time by co-immunostaining. Additionally, we quantified the relative expression of Imp and Syp protein in postmitotic neurons as a function of their birth order in late LL3 larvae. All these data show an opposite temporal gradient of Imp and Syp in the NB and an opposite spatial gradient in immature neurons according to their birth order (Figure 4). How these gradients are established in our system remains to be elucidated.

      Reviewer #3 (Public Review):

      This study by Guan and co-workers focuses on a model neuronal lineage in the developing Drosophila nervous system, revealing interesting aspects about: a) the generation of supernumerary cells, later destined for apoptosis; and, b) new insights into the mechanisms that regulate this process. The two RNA-binding proteins, Imp and Syp, are shown to be expressed in temporally largely complementary patterns, their expression defining early vs later born neurons in this lineage, and thus also regulating the apoptotic elimination. Moreover, neuronal 'fate' transcription factors that are downstream of Imp and signatures of early-born neurons, can also be sufficient to convert later born cells to an earlier 'fate', including survival.

      The authors provide solid evidence for most of their statements, including the temporal windows during which the early and the later-born motoneurons are generated by this model lineage, how this relates to patterns of cell death by apoptosis and that mis-expression of early-born transcription factors in later-born cells can be sufficient to block apoptosis (part of, and perhaps indicative of the late-born identity).

      Other studies have previously outlined analogous, mutually antagonistic roles for Imp and Syp during nervous system development in Drosophila, in different parts and at different stages, with which the working model of this study aligns.

      Overall, this study adds to and extends current working models and evidence on the developmental mechanisms that underlie temporal cell fate decisions.

      I could not summarize the paper better.

      Reviewer #1 (Recommendations For The Authors):

      While this is an interesting topic, I raised two issues in my original review.

      (1) Against the backdrop of numerous previous studies linking many developmental regulators, including tTFs, to programmed cell death in the developing CNS, which in several cases have involved identifying key PCD genes and decoding the molecular regulatory interplay between regulators and PCD genes, this study does not provide any new insight into the regulation of developmental PCD in the CNS.

      The authors have not added any new data to address this shortcoming.

      I agree with the reviewer that we did not attempt to link Imp/Syp with the temporal transcription factor (tTF) cascade or with spatial selectors such as Hox genes. However, this decision was intentional, as our primary focus was on studying immature MNs. It is worth noting that the decommissioning of NBs by autophagic cell death or terminal differentiation, which is mediated by Imp/Syp in other lineages, has not been correlated with tTFs or spatial selectors. Although we have not directly examined the involvement of the hb + sv > kr > pdm > cas > cas-svp > Grh cascade in the decommissioning of the Lin A neuroblast, our preliminary data indicate that Hb, Sv, Pdm, and Cas are not expressed in the Lin A NB, while Grh is consistently expressed in the NB (Wenyue et al., 2022). Thus, it is unlikely that this particular tTF cascade is implicated in Lin A neuroblast decommissioning. Spatial selectors, in contrast, can play a role here: in the Lin A lineage, the Hox gene Antp promotes survival (Baek, Enriquez, & Mann, 2013), the opposite of the role played by Hox transcription factors in abdominal NBs. Here, to avoid repeating what has already been described in the literature, we focused on the role of Imp/Syp in postmitotic neurons and revealed that the precise elimination of MNs is linked to the control of TFs expressed in the MNs.

      (2) I raised the issue that it is unclear if Imp/Syp acts in the NB, and/or in IMC/GMC, and/or in the daughter cells generated from these.

      I agree with the reviewer's concern that it is unclear where Imp/Syp function, i.e., whether they act in the NB, the IMC/GMC, or the daughter cells. To address this, one possible approach would be to attempt to rescue Imp and Syp mutants by transgenic expression in specific cell types, such as NBs, IMC/GMC, or GB/daughter cells. However, we have not conducted such experiments because we were skeptical about the outcome. Previously published work has used drivers expressed in NBs, IMC/GMC, or postmitotic neurons to decipher the function of a gene in a specific cell type, but the results of these experiments must be interpreted with caution. Using NB/GMC drivers to study gene function can produce effects not only in the NB but also in its progeny, including GMCs and postmitotic neurons, owing to the perdurance and stability of the Gal4/UAS expression system. For instance, dpn-Gal4 > UAS-GFP labels not only the NB but also many of its progeny, even though Dpn is expressed only in NBs. Likewise, elav-Gal4 is expressed in the NB and in GMCs.

      However, our overexpression of Imp in immature neurons using the VGlut driver demonstrates that Imp promotes cell survival through an autonomous function in these neurons. This driver is expressed only in postmitotic neurons (elav+) and not in the NB, the IMC/GMC, or the hemilineage eliminated by cell death (elav- VGlut-).

      Reviewer #2 (Recommendations For The Authors):

      Oddly, knockdown of Imp in the neuroblast (Fig. 5D) only led to death at 8h APF, when Imp is no longer expressed. Do the authors have an explanation as to how the stem cell can survive until this point? A discussion would be helpful.

      The simplest explanation is the limited efficiency of RNAi: imp-/- MARCM clones (Guan et al., 2022) lead to a stronger reduction of MNs in LL3.

      A simple experiment I would recommend is to repeat the antibody stainings of staged larvae/pupae (Fig. 4) with the anti-Imp and anti-Syp antibodies in the same brain sample, and perhaps to quantify their ratio in the NB. Given that the species in which the antibodies were raised seem compatible, this should be feasible. As it stands, there is no indication of whether the Imp/Syp ratio changes over time.

      We have now quantified the transcriptional activity of Imp and Syp in the NB over time. We have also quantified the relative expression of Imp and Syp proteins in the NB over time and quantified the relative expression of Imp and Syp proteins in postmitotic neurons as a function of their birth order in late LL3 larvae. How these gradients are established in our system still remains to be identified.

      Minor errors/suggestions:

      Fig. 4: the time legend at the top reads A, B, C, E, F (no D), so it does not match the panels below.

      Yes, we have made the corrections.

      Sentence repeated in Intro:

      The process of terminating NB neurogenesis through autophagic cell death or terminal differentiation is commonly referred to as decommissioning.

      Yes, corrections have been made.

      In Figure 1 they say 'Type IB' and in Figure 2 they say 'Type 1B'.

      We have changed it to type 1b.

      In Fig. 2A it is hard to see the lack of Elav, and in Fig. 2G it is hard to see the presence of Dcp1. The panels could be adjusted to emphasize these results.

      We have increased the size of the panels and made two separate panels where only the elav and Dcp1 signals are present.

      The observation that the result is equivalent in all thoracic segments is expected, since all legs need the same number of neurons. This is nice to have but could go in the supplement.

      Overall, the number of figures seems excessive, especially considering that many of the included results (particularly the NB results) are consistent with previous papers, and some are characterization of the system that does not fit well with the main focus on Imp/Syp (i.e., the death of one hemilineage):

      Figure 5 and 6 can be joined as one.

      We have combined Figures 5 and 6, showing only the T1 segments.

      There is some discrepancy between Fig. 7F and 7K: at LL3, the number of neurons for the control in 7F differs from the count in 7K.

      Yes, because the genetic backgrounds are not the same and we are not counting the same type of cells. In 7F, we are counting elav+ VGlut+ cells, whereas in 7K we are counting all elav+ cells in Lin A, including elav+ VGlut- cells. VGlut expression begins slightly later than elav expression, which is why there are fewer labeled cells in 7F. In other words, VGlut MARCM clones do not label all Lin A elav+ cells. I have clarified this in the figure.

      Reviewer #3 (Recommendations For The Authors):

      Main comment: on the notion of Imp and Syp gradients:

      p. 5, related to figure 4 - there are clearly distinct windows for predominantly (if not exclusively) Imp, and later, Syp expression in lineage 15, with a phase of co-expression.

      However, based on the data shown, it is unclear whether these windows represent gradients, as repeatedly stated. If the notion of gradients is derived from other studies, on other lineages, then this would be good to clarify. Alternatively, the idea of temporally opposing gradients of Imp and Syp would need to be demonstrated for this lineage.

      For example, a more accurate way to describe this study's data is given on p.7 "In conclusion, our findings demonstrate that the opposite expression pattern of Imp and Syp in postmitotic neurons precisely shapes the size of Lin A/15 lineage by controlling the pattern of PCD in immature MNs (Fig. 8)."

      We have now quantified the transcriptional activity of Imp and Syp in the NB over time. We have also quantified the relative expression of Imp and Syp proteins in the NB over time. We have also quantified the relative expression of Imp and Syp proteins in postmitotic neurons as a function of their birth in late LL3 larvae. How these gradients are established in our system still remains to be identified.

      Minor points:

      p. 6, related to figure 7: are the numbers of EdU- early-born and EdU+ late-born MNs given as means in the main text? As written, the text suggests an absence of any variability, whereas some variability would be expected and is indeed shown in the Fig. 7 data.

      Yes, we have added averages in the text.

      Methods: the author name 'Lacin' has been misspelled.

      Sorry about that, it's been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper proposes a valuable new method for the assessment of the mean kurtosis for diffusional kurtosis imaging by utilizing a recently introduced sub-diffusion model. The evidence supporting the claims that this technique is robust and accurate in brain imaging is incomplete. The work could be of interest in the research and clinical arena.

      We thank the editors for their assessment and the reviewers for their careful reading and feedback that helped to improve the manuscript. We have addressed all the reviewers’ concerns and would like to request an update of the assessment to reflect the revisions we have made.

      Below, we address the reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study introduces an innovative method for assessing the mean kurtosis, utilizing the mathematical foundation of the sub-diffusion framework. In particular, a new fitting technique that incorporates two different diffusion times is proposed to estimate the parameters of the sub-diffusion model. The evaluation of this technique, which generates kurtosis maps based on the sub-diffusion framework, is conducted through simulations and the examination of data obtained from human subjects.

      We thank Reviewer #1 for pointing out the novelty and innovation of our work.

      Strengths:

      The utilization of the sub-diffusion model for tissue characterization is a significant conceptual advancement for the field of diffusion MRI. This study adeptly harnesses this approach for an accurate estimation of the parameters of the widely employed diffusion model, DKI, leveraging their established analytical interconnection as evidenced in prior research. Notably, this approach not only proposes a robust, fast, and accurate technique for DKI parameter estimation but also underscores the viability of deploying the sub-diffusion model for tissue characterization, substantiated by both simulated and human subject analyses. The paper is very well written, well organized, and coherent. The simulation study included different aspects of water diffusion as captured by diffusion-weighted MRI, such as varying diffusion times and different b-value subpopulations, resulting in a comprehensive and thorough discussion.

      We thank Reviewer #1 for highlighting the strengths of our work.

      Weaknesses:

      The primary objective of this study is to demonstrate a robust approach for estimating DKI parameters by directly calculating them using the parameters of the sub-diffusion model. This premise, however, relies on the assumption that the sub-diffusion model effectively characterizes the diffusion MRI signal and that its parameters are both robust and accurate. Throughout the manuscript, the term "ground truth kurtosis K" is frequently used to denote the "true K" value in the context of the simulation study. Nonetheless, given that the data is simulated using the new sub-diffusion model - an approximation of the DKI-based signal expression- this value cannot truly be considered the "ground truth K". The simulation study highlights the robustness and accuracy of D* and K*, but it inherently operates under the assumption that the observed data is in the form of the sub-diffusion model.

      It is correct that our study operates under the assumption that the observed data is in the form of the sub-diffusion model, and indeed one of the key outcomes of this work is to demonstrate the effectiveness of that assumption and the new possibilities it brings. Naturally, using any mathematical model at all carries assumptions. Over the past two decades, many mathematical and biophysical models have been proposed to characterise diffusion MRI signals. However, model validation remains an open challenge in the field. In this, as well as in our previous work (Yang et al, NeuroImage, 2022), we have shown that our proposed sub-diffusion model not only provides a much better fit than the traditional DKI method, overcoming the major limitation of the traditional DKI method on the maximum b-value, but also generates brain maps with superior tissue contrast and elucidates previously unseen structure.

      We have replaced the term “ground truth kurtosis K” with “true kurtosis K”.

      The comment “… using the new sub-diffusion model – an approximation of the DKI-based signal expression…” is a bit misleading. In fact, we propose that the reverse interpretation is the more suitable way to view the relationship: the DKI model is a degree-2 approximation of the sub-diffusion model, as in eq. (7).
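      A short expansion illustrates the direction of this approximation. It is a sketch under two assumptions that are ours rather than statements from this response: that the sub-diffusion signal has the Mittag-Leffler form S = S0 E_β(−D_β q² Δ^β), with E_β(−z) = Σ_k (−z)^k / Γ(βk + 1), and that b ≈ q²Δ (narrow-pulse limit). Taking the logarithm, keeping terms up to second order in z, and matching coefficients with the DKI signal expression gives:

```latex
\begin{aligned}
\ln\frac{S}{S_0} &= -\frac{z}{\Gamma(1+\beta)}
  + \left[\frac{1}{\Gamma(1+2\beta)} - \frac{1}{2\,\Gamma(1+\beta)^2}\right] z^2 + O(z^3),
  && z = D_\beta q^2 \Delta^\beta,\\
\ln\frac{S}{S_0} &\approx -b\,D^* + \tfrac{1}{6}\, b^2 (D^*)^2 K^*,
  && b = q^2 \Delta,\\
D^* &= \frac{D_\beta\,\Delta^{\beta-1}}{\Gamma(1+\beta)}, \qquad
K^* = \frac{6\,\Gamma(1+\beta)^2}{\Gamma(1+2\beta)} - 3.
\end{aligned}
```

      For β = 1 the Mittag-Leffler function reduces to an exponential and K* = 0 (Gaussian diffusion). Dropping the O(z³) terms is exactly the truncation that turns the full signal expression into the DKI form.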

      Reviewer #2 (Public Review):

      Summary: The authors present a technique for fitting diffusion magnetic resonance images (dMRI) to a sub-diffusion model of the diffusion process within brain imaging. The authors suggest that their technique provides robust and accurate calculation of diffusional kurtosis imaging parameters from which high quality images can be calculated from short dMRI data acquisitions at two diffusion times.

      Strengths: If the authors can show that the dMRI signal in brain tissue follows a sub-diffusion model decay curve then their technique for accurately and robustly calculating diffusional kurtosis parameters from multiple diffusion times would be of benefit for tissue microstructural imaging in research and clinical arenas.

      In Figure 7, we showed that the diffusion MRI signals follow the sub-diffusion model decay curves.

      Weaknesses: The applied sub-diffusion model has two parameters that are invariant to diffusion time, D_β and β, which are used to calculate the diffusional kurtosis measures of a diffusion time dependent D* and a diffusion time invariant K*. However, the authors do not demonstrate that the D_β, β and K* parameters are invariant to diffusion time in brain tissue.

      In our proposed sub-diffusion model, D_β and β are assumed to be time-independent parameters, which is a key strength of the approach. The goal is to characterise tissue-specific properties (D_β for diffusivity and β for the extent of tissue complexity) that do not rely on the diffusion time setting in diffusion MRI experiments. To extract such time-independent properties, we proposed a new sampling and fitting strategy – fitting at least two diffusion time data together.
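      What time-invariant D_β and β imply at the two diffusion times of the MGH Connectome 1.0 data can be sketched numerically. The sketch assumes the sub-diffusion signal S = S0 E_β(−D_β q² Δ^β) with b ≈ q²Δ, under which the DKI parameters are D* = D_β Δ^(β−1)/Γ(1+β) and K* = 6Γ(1+β)²/Γ(1+2β) − 3. The helper name and the parameter values (D_β = 1.0 µm²/ms^β, β = 0.8) are hypothetical illustrations, not fitted values from the study.

```python
import math


def dki_from_subdiffusion(d_beta, beta, delta):
    """Hypothetical helper: D* and K* implied by sub-diffusion parameters,
    assuming S = S0 * E_beta(-D_beta * q^2 * Delta^beta) and b = q^2 * Delta.
    Units: d_beta in um^2/ms^beta, delta in ms, returned D* in um^2/ms."""
    g1 = math.gamma(1 + beta)
    d_star = d_beta * delta ** (beta - 1) / g1              # depends on Delta
    k_star = 6 * g1 ** 2 / math.gamma(1 + 2 * beta) - 3     # depends on beta only
    return d_star, k_star


# Illustrative (made-up) tissue parameters at the two Connectome 1.0 diffusion times.
for delta in (19.0, 49.0):
    d_star, k_star = dki_from_subdiffusion(d_beta=1.0, beta=0.8, delta=delta)
    print(f"Delta = {delta:.0f} ms: D* = {d_star:.3f} um^2/ms, K* = {k_star:.3f}")
```

      Within this model, D* changes between Δ = 19 ms and Δ = 49 ms by the factor (49/19)^(β−1), while K* is identical at both diffusion times because it depends on β alone.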

      The authors' results visually show that there is time dependence of the K* measure (in Figure 6) that is more apparent in white matter, with K* values being higher for diffusion times of ∆ = 49 ms than ∆ = 19 ms. The diffusion time dependence of K* indicates there is also diffusion time dependence of β.

      The discrepancies in the fitted K* for ∆ = 19 ms and ∆ = 49 ms separately do not necessarily imply that there is a true time dependence in these parameters. Rather, this can be explained by a deficiency of data when fitting a two-dimensional surface (S is a function of q and ∆) based on data along a single curve for a fixed value of ∆. Without properly sampling the surface across two independent coordinates, one cannot expect a fully reliable fit. Indeed, a great advantage of our proposed method is to allow fitting data with multiple values of ∆, and thereby getting a richer data set with which to fit the full signal surface S(q, ∆). The results for fitting ∆ = 19 ms and ∆ = 49 ms data together clearly show the benefits of this approach, with superior contrast achieved.

      Furthermore, Figure 7 shows that there is a tissue specific root mean squared error in model fitting over the two diffusion times which indicates greater deviation from the model fit in white matter than grey matter.

      Although the errors are not completely tissue-independent, please note the magnitude of the RMSE is very small. The quality of the fitting in both white and grey matter is shown in sub-figures (A)-(H) for several representative voxels.

      To show that the sub-diffusion model is robust and accurate (and consequently that K* is robust and accurate) the authors would have to demonstrate that there is no diffusion time-dependence in both D_β and β in application to brain imaging data for each diffusion time separately. Simulated data should not be used to demonstrate the robustness and accuracy of the sub-diffusion model or to determine optimization of dMRI acquisition parameters without first demonstrating that D_β and β are invariant to diffusion time. This is because simulated signals calculated by using the sub-diffusion characteristic equation of dMRI signal decay will necessarily have diffusion time invariant D_β and β parameters. Without further information demonstrating diffusion time invariance of D_β, β and K* it is not possible to determine whether the authors have achieved their aims or that their results support their conclusions.

      First, as explained above, the dMRI signal S is a function of q and ∆, i.e., a two-dimensional surface S(q, ∆). Hence, fitting data sampled at a single diffusion time (i.e., one curve on the surface) cannot provide reliable parameters, as seen in the discrepancies in K* in Figure 6 (bottom two rows). Our proposed sampling and fitting strategy overcomes this issue: to obtain a reliable fit, one should fit data from at least two diffusion times together (i.e., sample at least two curves on the signal surface).
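      To make the identifiability argument concrete, here is a minimal Python sketch. It rests on an assumption made purely for illustration: that the signal depends on q and ∆ only through the combination z = D_β q² ∆^β. Any finite-pulse-width correction that the manuscript's Eq. (10) may include is ignored. Under that assumption, data at a single ∆ determine the scale c(∆) = D_β ∆^β, while β must be inferred from the curve shape alone, and any error in β is absorbed into D_β. A second diffusion time adds the scaling constraint c(∆1)/c(∆2) = (∆1/∆2)^β, from which β and D_β follow directly. The function names and the ground-truth values are hypothetical.

```python
import math
from typing import Tuple


def effective_scale(d_beta: float, beta: float, delta_s: float) -> float:
    """Scale c(Δ) = D_β Δ^β that multiplies q² in the assumed argument z = D_β q² Δ^β."""
    return d_beta * delta_s ** beta


def recover_from_two_times(c1: float, delta1_s: float,
                           c2: float, delta2_s: float) -> Tuple[float, float]:
    """Solve c(Δ1)/c(Δ2) = (Δ1/Δ2)^β for β, then back out D_β = c(Δ1) / Δ1^β."""
    beta = math.log(c1 / c2) / math.log(delta1_s / delta2_s)
    d_beta = c1 / delta1_s ** beta
    return d_beta, beta


# Diffusion times of the MGH Connectome 1.0 data, in seconds.
DELTA1, DELTA2 = 0.019, 0.049
# Illustrative ground truth (not fitted values): D_β in mm²/s^β, β dimensionless.
TRUE_D_BETA, TRUE_BETA = 5e-4, 0.75

c1 = effective_scale(TRUE_D_BETA, TRUE_BETA, DELTA1)
c2 = effective_scale(TRUE_D_BETA, TRUE_BETA, DELTA2)
print(recover_from_two_times(c1, DELTA1, c2, DELTA2))  # ≈ (0.0005, 0.75)

# At Δ1 alone, a biased β can be compensated by D_β so that c(Δ1) is unchanged...
biased_beta = 0.9
biased_d_beta = c1 / DELTA1 ** biased_beta
# ...but the pair then mispredicts the scale at Δ2 by (Δ2/Δ1)^(0.9 - 0.75) ≈ 1.15.
print(effective_scale(biased_d_beta, biased_beta, DELTA2) / c2)
```

      A joint fit across both diffusion times penalises exactly this kind of mismatch. That penalty is why adding a second ∆ stabilises the estimates in this simplified picture.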

      Second, to demonstrate that D_β and β are time-invariant, one would need data acquired at several diffusion times with high b-values. Such data are not easily obtained. This study uses the MGH Connectome 1.0 human brain data, which contain only two diffusion times, ∆ = 19 ms and ∆ = 49 ms.

      We therefore conducted numerical experiments to test this idea. Figure 3 shows two things. First, the variability of the fitted parameters is significantly reduced when moving from fitting data at a single diffusion time to fitting data at two diffusion times. Second, the difference between fitting three diffusion times and fitting two is very minor, indicating convergence towards the correct time-independent parameter values. The results from fitting human brain data (Figure 6 and Tables 2-4) agree with the expectations from our numerical experiments. Taken together, we believe that we have provided sufficient evidence to support our proposed sub-diffusion model and its optimal fitting strategy.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is clear that the authors preferred generating the data by using sub-diffusion model's signal expression as it has many benefits, such as allowing different diffusion times to be incorporated, and hence investigation of the effect of the number of diffusion times on the accuracy of the parameter fitting. I recommend adding another simulation study by generating the data with the DKI model expression (as the goal of the study is to provide an accurate mapping of diffusional mean kurtosis), fitting the data to the sub-diffusion model's expression in Eq. (10), and then calculating K* and D* by Eqs. (8) and (9) only for a fixed diffusion time and one b-value subset.

      We appreciate the suggestion. Unfortunately, however, it is not appropriate to generate data with the DKI model. The DKI expression is valid only up to a maximum b-value of about 2000-3000 s/mm^2, so it cannot represent diffusion MRI signals over the full spectrum of b-values. A key strength of our proposed model is that it removes this limitation.
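      The b-value limit follows from the DKI expression itself. In its standard form, ln S = ln S0 - bD + b²D²K/6, the log-signal reaches a minimum at b = 3/(DK), rises again beyond that point, and returns to S0 at b = 6/(DK). The Python sketch below uses illustrative values, D = 1e-3 mm²/s and K = 1; these are our assumption, not values taken from the manuscript. With K between 1 and 1.5, the turning point falls between 3000 and 2000 s/mm², consistent with the 2000-3000 s/mm² range quoted above.

```python
import math


def dki_signal(b: float, d: float, k: float, s0: float = 1.0) -> float:
    """Standard DKI signal: S = S0 exp(-b D + b² D² K / 6)."""
    return s0 * math.exp(-b * d + (b * d) ** 2 * k / 6.0)


def dki_turning_point(d: float, k: float) -> float:
    """b where d(ln S)/db = -D + b D² K / 3 vanishes; beyond it the DKI signal increases."""
    return 3.0 / (d * k)


D, K = 1e-3, 1.0  # illustrative: D in mm²/s, K dimensionless
b_min = dki_turning_point(D, K)
print(b_min)                     # ≈ 3000 s/mm²
print(dki_signal(b_min, D, K))   # exp(-1.5) ≈ 0.223, the minimum of the DKI curve
print(dki_signal(6000.0, D, K))  # ≈ 1.0: an unphysical return to S0
```

      Data generated this way above the turning point would grow with b, which no real diffusion-weighted signal does.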

      There is a typo on Page 24, Line 581; "b<=2400" should be b>=2400.

      We have fixed this typo.

      Reviewer #2 (Recommendations For The Authors):

      As the authors state, the sub-diffusion model has two parameters, D_β and β, that are invariant to diffusion time. These give rise to a time-varying diffusion coefficient (in mm^2 s^-1) and a time-invariant kurtosis. However, the implications of the sub-diffusion model need to be stated more clearly and specifically. The manuscript would be improved by the authors:

      (a) Defining the time-varying diffusion coefficient that arises from the model, its functional form and properties.

      We refer Reviewer #2 to Eqs. (5) and (8) for the definitions of the time-varying diffusion coefficients D* and D_SUB and their relationship.
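      The following Python sketch shows how a time-varying D* and a time-invariant K* can both emerge from the two parameters D_β and β. It rests on an assumption on our part: a Mittag-Leffler signal form, S/S0 = E_β(-D_β q² ∆^β), with finite-pulse corrections ignored. The sketch matches the low-b expansion E_β(-z) ≈ 1 - z/Γ(1+β) + z²/Γ(1+2β) to the DKI expansion with b = q²∆. This gives D* = D_β ∆^(β-1)/Γ(1+β) and K* = 6Γ(1+β)²/Γ(1+2β) - 3. Check these against Eqs. (5) and (8) of the manuscript before relying on them.

```python
import math


def d_star(d_beta: float, beta: float, delta_s: float) -> float:
    """Apparent diffusivity D* = D_β Δ^(β-1) / Γ(1+β); varies with Δ whenever β < 1."""
    return d_beta * delta_s ** (beta - 1.0) / math.gamma(1.0 + beta)


def k_star(beta: float) -> float:
    """Apparent kurtosis K* = 6 Γ(1+β)² / Γ(1+2β) - 3; depends on β only, not on Δ."""
    return 6.0 * math.gamma(1.0 + beta) ** 2 / math.gamma(1.0 + 2.0 * beta) - 3.0


# Illustrative parameters (hypothetical): D_β = 5e-4 mm²/s^β, β = 0.75.
for delta in (0.019, 0.049):  # the two Connectome diffusion times, in seconds
    print(delta, d_star(5e-4, 0.75, delta), k_star(0.75))
```

      For β = 1 the expressions reduce to Gaussian diffusion: D* = D_β and K* = 0. As β → 0, K* approaches 3.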

      (b) Clearly discuss the implications of this with respect to other time-varying diffusion coefficient methods in the current literature.

      We refer Reviewer #2 to the section “Time-dependence of diffusivity and kurtosis” in the Discussion.

      (c) Demonstrating that D_β and β do not vary with diffusion time when estimated from dMRI acquired on human participants.

      We have addressed this comment in the public review.

      The manuscript would benefit from increased clarity in all sections and from the authors correcting typographical errors.

      We have updated the relevant text in the revised manuscript to make it clearer, including fixing typos.

      Specific improvements to clarity in the methods and results section would include:

      Line 620: Why were parameter approximations for model fitting to simulated data restricted to the ranges D_β∈[10^(-4),10^(-3)] and β∈[0.5,1], whereas in fitting to brain imaging data the ranges were D_β>0 and 0<β<=1?

      The parameter ranges for model fitting were the same for the simulated and the human data: D_β>0 and 0<β<=1. To generate the simulated data, the D_β and β ranges were restricted to reflect the values observed in human brain data. We have updated the text to make this clearer.

      Lines 622, 628 & 629: Which goodness of fit measure was used?

      The goodness of fit measure for all simulated results is the coefficient of determination, or R^2 value, as noted in the “Goodness-of-fit and region-based statistical analysis” section under Methods. We have updated the text to make this clearer.

      Line 666: The method for computation of R^2 within the coefficient of determination should be stated as there are several ways of calculating an R^2 value.

      The formula for computing R^2 has been added to the text.
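      For reference, here is a minimal Python sketch of the two goodness-of-fit measures discussed in this exchange. The first is the conventional R² = 1 - SS_res/SS_tot; the second is the RMSE shown in Figure 7. We assume that both are computed on the raw signal values of a single voxel. The formula added to the manuscript may differ in detail, for example by using normalised or log-transformed signals.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination R² = 1 - SS_res / SS_tot (undefined if observed is constant)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Root mean squared error between measured and model-predicted signals."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed))


signal = [1.0, 0.62, 0.41, 0.30, 0.24]  # hypothetical normalised signals
model = [1.0, 0.60, 0.42, 0.31, 0.23]   # hypothetical fitted values
print(r_squared(signal, model), rmse(signal, model))
```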

      Line 685: A t-test is mentioned but it is not clear as to the inputs to this test, or where the results of this analysis are presented.

      We have updated the text to make this clearer. The results of this analysis are presented in Table 5. The entries shown in italics under the optimal b-value heading were found to be significantly different from the benchmark mean K* reported in Table 2.

      Line 696: It is not clear how the intra-class correlation coefficient histograms are computed from six subjects. This applies to results in Figure 10 that require greater clarity in the description.

      The formula for computing the intra-class correlation coefficient has been added to the sub-section “Scan-rescan analysis using intraclass correlation coefficient (ICC)” under “Methods”.
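      One way to obtain an ICC histogram from six subjects is to compute one ICC per voxel (or region) and then histogram the voxel-wise values. In each ICC, the six subjects are the targets and the scan and rescan are the repeated measurements. The Python sketch below uses the one-way random-effects form ICC(1,1) = (MS_B - MS_W)/(MS_B + (k-1)MS_W). This choice of form is our assumption; the manuscript's formula may use a different ICC variant, such as a two-way model. The voxel values are hypothetical.

```python
from typing import Sequence


def icc_1_1(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1).

    ratings[i][j] is the value for subject i in session j (e.g. scan, rescan).
    """
    n = len(ratings)     # number of subjects
    k = len(ratings[0])  # number of sessions per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical K* values at one voxel: 6 subjects x (scan, rescan).
voxel = [[0.92, 0.95], [1.10, 1.07], [0.85, 0.88], [1.21, 1.18], [0.99, 1.02], [1.05, 1.03]]
print(icc_1_1(voxel))
```

      Repeating this calculation for every voxel gives the distribution of ICC values that a histogram would display.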

      It would be helpful if the authors primarily report results pertaining to the model parameters D_β and β. This is because D* and K* are calculated from D_β and β. Conditions for robust and accurate estimation of D_β and β will provide robust and accurate measures for D* and K*.

      Two new tables for the model parameters D_β and β have been added. Please see Tables 3 and 4 in the revised manuscript.

      The authors state that fitted model parameters are not affected by maximum b-value (paragraph beginning line 366). This statement is based on their model simulation results. Could the authors provide data to support this based on the application of their model to the human brain imaging data?

      We would like to clarify that our statement is indeed based on human brain imaging data. As stated in the paragraph beginning at line 366, both sets of results were generated from the Connectome human brain data: Table 2 (using the full dataset) and Table 5 (using the dataset with optimal b-value sampling). If the fitted parameters depended on the maximum b-value, the benchmark results (Table 2) and the optimal region-specific results (Table 5, previously Table 3) would differ systematically.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth considering and exploring further.

      After the initial reviewers' comments, the authors performed a welcome revision of the way the results are presented. Overall, the study has been improved by the revision. However, one piece of new data is perplexing to me. The new Figure 7 presents the results of a model analysis of the strength of the EI that a second fish produces (and that the focal fish must localize) while the focal fish is chirping. From my understanding of this type of model, EOD frequency is not a parameter in the model, since it evaluates the strength of the field at a given point in time. Therefore, the only things that matter are the phase relationship and the strength of the EOD. Assume that the second fish's EOD is kept constant and the phase relationship is also the same. Then the only difference during a chirp that could affect the result of the calculation is the potential decrease in EOD amplitude during the chirp. It is indeed logical that if the focal fish decreases its EOD amplitude, the target fish's EOD becomes relatively stronger. What is harder to understand is why the different types of chirps (e.g. type 1 vs type 2) lead to the same increase in signal, even though they are typically associated with different levels of amplitude modulation. It is also hard to imagine that a type 2 chirp, which is barely associated with any decrease in EOD amplitude (0-10% maybe), would cause a doubling of the EI strength. There might be something I don't understand, but the authors should provide many more details on how this result is obtained and convince us that it makes sense.

      We thank the Reviewer for the comments, and we agree that the approach could have been better detailed. As the Reviewer anticipated, the Boundary Element Method (BEM) model can be used simply to calculate the electric field and electric image at a specific point in time (instantaneously), regardless of EOD frequency. However, our model allows consecutive instants to be concatenated. It can thus render an entire sequence of electric fields, and the resulting electric images, incorporating realistic EOD characteristics such as shape, duration and frequency (see Pedraja et al., 2014).

      Chirp-triggered EIs were modeled using real chirps produced by interacting fish. Each chirp was thus associated with its duration and peak parameters, as well as with the positional information of the fish (distance and angle).

      We did not know the beat phase at which each chirp was produced. We therefore computed electric images for each fish position and chirp scenario at four equally spaced initial phases. These phases refer to the sender EOD, i.e., to the initial offset between the two interacting EODs. Each simulation spanned a 500 ms window with the chirp centered in it, so all relative phases are likely to be covered, albeit in a different temporal order relative to the chirp.
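      The claim that a 500 ms window is likely to cover all relative phases can be checked with a back-of-the-envelope calculation. If we ignore the chirp's own transient frequency excursion, the relative phase between two EODs advances by 2π per beat cycle, i.e. at the beat frequency |DF|. Below is a minimal Python sketch; the function names are hypothetical.

```python
import math
from typing import List


def initial_offsets(n: int = 4) -> List[float]:
    """n equally spaced initial phase offsets (radians) between the two EODs."""
    return [2.0 * math.pi * i / n for i in range(n)]


def beat_cycles_in_window(df_hz: float, window_s: float = 0.5) -> float:
    """Number of full relative-phase cycles swept in the window at beat frequency |DF|."""
    return abs(df_hz) * window_s


def min_df_for_full_coverage(window_s: float = 0.5) -> float:
    """Smallest |DF| (Hz) for which the window spans at least one full beat cycle."""
    return 1.0 / window_s


print(initial_offsets())            # [0, π/2, π, 3π/2]
print(beat_cycles_in_window(10.0))  # 5.0 cycles for DF = 10 Hz
print(min_df_for_full_coverage())   # 2.0 Hz
```

      For |DF| below about 2 Hz, a 500 ms window covers less than one beat cycle. In that regime, the four initial offsets carry most of the burden of sampling the phase.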

      The simulation was run with the same timing for the chirp and non-chirp conditions, across approximately 800 body nodes. At each node, the current flow was calculated from the peak-to-peak amplitude of the summed EODs, i.e., the point-by-point difference between the positive and negative envelopes of the beat. Analyzing the EIs over this fixed time window allows us to assess the changes in current flow induced by chirps per unit time (ΔI/Δt). From these, we can calculate a cumulative sum of current-flow changes, expressed as delta(EI), and use it to show the effect of the chirps on the spatiotemporal EI (Figure 7C).

      One can express this cumulative change mapped onto the fish body (keeping the 800 points separated, as in Figure 7C) or further sum the current changes to obtain a single total (as shown in Figure 7D).

      As a rough check, the size of the blue areas in Figure 7C suggests that not all 800 nodes show a detectable change. Summing, for example, about 500 of the 800 nodes, each with a value of 0.1 to 0.3 mA/s, gives roughly 100 mA/s, which is what is shown in Figure 7D (is this what is happening?).
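      The order-of-magnitude check in the previous paragraph can be written out explicitly. The Python sketch below assumes that the per-node values in Figure 7C (cumulative current-flow change, in mA/s) are summed over the nodes with a detectable change to give the single total in Figure 7D. The node counts and per-node values are the illustrative ones quoted above, not simulation output.

```python
from typing import Sequence


def total_delta_ei(node_values: Sequence[float], detect_threshold: float = 0.0) -> float:
    """Sum per-node cumulative current changes (mA/s) over nodes whose change exceeds the threshold."""
    return sum(v for v in node_values if abs(v) > detect_threshold)


N_NODES, N_CHANGED = 800, 500  # ~800 body nodes, ~500 with a detectable change (illustrative)
for per_node in (0.1, 0.2, 0.3):  # mA/s per node, the range quoted in the text
    nodes = [per_node] * N_CHANGED + [0.0] * (N_NODES - N_CHANGED)
    print(per_node, total_delta_ei(nodes))  # ≈ 50, 100 and 150 mA/s
```

      The resulting range, 50 to 150 mA/s, brackets the roughly 100 mA/s shown in Figure 7D. That value corresponds to an average of about 0.2 mA/s per changed node.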

      We do not know why different chirp types triggered similar effects. The EI measurements are pooled over several chirps produced at different angles and distances. When few chirps of a given type are available (as with rises, which are very rare), these pooled measurements may not reveal more marked differences among types. In a publication currently in preparation, we are analyzing a larger dataset to better assess these results.

      The methods section has been edited to clarify the approach (not yet).

      Reviewer #2 (Public Review):

      Studying Apteronotus leptorhynchus (the weakly electric brown ghost knifefish), the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. Chirping is a behavior that has been well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation.

      Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that could have a great impact on the field.

      We thank the Reviewer for the extensive and constructive comments. Many detailed studies have been published on the anatomy and physiology of the circuits involved in the production and modulation of “electric chirps”. However, we would like to add that most of this research assumed, and focused exclusively on, their possible role in communication. Most behavioral studies did the same. A meta-analysis of the existing literature on chirping traces the communication idea back mainly to two studies: Hagedorn and Heiligenberg, 1985 (“Court and spark: electric signals in the courtship and mating of gymnotoid fish”) and Hopkins, 1974 (“Electric Communication: Functions in the Social Behavior of Eigenmannia Virescens”). Importantly, these studies made only contextual observations. They included no playback experiments or other attempts to analyze the correlation of chirping with other behaviors more quantitatively.

      The authors do provide convincing evidence that chirps may function in homeoactive sensing. However, their evidence arguing against a role for chirps in communication is not as strong, and fails to sufficiently consider the evidence from a large body of existing research. Ultimately, the manuscript presents very interesting data that is sure to stimulate discussion and follow-up studies, but it suffers from dismissing evidence in support of, or consistent with, a communicative function for chirps.

      The tone of some statements in our earlier draft may have suggested otherwise. In our revisions, however, we have made an effort to clarify that we do not intend to dismiss a communicative function of chirps. We only intend to debate and discuss valid alternative hypotheses, advanced on the basis of reasonable considerations.

      Before writing this manuscript, we attempted to survey literally all the existing literature on chirps, including studies on behavior, peripheral sensory physiology and brain physiology. Although some studies may have escaped our attention, we made an effort to be comprehensive. Based on this survey, we concluded that none of the studies provided clear and unambiguous evidence for the communication hypothesis (see the weak points highlighted in the Discussion and mentioned in our reply to the previous comment). That hypothesis itself is not free of weak points and contradictions (see later comments).

      Below is a summary of how the communication hypothesis is treated in each section of the manuscript, including several edits we made in response to the Reviewer’s concern:

      In the Abstract, we clearly state that the alternative we consider is only hypothetically complementary, not established. Nonetheless, we identified a couple of instances in the Introduction that could sound dismissive of the “communication hypothesis”.

      In the Introduction, we do write about the possibility of interference between communication signals and conspecific electrolocation cues, as both are detected as beat perturbations. We did not use “interference” here in the sense of “reciprocal canceling”. Rather, we meant a “partial, more or less conspicuous overlap” in the responses triggered in electroreceptors.

      Hoping to convey a clearer message, we have edited the related statement and changed it to “both types of information are likely to overlap and interact in highly variable ways”.

      We have also removed the statement: “According to this idea, beats and chirps are not only detected through the same input channel, but also used for the same purpose.” as at this point in the manuscript it may be too strong.

      The Results section contains no statements that might be seen as dismissive of the communication hypothesis. It contains only statements in support of the “probing with chirps” idea, which is the central hypothesis of the study.

      In the Discussion, we elaborate on why the current functional view is either flawed or incomplete (first paragraph, “existing functional hypotheses”). Our five points are as follows. 1) Multiple triggering factors implicated in chirp responses covary and need to be disentangled (e.g., DF and sex). 2) Findings on brown ghosts and a few other gymnotiforms have been used to advance the hypothesis of “communication through chirps” in all weakly electric fish (including pulse species). 3) Social encounters, in which chirps are recorded, also involve other behaviors (such as probing) that have not been considered so far; this point is related to the first one on covariates. 4) Most studies referring to big chirps as courtship chirps were not done in reproductive animals (now added). 5) No causal evidence has been provided so far to justify a role of chirps in social communication.

      We are discussing these points as challenges to the communication hypothesis, not to dismiss the hypothesis, but rather to motivate future studies addressing these challenges.

      We do not want to appear dismissive of the communication hypothesis. We had therefore already edited the manuscript to avoid the impression that the probing hypothesis is exclusive. We have now gone over the manuscript once more and edited several sentences. Nevertheless, we want to point out again that, despite the broad consensus, the communication hypothesis has never until now been investigated with the kind of rigor applied here.

      The authors do acknowledge that chirps could function as both a communication and a homeoactive sensing signal. However, it seems clear that they wish to argue against the former and for the latter, and the evidence is not yet there to support this.

      In both rounds of revision, we have made an effort to convey a more inclusive interpretation of our findings. We tried our best to express our ideas as hypotheses, not as proof that communication through chirps does not exist. The aim of this study is to propose an alternative view. This cannot be done without pointing out the weak points of an existing hypothesis while providing reasonable arguments in favor of the alternative we advance. The actual evidence for a role of chirping in communication is much weaker than the sheer number of articles discussing chirps in this context would suggest.

      Regarding the weak evidence against communication, here we can list a few additional important points related to the proposed interpretations of chirp function (more specific than those made earlier):

      (1) A formally sound assessment of signal value/meaning, as typically done in animal communication studies, should involve:

      a) the isolation of a naturally occurring signal and determination of the context in which it is produced 

      b) the artificial replication of the signal

      c) the observation that such a mimic is capable of triggering reliable and stereotyped responses in a group of individuals (identified by sex and/or species) under the same conditions (conditioned, unconditioned, state-dependent, etc.), as discussed, for instance, in Bradbury and Vehrencamp, 2011; Laidre and Johnstone, 2013; Wyatt, 2015; Rutz et al., 2023.

      This approach has so far not been applied to weakly electric fish. The initial purpose of the present study was in fact to conduct this type of validation.

      (2) The hypothesis that chirps are used for DF-sign discrimination for “social purposes” is plausible on theoretical grounds. However, it does not seem reasonable in practice when one considers emission rates of 150 chirps per minute. We do find a strong correlation of chirp type with DF. Changes in chirp type are often very abrupt and sudden, as if the fish were tracking the beat frequency to estimate its value. However, the chirp-rate argument above seems to argue against this interpretation.

      (3) The hypothesis of chirp patterning holds that chirping may carry meaning through the sequence of chirps of different types, somewhat like syllables in birdsong. It has been assessed by only one study, conducted in our group, and has not been sufficiently substantiated by replication. We have surveyed all possible combinations of chirps produced by interacting pairs in different behavioral conditions, using chirp sequence sizes of 2, 3, ..., 8 chirps. We considered both the sender alone and sender plus receiver together. In all cases, we found no evidence for a context-dependent “modulation” of chirp types (i.e., no specific chirp-type sequence in specific contexts).

      (4) The hypothesized role of “large chirps” as courtship signals could easily be criticized by noting the symmetrical distribution of these events around a DF of 0 Hz. One could explain this well-known pattern by arguing that fish fail to discriminate the DF sign. However, we know from Walter Heiligenberg’s work and from physiological considerations that such a task can easily be solved through T-units and … in principle even just by motion (which would change the EOD phase in frequency-dependent ways, thus potentially revealing the DF sign).
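      The sequence analysis described in point (3) amounts to counting chirp-type n-grams for n = 2 to 8 in each behavioral condition and comparing their frequencies across conditions. Below is a minimal Python sketch of the counting step. The chirp labels and condition names are hypothetical. The statistical comparison between conditions, which is where context dependence would actually be tested, is not shown.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def chirp_ngrams(sequence: Sequence[Hashable], n: int) -> Counter:
    """Count every run of n consecutive chirps, keyed by the tuple of their labels."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))


def ngram_frequencies(sequence: Sequence[Hashable], n: int) -> Dict[Tuple, float]:
    """Relative n-gram frequencies, so conditions with different chirp counts are comparable."""
    counts = chirp_ngrams(sequence, n)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Hypothetical chirp-type sequences from one sender in two conditions.
# For sender+receiver analyses, labels could be (fish_id, chirp_type) tuples instead.
by_condition = {
    "free": [1, 1, 2, 1, 2, 2, 1],
    "divided": [2, 1, 1, 2, 1, 1, 2],
}
for condition, seq in by_condition.items():
    for n in range(2, 9):  # sequence sizes 2..8
        if len(seq) >= n:
            print(condition, n, ngram_frequencies(seq, n))
```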

      Overall, these considerations led us to think that chirping certainly occurs in a social context, but that the meaning of this behavior remains elusive. We noticed that environmental factors are also strongly implicated … we then formulated an alternative hypothesis to explain chirping, but we did so without dismissing the communication idea.

      All this seems to us just a careful way to critically discuss our results and those of other studies, without considering the issue resolved.

      In the introduction, the authors state, "Since both chirps and positional parameters (such as size, orientation or motion) can only be detected as perturbations of the beat, and via the same electroreceptors, the inputs relaying both types of information are inevitably interfering." I disagree with this statement, which seems to be a key assumption. Both of these features certainly modulate the activity of electroreceptors, but that does not mean those modulations are ambiguous as to their source. You do not know whether the two types of modulations can be unambiguously decoded from electroreceptor afferent population activity.

      We thank the Reviewer for noting this imprecision. We have addressed the Reviewer’s concern in another reply (see above).

      My biggest issue with this manuscript is that it is much too strong in dismissing evidence that chirping correlates with context. In your behavioral observations, you found sex differences in chirping as well as differences between freely interacting and physically separated fish. Chirps tended to occur in close proximity to another fish. Your model of chirp variability found that environmental experience, social experience, and beat frequency (DF) are the most important factors explaining chirp variability. Are these not all considered behavioral or social context? Beat frequency (DF) in particular is heavily downplayed as being a part of "context" but it is a crucial part of the context, as it provides information about the identity of the fish you're interacting with. The authors show quite convincingly that the types of chirps produced do not vary with these contexts, but chirp rates do.

      We believe the “perceived claim” may be an issue of unclear writing. We have now tried to better clarify that “context” affects chirp rates, but it does not affect chirp types as much (except when beat frequency is high).  

      We have edited two statements possibly susceptible to misinterpretation: 

      (1) In the results: “It also indicates that chirp parameters such as duration and FM do not seem to be associated with any particular context in a meaningful way, other than being affected by beat frequency.”

      (2) In the discussion: the statement

      “Recordings from interacting fish pairs confirmed the absence of any significant correlation between chirp type choice and behavioral context (Figure S2) although the variance of chirp parameters appears to be significantly affected by this factor (Figure 2). This may suggest that the effect of behavioral context is mainly detectable in the number of chirps produced (Figure S1), rather than the type (Figure S2).”

      has been changed to:

      “Recordings from interacting fish pairs confirmed the absence of any significant correlation between chirp type choice and behavioral context, except for those cases characterized by higher beat frequencies (Figure S2). This suggests that the effect of behavioral context highlighted in our factor analysis (Figure 2) is mainly due to the number of chirps produced (Figure S1), rather than their type (Figure S2).”

      Finally, in the Results we emphasize the relatively higher impact of previously unexplored factors on chirp variance: “The plot of individual chirps (Figure 2C) shows the presence of clustering around different categorical variables and it reveals that experience levels or swimming conditions are important factors affecting chirp distribution (note for instance the large central “breeding” cluster in which fish are divided and the smaller ones in which fish are free). Sender or receiver identity does not individuate any clear clustering relative to either sex (see the overlap of male_s/male_r and female_s/female_r) or social status (dominant/subordinate). Chirps labeled based on tank experience (i.e. resident vs intruder) are instead clearly separated.”

      Further, in your playback experiments, fish responded differently to small vs. large DFs, males chirped more than females, type 2 chirps became more frequent throughout a playback, and rises tended to occur at the end of a playback. These are all examples of context-dependent behavior.

      We do note that male brown ghosts chirp more than females. However, we also state, and show in Figure 8, that males move more in proximity to and around conspecifics. We acknowledge that the chirp time course may differ during playbacks in a type-dependent manner. How this would support the communication hypothesis, or any alternative, is unclear. This result could equally reflect the use of different chirp types for different probing needs. Since we cannot be sure about either interpretation, we do not want to put too much emphasis on it. Finally, the fact that “context” affects chirping is undeniable. Here we use “context” broadly, to mean different experimental situations in which social, physical and environmental parameters are altered. For example, cluttered and non-cluttered environments represent different contexts that affect chirping in conspicuous ways.

      In the results, the authors state, "Overall, the majority of chirps were produced by male subjects, in comparable amounts regardless of environmental experience (resident, intruder or equal; Figure S1A,C), social status (dominant or subordinate; Figure S1B) or social experience (novel or experienced; Figure S1D)." This is not what is shown in Figure S1. S1A shows clear differences between resident vs. intruder males, S1B shows clear differences between dominant vs. subordinate males, and S1D shows clear differences between naïve and experienced males. The analysis shown in Figure 2 would seem to support this. Indeed, the authors state, "Overall, this analysis indicated that environmental and social experience, together with beat frequency (DF) are the most important factors explaining chirp variability."

      The Reviewer is right to point out this imprecise reference, and we are grateful to the Reviewer for spotting this inconsistency. The sentence probably referred to an earlier version of the figure in which data were grouped and analyzed differently. We have now changed the text to: “Overall, the majority of chirps were produced by male subjects, at rates that seemed affected by environmental experience (resident, intruder or equal; Figure S1A,C), social status (dominant or subordinate; Figure S1B) and social experience (novel or experienced; Figure S1D).”

      The choice of chirp type varied widely between individuals but was relatively consistent within individuals across trials of the same experiment. The authors interpret this to mean that chirping does not vary with internal state, but is it not likely that the internal states of individuals are stable under stable conditions, and that individuals may differ in these internal states across the same conditions? Stable differences in communication signals between individuals are frequently interpreted as reflecting differences between those individuals in certain characteristics, which are being communicated by these signals.

      It seems we were unclear here. Behavioral states are indeed stable and can entail stable chirp patterning, if the two are related. However, chirp types vary abruptly and in a reliable, DF-dependent manner. It is therefore unlikely that different chirp types are matched to different internal states that follow the same temporal order so reliably, repeated in the same way across consecutive trials.

      This would imply the occurrence of different internal states in rapid sequence, reliably triggered by repeated EOD ramps, regardless of whether the playback is 20 sec long or 180 sec long.

      We have edited this paragraph to better explain this: “The reliability by which the chirping response adapts to both the rate and direction of beat frequency is variable across individuals but rather stable across trials (relative to a given subject), further suggesting that chirp type variations may not reflect changes in internal states or in the animal motivation to specific behavioral displays (which are presumably subject to less abrupt variations and stereotypical patterning based on DF).”

      I am not convinced of the conclusion drawn by the analysis of chirp transitions. The transition matrices show plenty of 1-2 and 2-1 transitions occurring.

      The only groups in which 1-2 and 2-1 transitions are as frequent as 1-1 and 2-2 transitions (where 1 and 2 are the numerical IDs of the two interacting fish) are F-F pairs. This is because chirp rates in females are so low that within-fish correlations end up being as low as between-fish correlations. We believe the Reviewer's impression may be due to the fact that these are normalized maps (see legend of Figure 5A-B).

      Further, the cross-correlation analysis only shows that chirp timing between individuals is not phase-locked at these small timescales. It is entirely possible that chirp rates are correlated between interacting individuals, even if their precise timing is not.

      We agree with the Reviewer that this is a possibility. To address this point, we edited the Results section to acknowledge that our observations may depend on the time window chosen (i.e. 4 sec):

      “More importantly, they show that - at least in the social conditions analyzed here and within small-sized time windows - chirp time series produced by different fish during paired interactions are consistently independent of each other.”

      Further, it is not clear to me how "transitions" were defined. The methods do not make this clear, and it is not clear to me how you can have zero chirp transitions between two individuals when those two individuals are both generating chirps throughout an interaction.

      We thank the Reviewer for bringing up this unclear point. We have now clarified how transitions were calculated in the method section: “The number of chirp transitions present in each recording (dataset used for Figures 1, 2, 5) was measured by searching in a string array containing the 4 chirp types per fish pair, all their possible pairwise permutations (i.e. all possible permutations of 4+4=8 elements are: 1-1, 1-2, 1-3 … 7-6, 7-7, 7-8; considering the following legend 1 = fish1 type 1, 2 = fish 1 type 2, 3 = fish1 type 3 … 6 = fish2 type 2, 7 = fish2 type 3 and 8 = fish2 rise).”.

      Zero transitions are possible if two fish (or groups of fish) do not produce chirps of all types. Only transitions of produced types can be counted.
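      For illustration, the transition counting described above can be sketched in Python. This is a minimal sketch, not the code used for the manuscript: it assumes that a transition is an ordered pair of consecutive chirps in the time-sorted, merged chirp sequence of a fish pair, and the event format and the helper names `chirp_code` and `transition_matrix` are hypothetical. Codes follow the legend above (1-4 = fish 1 type 1, type 2, type 3, rise; 5-8 = fish 2 type 1, type 2, type 3, rise).

```python
from typing import List, Tuple

# Chirp categories per fish, in the order used by the legend (hypothetical labels).
CHIRP_TYPES = ["type1", "type2", "type3", "rise"]


def chirp_code(fish: int, chirp_type: str) -> int:
    """Map (fish 1 or 2, chirp type) to a code 1-8: fish 1 -> 1-4, fish 2 -> 5-8."""
    return (fish - 1) * len(CHIRP_TYPES) + CHIRP_TYPES.index(chirp_type) + 1


def transition_matrix(events: List[Tuple[float, int, str]]) -> List[List[int]]:
    """Count ordered transitions between consecutive chirps of a fish pair.

    events: (time_s, fish, chirp_type) tuples from both fish, in any order.
    Returns an 8x8 matrix where entry [a-1][b-1] counts code a followed by code b.
    """
    # Merge both fish into one time-ordered sequence of codes.
    seq = [chirp_code(fish, ctype) for _, fish, ctype in sorted(events)]
    counts = [[0] * 8 for _ in range(8)]
    for a, b in zip(seq, seq[1:]):
        counts[a - 1][b - 1] += 1
    return counts


events = [(0.1, 1, "type2"), (0.5, 2, "type2"), (0.9, 1, "type2"), (1.4, 1, "type1")]
m = transition_matrix(events)
print(m[1][5], m[5][1], m[1][0])  # 2->6, 6->2 and 2->1 each occur once: prints 1 1 1
```

      In this toy example fish 2 never produces a rise (code 8), so every transition into or out of that code is zero: only transitions between produced types can be counted.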

      In the results, "Although all chirp types were used during aggressive interactions, these seemed to be rather less frequent in the immediate surround of the chirps (Figure 6A)." A lack of precise temporal correlation on short timescales does not mean there is no association between the two behaviors. An increased rate of chirping during aggression is still a correlation between the two behaviors, even if chirps and specific aggressive behaviors are not tightly time-locked.

      The Reviewer is right in pointing out the limited temporal scale of our observations and analysis. We have now edited the last paragraph of the results related to figure 6 to include the possibility mentioned by the Reviewer: “The significantly higher extent of chirping during swimming and locomotion, consistently confirmed by 4 different approaches (PSTH, TM, CN, MDS), suggests that - although chirp-behavior correlations may exist at time-scales larger than those here considered - chirping may be linked more strongly with scanning and environmental exploration than with a particular motivational state, thus confirming findings from our playback experiments.”

      The Reviewer raises an important point here; however, due to space limitations, we considered only a sub-second scale. Most playback experiments in weakly electric fish have used EOD mimics lasting a few tens of seconds (to avoid habituation of the fish's behavioral responses), while inter-chirp intervals usually range from a few hundred milliseconds to a few seconds (depending on how often a fish chirps). This suggested to us that a 4-second time window was a reasonable choice to start with.

      In summary, it is simply too strong to say that chirping does not correlate with context, or to claim that there is convincing evidence arguing against a communication function of chirps. Importantly, however, this does not detract from your exciting and well-supported hypothesis that chirping functions in homeoactive sensing. A given EOD behavior could serve both communication and homeoactive sensing. I actually suspect this is quite common in electric fish (both gymnotiforms and mormyrids), and perhaps in other actively sensing species such as echolocating animals. The two are not mutually exclusive.

      We agree with the Reviewer that context - broadly speaking - does affect chirping (as we mentioned above). We hope we have improved the writing and clarified that we do not dismiss communication functions of chirping, but that we do lean towards electrolocation based on the considerations made above and on our results.

      We conclude the manuscript by noting that communication and electrolocation are not mutually exclusive: “probing cues could function simultaneously as proximity signals to signal presence, deter approaches, or coordinate behaviors like spawning, if properly timed (Henninger et al., 2018).” (see the concluding paragraph of the Discussion).

      There, we further add: “These findings aim to stir the pot and initiate a discussion on possible alternative functions of chirps beyond their presumed communication role.”

      With this, we hope we’ve made it clear how we intend our manuscript to be read.

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, without and with playback experiments. It applies state-of-the-art methods for reducing the dimensionality of the data and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that the traditionally assumed communication function of chirps may be secondary to their role in environmental assessment and exploration that takes social context into account. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats caused by other fish as well as objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a primary communication goal for most chirps. Rather, the key determinants of chirping are the difference frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. The paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-receiver chirp transitions beyond the known increase in chirp frequency during an interaction.

      These conclusions by themselves will be very useful to the field. They will also allow scientists working on other "communication" systems to perhaps reconsider and expand the goals of the probes used in those senses. A lot of data are summarized in this paper, with thorough referencing to past work.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization, and in this sense are self-directed signals. This led to their prediction that environmental complexity ("clutter") should increase chirp rate, which in fact was revealed by their new experiments. The authors also argue that waveform EODs have less power across high spatial frequencies compared to pulse-type fish, with a resulting relatively impoverished power of resolution. Chirping in wave-type fish could temporarily compensate for the lower frequency resolution while still being able to resolve EOD perturbations with a good temporal definition (which pulse-type fish lack due to low pulse rates).

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water. The paper provides a number of experimental avenues to pursue in order to validate the non-communication role of chirps.

      We thank the reviewer for the kind assessment.

      Weaknesses:

      My main criticism is that the alternative putative role for chirps as probe signals that optimize beat detection could be better developed. The paper could be clearer as to what that means precisely, especially since beating - and therefore detection of some aspects of beating due to the proximity of a conspecific - most often precedes chirping. One meaning the authors suggest, tentatively, is that the chirps could enhance electrosensory responses to the beat, for example by causing beat phase shifts that remediate blind spots in the electric field of view.

      We agree with the Reviewer that a better and more detailed explanation of how beat processing for conspecific electrolocation may be positively affected by chirps would be important to provide. We are currently working on a follow-up manuscript in which we intend to include these aspects. Owing to space limitations and for readability, we had to remove from the current manuscript many results that could further clarify these issues.

      A second criticism is that the study links the beat detection to underwater object localization. The paper does not significantly develop that line of thought given their data - the authors tread carefully here given the speculative aspect of this link. It is certainly possible that the image on the fish's body of an object in the environment will be slightly modified by introducing a chirp on the waveform, as this may enhance certain heterogeneities of the object in relation to its environment. The thrust of this argument derives mainly from the notion of Fourier analysis with pulse type fish EOD waveforms (see above, and radar theory more generally), where higher temporal frequencies in the beat waveform induced by the chirp will enable a better spatial resolution of objects. It remains to be seen whether experiments can show this to be significant.

      The Reviewer perhaps refers to the last paragraph of the Discussion before the conclusions, in which we mention the performance of pulse- or wave-type EODs in electrolocation (referring to ideas illustrated in a recent review by Crampton, 2019). We added a statement to this paragraph to clarify that we do not propose that chirping enhances object electrolocation. What we mean is that, in a context in which object electrolocation relies on wave-type EODs, and given the generally lower performance of such narrow-band signals in resolving the spatial features of any object (even a 3D electric field), chirping could improve beat detection during social encounters by increasing the amount of information obtained by the fish.

      The edited paragraph now reads: “While broadband pulse signals may be useful to capture highly complex environments rich in foliage, roots and other structures common in vegetation featuring the more superficial habitats in which pulse-type fish live, wave-type EODs may be a better choice in the relatively simpler river-bed environments in which many wave-type fish live (e.g., the benthic zone of deep river channels; Crampton, 2019). In this case, achieving a good spatial resolution is critical during social encounters, especially considering the limited utility of visual cues in these low-light conditions. In such habitats, social encounters may “electrically” be less “abrupt”, but spatially less “conspicuous” or blurred (as a 3D electric field may be). In such a scenario, chirps could serve as a means to supplement the spatial information acquired via the beat, accentuating these cues during periods of reduced resolution.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      None, my points in the original review have been properly addressed in this resubmission.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presented a useful toolkit designed for CyTOF data analysis, which integrates 5 key steps as an analytical framework. A semi-supervised clustering tool was developed, and its performance was tested in multiple independent datasets. The tool was compared to human experts as well as supervised and unsupervised methods. 

      Strengths: 

      The study employed multiple independent datasets to test the pipeline. A new semi-supervised clustering method was developed. 

      Weaknesses: 

      The examination of the whole pipeline is incomplete. Lack of descriptions or justifications for some analyses. 

      We thank the reviewer for the overall summary and comments on this manuscript. In the last part of the Results, we showcased the functionalities of ImmCellTyper on the COVID dataset, including quality checks, BinaryClust clustering, cell abundance quantification, comparison of state marker expression within each identified cell type, cell population extraction, subpopulation discovery using unsupervised methods, and data visualization. We have added more descriptions to the text based on the reviewer’s suggestions.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors have developed marker selection and k-means (k=2) based binary clustering algorithm for the first-level supervised clustering of the CyTOF dataset. They built a seamless pipeline that offers the multiple functionalities required for CyTOF data analysis. 

      Strengths: 

      The strength of the study is the potential use of the pipeline for the CyTOF community as a wrapper for multiple functions required for the analysis. The concept of the first line of binary clustering with known markers can be practically powerful. 

      Weaknesses: 

      The weakness of the study is that there's little conceptual novelty in the algorithms suggested from the study and the benchmarking is done in limited conditions. 

      We thank the reviewer for the overall summary and comments on this manuscript. While the concept of binary clustering by k-means is not novel, BinaryClust uses it only on individual markers to identify positive and negative cells, and then combines the result with a pre-defined matrix for cell type identification. This has not been introduced elsewhere. Furthermore, ImmCellTyper streamlines the entire analysis process and enhances data exploration on multiple levels. For instance, users can evaluate functional marker expression levels and cellular abundance across both main cell types and subpopulations. This computational framework also leverages the advantages of both semi-supervised and unsupervised clustering methods to facilitate subpopulation discovery. We believe these contributions warrant consideration as advancements in the field.
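      To make the two-step logic described above concrete, here is a minimal pure-Python sketch; it is not the BinaryClust implementation. Each marker is split into negative and positive cells by one-dimensional k-means with k = 2, and each cell is then matched against a user-defined table of marker signatures. The function names, the initialization of the two centroids at the extremes, and the toy marker values and signature table (including the CD19-CD14-CD3+CD4- definition of CD8 T cells used as an example in our responses) are illustrative assumptions.

```python
from typing import Dict, List


def binarize_marker(values: List[float], n_iter: int = 100) -> List[bool]:
    """1D k-means with k=2 on one marker; True = cell in the higher-mean (positive) cluster."""
    lo, hi = min(values), max(values)  # start the two centroids at the extremes
    for _ in range(n_iter):
        positive = [abs(v - hi) < abs(v - lo) for v in values]
        pos_vals = [v for v, p in zip(values, positive) if p]
        neg_vals = [v for v, p in zip(values, positive) if not p]
        new_lo = sum(neg_vals) / len(neg_vals) if neg_vals else lo
        new_hi = sum(pos_vals) / len(pos_vals) if pos_vals else hi
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return [abs(v - hi) < abs(v - lo) for v in values]


def annotate_cells(expr: Dict[str, List[float]],
                   signatures: Dict[str, Dict[str, bool]]) -> List[str]:
    """Assign each cell the first cell type whose +/- signature it matches."""
    calls = {marker: binarize_marker(vals) for marker, vals in expr.items()}
    n_cells = len(next(iter(expr.values())))
    labels = []
    for i in range(n_cells):
        label = "unassigned"
        for cell_type, rule in signatures.items():
            if all(calls[m][i] == wanted for m, wanted in rule.items()):
                label = cell_type
                break
        labels.append(label)
    return labels


# Toy transformed expression for four cells (hypothetical values).
expr = {
    "CD3": [5.0, 4.8, 0.2, 0.1],
    "CD4": [4.5, 0.3, 0.2, 0.4],
    "CD19": [0.1, 0.2, 4.0, 0.3],
    "CD14": [0.2, 0.1, 0.3, 3.9],
}
# Hypothetical signature table: True = marker-positive, False = marker-negative.
signatures = {
    "CD4 T": {"CD3": True, "CD4": True, "CD19": False, "CD14": False},
    "CD8 T": {"CD3": True, "CD4": False, "CD19": False, "CD14": False},
    "B cell": {"CD3": False, "CD19": True},
    "Monocyte": {"CD3": False, "CD19": False, "CD14": True},
}
print(annotate_cells(expr, signatures))
```

      Because each marker is binarized independently, adding a cell type in this sketch only requires a new row in the signature table; cells that match no row remain "unassigned".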

      As for the benchmarking, we limited its depth to main cell types rather than subpopulations, because we apply BinaryClust only to identify main cell types. For cell subset discovery, the unsupervised methods integrated in this pipeline have already been published and are widely used by the research community. Therefore, additional benchmarking does not seem necessary.

      Reviewer #3 (Public Review): 

      Summary: 

      ImmCellTyper is a new toolkit for Cytometry by time-of-flight data analysis. It includes BinaryClust, a semi-supervised clustering tool (which takes into account prior biological knowledge), designed for automated classification and annotation of specific cell types and subpopulations. ImmCellTyper also integrates a variety of tools to perform data quality analysis, batch effect correction, dimension reduction, unsupervised clustering, and differential analysis. 

      Strengths: 

      The proposed algorithm takes into account the prior knowledge. 

      The results on different benchmarks indicate competitive or better performance (in terms of accuracy and speed) depending on the method. 

      Weaknesses: 

      The proposed algorithm considers only CyTOF markers with binary distribution. 

      We thank the reviewer for the overall summary and comments on this manuscript. Binary classification can be considered an imitation of the human gating strategy, as it is applied to each marker. For example, when characterizing CD8 T cells, we aim for the CD19-CD14-CD3+CD4- population, which is binary in nature (either positive or negative) and follows the same logic as the method we developed (BinaryClust). Results indicated that it works very well for well-defined main cell lineages, particularly when the expression of the defining marker is not continuous. The limitation lies in subpopulation identification, because a handful of markers are expressed along a continuum; we therefore suggest applying an unsupervised method after BinaryClust, which brings the additional advantage of identifying unknown subsets beyond our current knowledge, something none of the semi-supervised tools can achieve. In response to the reviewer’s concern, we have considered the limitation of the binary distribution, but it does not profoundly affect the application of the pipeline.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Many thanks for the reviewers’ comments and suggestions, please see below the point-to-point response:

      (1) The style of in-text reference citation is not consistent. Many do not have published years.

      The style of the reference citation has been revised and improved.  

      (2) The font size in the table of Figure 1 is too small, so is Figure 2. 

      The font size has been increased.

      (3) Is flowSOM used as part of BinaryClust? How should the variable running speed of BinaryClust be interpreted, given that it is occasionally slower and sometimes faster than flowSOM in the datasets?

      To answer the reviewer’s question, flowSOM is not part of BinaryClust. They are separate clustering methods that have been incorporated into the ImmCellTyper pipeline. As described in Figure 1, BinaryClust, a semi-supervised method, is used to classify the main cell lineages, while flowSOM, an unsupervised method, is recommended for further subpopulation discovery. They therefore operate independently of each other. To avoid confusion, we slightly modified Figure 1 for clarification.

      Regarding the variability in running speed shown in Figure 4: the performance of algorithms can indeed be influenced by dataset characteristics such as size and complexity. The differences between the COVID and MPN datasets, such as marker panel, experimental protocol, and data acquisition process, could account for this variation. Our explanation is that flowSOM is better suited to the data structure of the COVID dataset, which might explain why it runs slightly faster on that dataset than on the MPN dataset. Moreover, for the COVID dataset, the runtime of both BinaryClust and flowSOM is less than 100 s, and the difference is not notable.

      (4) In the Method section ImmCellTyper workflow overview, it is difficult to link the description of the pipeline to Figure 8. There are two sub-pipelines in the text and seven steps in the figure. What are their relations? Some steps are not introduced in the text, such as Data transformation and SCE object construction. What is co-factor 5?

      Figure 8 provides an overview of the entire workflow for CyTOF data analysis, starting from the raw FCS files and proceeding to downstream analysis (seven steps). However, the actual implementation of the pipeline is divided into two separate sections, as outlined in the vignettes on the ImmCellTyper GitHub page (https://github.com/JingAnyaSun/ImmCellTyper/tree/main/vignettes).

      Users first run ‘Intro_to_batch_exam_correct’ to perform a data quality check and identify potential batch effects, followed by ‘Intro_to_data_analysis’ for data exploration. We agree with the reviewer that the Methods description of this section was somewhat confusing, so we have added more description for clarification.

      In processing mass cytometry data, the arcsinh (inverse hyperbolic sine) transformation is commonly applied to handle zero values and skewed distributions, and to improve visualization as well as clustering performance. The co-factor is a parameter by which the data are divided before the arcsinh transformation; it controls the width of the approximately linear region around zero. We usually obtain the best results with a co-factor of 5 for CyTOF data.
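      As a minimal illustration of the transformation described above (a sketch, not ImmCellTyper code; the function name is hypothetical), dividing by the co-factor before applying the inverse hyperbolic sine keeps intensities well below the co-factor on a roughly linear scale, while large intensities are compressed roughly logarithmically, since asinh(x) ≈ ln(2x) for large x.

```python
import math
from typing import List


def arcsinh_transform(intensities: List[float], cofactor: float = 5.0) -> List[float]:
    """Return asinh(x / cofactor) for each raw CyTOF intensity x."""
    return [math.asinh(x / cofactor) for x in intensities]


raw = [0.0, 1.0, 5.0, 1000.0]
print([round(v, 3) for v in arcsinh_transform(raw)])
```

      Zero counts map to exactly 0 and the transform is defined for all real inputs, which is one reason it is preferred over a plain logarithm for mass cytometry intensities.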

      (5) For differential analysis, could the pipeline analyze paired/repeated samples?

      For the statistical step, ImmCellTyper supports both two-group comparisons using the Mann-Whitney-Wilcoxon test and multiple-group comparisons (n>2) using the Kruskal-Wallis test followed by post hoc analysis (pairwise Wilcoxon test or Dunn’s test), with multiple testing correction using the Benjamini-Hochberg procedure.

      The pipeline is also flexible: users can extract the raw cell frequency data and apply suitable statistical tests of their choice.

      (6) In Figure 2A, the range of the two axes is different for Dendritic cells, which could be misleading. Why the agreement is bad for dendritic cells?

      The axis ranges are automatically adapted to the data, which explains why they are not necessarily equal. The correlation coefficient for DCs is 0.958; compared with other cell types (> 0.99) it is relatively lower, but it does not indicate poor agreement.

      Moreover, DCs are much less abundant than other cell types, comprising approximately 2-5% of all cells. As a result, even small differences in abundance may appear as large variations. For example, a difference of 1 percentage point between 1% and 2% abundance represents a 2-fold change, which can be perceived as substantial.

      Overall, while the agreement for DCs may appear comparatively lower, it does not necessarily indicate poor performance, considering both the correlation coefficient and the relative abundance of DCs compared with other cell types.

      (7) In the Results section BinaryClust achieves high accuracy, what method was used to get the p-value, such as lines 212, 213, etc.?

      The accuracy of BinaryClust was tested using the F-measure and the adjusted Rand index (ARI) against the ground truth (manual gating); the detailed description and calculation can be found in the Methods. For lines 212 and 213, the p-value was calculated using ANOVA for the interaction plot shown in Figure 3. We have now added the statistical information to the figure legend.
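      As a rough illustration of one of the two accuracy metrics mentioned above, the adjusted Rand index between manual-gating labels and cluster labels can be computed from pair counts in their contingency table. This is a minimal pure-Python sketch of the standard formula, not the code used in the manuscript (the F-measure is not shown); the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    # Pairs of cells grouped together in both labelings (contingency-table cells).
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    if total_pairs == 0:
        return 1.0
    expected = same_true * same_pred / total_pairs
    max_index = (same_true + same_pred) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (same_both - expected) / (max_index - expected)


gating = ["T", "T", "B", "B"]
clusters = ["c1", "c1", "c2", "c3"]
print(round(adjusted_rand_index(gating, clusters), 3))
```

      The index is 1 for identical partitions regardless of how clusters are named, and close to 0 for a labeling that agrees with manual gating only at chance level.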

      (8) The performance comparison between BinaryClust and LDA is close. The current comparison design looks unfair. Given LDA only trained using half data, LDA may outperform BinaryClust.

      It is true that LDA was trained on half of the data, because this method requires manual gating results as a training dataset to build a model, which is then applied to the remaining files to label cell types. Here we used 50% of the whole dataset as the training set. We would be happy to implement any suggestions for a better partition ratio.

      (9) There are 5 key steps in the proposed workflow. However, not every step was presented in the Results.

      Thank you for the comment. The Results primarily focus on demonstrating the precision and performance of BinaryClust in comparison with the ground truth and existing tools. In addition, a case study showcasing the application and functions of the entire pipeline on a dataset is presented. Owing to space limitations, the implementation details of the pipeline are described in the Methods section and the GitHub documentation, which users and readers can easily access.

      Reviewer #2 (Recommendations For The Authors): 

      The tools suggested by the authors could be potentially useful to the community. However, it's difficult to understand the conceptual novelty of the algorithms suggested here. The concept of binary clustering has been described before (https://doi.org/10.1186/s12859-022-05085-z, https://doi.org/10.1152/ajplung.00104.2022), and it mainly utilizes k-means clustering set to generate binary clusters based on selected markers. Other algorithms associated with the package are taken from other studies.

      We acknowledge the reviewer’s comment regarding the novelty of our method. While binary clustering by k-means has previously been applied to transcriptome data, our approach applies it to CyTOF data analysis, which has not been done elsewhere. Furthermore, ImmCellTyper streamlines the entire analysis process and enhances data exploration on multiple levels. For instance, users can evaluate functional marker expression levels and cellular abundance across both main cell types and subpopulations. Also, as stated in the manuscript, this computational framework leverages the advantages of both semi-supervised and unsupervised clustering methods to facilitate subpopulation discovery. We believe these contributions warrant consideration as advancements in the field.

      In addition, the benchmarking of clustering performance, especially to reproduce manual gating and comparison to tools such as flowSOM is not comprehensive enough. The result for the benchmarking test could significantly vary depending on how the authors set the ground truth (resolution of cell type annotations). The authors should compare the tool's performance by changing the depth of cell type annotations. Especially, the low abundance cell types such as gdT cells or DCs were not effectively captured by the suggested methods. 

      Thank you for the comment; we appreciate the reviewer’s concern. However, as illustrated in Figure 1, our approach uses BinaryClust, a semi-supervised method, to identify main cell types rather than directly targeting subpopulations. This is because semi-supervised methods rely on the user’s prior definitions and are therefore limited in discovering novel subsets. In the ImmCellTyper framework, unsupervised methods are subsequently applied for subset exploration following the BinaryClust step.

      Regarding benchmarking, we focused on testing the precision of BinaryClust for main cell type characterization, because that is what the method is used for in the pipeline, and we believe this is sufficient. As for cell subset discovery, the unsupervised methods we integrated have already been published and are widely used by the research community. Therefore, additional benchmarking does not seem necessary.

      Moreover, as shown in Figure 3 and Table 1, our results indicated that the F-measure for DCs and gdT cells in BinaryClust is 0.80 and 0.92 respectively, which were very close to ground truth and outperformed flowSOM, demonstrating its effectiveness. 

      We hope these clarifications address the reviewer’s concern.

      Minor comments: 

      (1) In Figure 4, it's perplexing to note that BinaryClust shows the slowest runtime for the COVID dataset, compared to the MPN dataset, which features a similar number of cells. What causes this variation? Is it dependent on the number of markers utilized for the clustering? This should be clarified/tested. 

      Thank you for the comment, but we are not sure that we fully understand the question. As shown in Figure 4, BinaryClust has a slightly higher runtime on the MPN dataset than on the COVID dataset, which is reasonable because the MPN dataset contains around 1.6 million more cells than the COVID dataset.

      (2) Some typos are noted: 

      - DeepCyTOF and LDA use a maker expression matrix extracted → "marker"?* 

      Corrected.

      - Datasets(Chevrier et al.)which → spacing* 

      Corrected.

      - This is due to the method's reliance → spacing*

      Corrected.

      Reviewer #3 (Recommendations For The Authors): 

      Is it possible to accommodate more than two levels within the clustering process, i.e., can the proposed semi-supervised clustering tool be extended to multi-levels instead of binary?

      Thank you for the comment. Binary classification can be considered an imitation of the human gating strategy, as it is applied to each marker. For example, when characterizing CD8 T cells, we aim for the CD19-CD14-CD3+CD4- population, which is binary in nature (either positive or negative) and follows the same logic as the method we developed (BinaryClust). Results indicated that it works very well for well-defined main cell lineages. The limitation lies in subpopulation identification, because a handful of markers are expressed along a continuum; we would therefore suggest applying an unsupervised method after BinaryClust, which brings the additional advantage of identifying unknown subsets beyond our current knowledge, something none of the semi-supervised tools can achieve. To answer the reviewer’s question, it is possible to set the number of clusters to 3, 4 or 5 rather than just 2, but considering the design and rationale of the entire framework (as described in the manuscript and above), it does not seem necessary.

      Could you please comment on why on the COVID dataset, BinaryClust was slower as compared to flowSOM?

      Thank you for the question. The performance of algorithms can indeed be affected by dataset characteristics such as size and complexity. The COVID and MPN datasets differ in various aspects, including marker panel, experimental protocol, and data acquisition process, which could account for the observed variation in speed. Our explanation is therefore that flowSOM is better suited to the structure of the COVID dataset than to that of the MPN dataset. Additionally, for the COVID dataset, both BinaryClust and flowSOM have runtimes of less than 100 s, and the difference between the two is not particularly dramatic.

      Minor errors: 

      Line#215 "(ref) " reference is missing

      Added.

      Figure 3, increase the font of the text in order to improve readability. 

      Increased.

      Line#229 didn't --> did not. 

      Corrected.

      Line#293 repetition of the reference. 

      The repetition is due to the format of the citation, which has been revised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):  

      Summary: 

      In this study, Nandy and colleagues examine neural and behavioral correlates of perceptual variability in monkeys performing a visual change detection task. They used a laminar probe to record from area V4 while two macaque monkeys detected a small change in stimulus orientation that occurred at a random time in one of two locations, focusing their analysis on stimulus conditions where the animal was equally likely to detect (hit) or not-detect (miss) a briefly presented orientation change (target). They discovered two behavioral measures that are significantly different between hit and miss trials - pupil size tends to be slightly larger on hits vs. misses, and monkeys are more likely to miss the target on trials in which they made a microsaccade shortly before target onset. They also examined multiple measures of neural activity across the cortical layers and found some measures that are significantly different between hits and misses. 

      Strengths: 

      Overall the study is well executed and the analyses are appropriate (though multiple issues do need to be addressed). 

      We thank the reviewer for their enthusiasm and their constructive comments which we address below.

      Weaknesses: 

      My main concern with this study is that with the exception of the pre-target microsaccades, the physiological and behavioral correlates of perceptual variability (differences between hits and misses) appear to be very weak and disconnected. Some of these measures rely on complex analyses that are not hypothesis-driven and where statistical significance is difficult to assess. The more intuitive analysis of the predictive power of trial outcomes based on the behavioral and neural measures is only discussed at the end of the paper. This analysis shows that some of the significant measures have no predictive power, while others cannot be examined using the predictive power analysis because these measures cannot be estimated in single trials. Given these weak and disconnected effects, my overall sense is that the current results do not significantly advance our understanding of the neural basis of perceptual variability. 

      Reviewer #1 (Recommendations For The Authors): 

      (1) Most of the effects are very small. For example, the difference in pupil size between hits and misses is ~0.08 z-score units. The differences in firing rates between hits and misses are on the order of 1-2% of normalized firing rates. While these effects may be significant, their contribution to perceptual variability could be negligible, as suggested by the analysis of predictive power at the end of the Results section. On a related note, it would be useful to mention the analysis of predictive power earlier in the paper. The finding that some of the measures do not have significant predictive power with respect to behavioral outcome raises questions regarding their importance. Finally, it would strengthen the paper if the authors could come up with methods to assess the predictive power of the PPC and interlaminar SSC. Without such analyses, it is difficult to assess the importance of these measures. 

      We expect that relatively small differences in early to intermediate sensory areas could cumulatively result in large differences in higher areas and contribute to the binary distinction between hits and misses. We certainly do not claim that these results completely explain state-dependent differences that determine the outcome of these trials. Instead, we have focused on neural signatures at the level of the V4 columnar microcircuit that might ultimately contribute to the variability in perception.

      We would like to emphasize that, based on the reviewer’s recommendation, we have now analyzed our results separately for each animal (see below). The consistency and significance of our findings across both animals give us confidence that what we have reported here are important neural signatures underlying perceptual variability at threshold.

      We would also like to note that SSC and PPC are now part of the standard toolkit of systems neuroscience and have been employed in numerous studies to our knowledge. While all measures come with their set of caveats and limitations, these two measures provide a frequency-resolved metric of the relationship between two temporal processes (point or continuous), which we believe provide insights into the interlaminar flow of information that we report here.

      Unfortunately, limitations in the GLM method and the reliability of these analyses with limited data make it impossible for these two measures to be included. The GLM requires all variables to be defined for each trial in the input. SSC and PPC can be undefined at low firing rates and require a substantial amount of data to be reliably calculated. While we did consider imputing data or estimating SSC and PPC using multiple trials, we ultimately did not pursue this idea as the purpose of the GLM was to use simultaneous measurements from single trials. 
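      To illustrate why PPC can be undefined when firing rates are low, here is a minimal sketch of the basic pairwise phase consistency estimator: the average cosine of the phase differences over all distinct pairs of spikes. It assumes spike phases (in radians) have already been extracted from the LFP; the function name is hypothetical, and the variant used in the study (for example, one restricted to spike pairs from different trials) may differ.

```python
import cmath
import math


def pairwise_phase_consistency(phases):
    """Mean of cos(theta_i - theta_j) over all distinct spike pairs i < j.

    Uses the identity |sum_k exp(i*theta_k)|^2 = n + 2 * sum_{i<j} cos(theta_i - theta_j),
    so the pairwise average is computed in O(n) rather than O(n^2).
    """
    n = len(phases)
    if n < 2:
        # No spike pairs: the measure is undefined (e.g. 0 or 1 spikes in the window).
        return math.nan
    resultant = sum(cmath.exp(1j * theta) for theta in phases)
    return (abs(resultant) ** 2 - n) / (n * (n - 1))


print(pairwise_phase_consistency([0.1, 0.1, 0.1]))  # close to 1.0 (perfect locking)
print(pairwise_phase_consistency([0.0, math.pi]))   # close to -1.0 (opposite phases)
print(pairwise_phase_consistency([0.3]))            # nan
```

      A single trial with fewer than two spikes in the analysis window yields no value at all, which is why such a measure cannot simply be entered as a per-trial regressor.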

      (2) What is the actual predictive power of the GLM model (i.e., what is the accuracy of predicting whether a given held-out trial will lead to a hit or a miss)? How much of this predictive power is accounted for by the effect of microsaccades? 

      As the GLM is not a decoder, it does not classify whether a given left out trial will be a hit or a miss. However, the GLM was highly predictive compared to a constant model. This information has been added to Table 3. The deviance of the GLM with and without microsaccades as a variable was not significantly different (p >0.9).  
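      The comparison of nested GLMs with and without the microsaccade variable can be illustrated with a likelihood-ratio (deviance) test. This sketch assumes the microsaccade term adds a single coefficient and uses the standard chi-square approximation with one degree of freedom; the deviance values are made up, the function name is hypothetical, and the exact test used in the study may differ.

```python
import math


def deviance_test_1df(deviance_without, deviance_with):
    """p-value for adding one parameter to a nested GLM.

    Under the null hypothesis, the drop in deviance is approximately chi-square
    distributed with 1 degree of freedom, whose survival function is
    P(X > x) = erfc(sqrt(x / 2)).
    """
    drop = deviance_without - deviance_with
    if drop < 0:
        raise ValueError("the larger model should not have a higher deviance")
    return math.erfc(math.sqrt(drop / 2.0))


# Hypothetical deviances for models without and with a microsaccade regressor.
print(deviance_test_1df(1250.40, 1250.39))   # large p: the extra term adds little
print(deviance_test_1df(1250.0, 1246.1585))  # about 0.05
```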

      (3) The role of stimulus contrast is not explained clearly. Are all the analyses and figures restricted to a single contrast level? Was the contrast the same on both sides? If multiple contrasts are used, could contrast account for some of the observed neural-behavioral covariations? 

      All of the analyses include stimuli of all tested contrast levels. Stimulus contrasts were the same at both locations (attended and unattended). We have added a more detailed description of the contrast in hit and miss trials (Lines 289-296), reproduced here: 

      “Non-target stimulus contrasts were slightly different between hits and misses (mean: 33.1% in hits, 34.0% in misses, permutation test, p = 0.02), but the contrast of the target was higher in hits compared to misses (mean: 38.7% in hits, 27.7% in misses, permutation test, p = 1.6e-31). Firing rates were normalized by contrast in Figure 3. In all other figures, we considered only non-target stimuli, which had very minor differences in contrast (<1%) across hits and misses. While we cannot completely rule out any other effects of stimulus contrast, the normalization in Figure 3 and minor differences for non-target stimuli should minimize them.”
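      A two-sample permutation test of the kind cited in the quoted passage can be sketched as follows. This is an illustrative version that assumes a two-sided test on the difference in means with randomly shuffled hit/miss labels; the number of permutations, the seed and the toy contrast values are arbitrary and are not those of the study.

```python
import random


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference between group means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the hit/miss labels
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


hit_contrasts = [40, 35, 45, 38, 42, 37]   # toy target contrasts (%) on hit trials
miss_contrasts = [25, 30, 28, 22, 31, 27]  # toy target contrasts (%) on miss trials
print(permutation_test(hit_contrasts, miss_contrasts))
```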

      (4) Do the animals make false alarms (i.e., report seeing a target in non-target epochs)? If not, then it is not clear that the animals are performing near their perceptual threshold. If the false-alarm rate is non-zero, it should be reported and analyzed for neural/behavioral correlates. Does the logistic regression fit allow for a false alarm rate? More generally, it would be useful to see a summary of behavioral performance, such as distribution of thresholds, lower and upper asymptotes, and detection rates on foil trials vs. matched target trials. 

      The logistic regression does allow for a false alarm rate. We have reported additional behavioral parameters in Figure 1-figure supplement 3A-G.  
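      One common way to let a logistic psychometric fit accommodate false alarms is to give it a free lower asymptote (the response rate when there is effectively no signal) and a free upper asymptote (one minus a lapse rate), matching the asymptotes the reviewer asks about. The sketch assumes that parameterization, with x standing for the size of the orientation change; the parameter names are hypothetical, and the model fitted in the study may be parameterized differently.

```python
import math


def psychometric(x, threshold, slope, false_alarm, lapse):
    """Logistic detection probability with asymptotes at false_alarm and 1 - lapse.

    x           stimulus strength, e.g. orientation change in degrees
    threshold   stimulus strength at the midpoint of the logistic core
    slope       spread of the logistic (larger = shallower)
    false_alarm lower asymptote: detection probability far below threshold
    lapse       1 - upper asymptote: misses even for very large changes
    """
    core = 1.0 / (1.0 + math.exp(-(x - threshold) / slope))
    return false_alarm + (1.0 - false_alarm - lapse) * core


# At threshold the logistic core is 0.5, so detection sits halfway between the asymptotes.
print(psychometric(5.0, threshold=5.0, slope=1.0, false_alarm=0.1, lapse=0.05))  # approximately 0.525
```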

      (5) As far as I can tell, all the analyses in the paper are done on data combined across the two animals. Given that these effects are weak and that the analyses are complex, it is important to demonstrate for each analysis/figure that the results hold for each animal separately before combining the data across animals. This can be done in supplementary figures. 

      We have updated the paper to include all main results plotted separately for each animal as supplementary figures. 

      - Figure 2-figure supplement 2

      - Figure 3-figure supplement 1

      - Figure 3-figure supplement 2

      - Figure 4-figure supplement 1

      - Figure 5-figure supplement 2

      - Figure 7-figure supplement 1

      All the results except for the canonical correlation analysis were present, consistent, and significant when we analyzed them in each monkey independently.

      (6) The selection of the temporal interval used for the various analyses appears somewhat post hoc and is not explained clearly. Some analyses are restricted to the period immediately before or during target onset (e.g., 400 ms before target onset for analysis of the effect of microsaccade, 60 ms before stimulus onset for the analysis of the effect of neural variability). Other analyses are done on non-target rather than target stimuli. What is the justification for selecting these particular periods for these analyses? The differences in firing rates between hits and misses are restricted to the target epoch and are not present in the non-target epochs. Given these results, it seems important to compare the effects in target and non-target epochs in other analyses as well.

      Restricting the analysis of the Fano Factor to 60 ms before non-target onset seems odd. Given that the duration of the interval between stimulus presentations is random, how could this pre-stimulus effect be time-locked to target onset? 

      We selected a 200 ms time window during the pre-stimulus or stimulus-evoked period for almost all our analyses. The results relating to microsaccade occurrence were robust to narrower time windows, more consistent with the other pre-stimulus windows we used, but we chose the 400 ms window to capture a larger fraction of trials with microsaccades. 

      Only the Fano factor time window was selected post hoc, based on the traces in Figure 4A, and the result is robust across animals (new Figure 4-figure supplement 1). The inter-stimulus intervals are random, and we do not believe the neural variability is time-locked to upcoming stimuli, but rather that lower variability in this pre-stimulus window is characteristic of hits. 
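      For readers less familiar with the measure, the Fano factor is the across-trial variance of spike counts divided by their mean. A minimal sketch, assuming spike times in seconds and a counting window that ends at stimulus onset (e.g. the 60 ms pre-stimulus window discussed above); the helper names and toy spike times are hypothetical.

```python
from statistics import mean, variance


def spike_count(spike_times, window_start, window_end):
    """Number of spikes with window_start <= t < window_end."""
    return sum(window_start <= t < window_end for t in spike_times)


def fano_factor(counts):
    """Across-trial variance-to-mean ratio of spike counts.

    Uses the sample (n - 1) variance; some studies use the population variance instead.
    """
    m = mean(counts)
    if m == 0:
        return float("nan")  # undefined when no spikes occur in any trial
    return variance(counts) / m


# Toy spike times (s) for three trials, each with stimulus onset at t = 1.0 s.
trials = [[0.95, 0.97, 1.02], [0.91, 0.99], [0.945, 0.96, 0.98, 0.995]]
onset, window = 1.0, 0.060
counts = [spike_count(t, onset - window, onset) for t in trials]
print(counts, fano_factor(counts))
```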

      We believe that the consistency of our results across both animals provides further evidence that our time window selection was appropriate. 

      We are interested in the extent to which these effects would remain consistent when applied only to target stimuli. However, restricting our analyses to only target stimuli substantially reduces the amount of neural data available for analysis. We plan to explore target stimulus representation more thoroughly in future studies.   

      (7) Can the measured neural response be used to discriminate between target and nontarget stimuli? If so, is the discriminability between target and non-target higher in hits vs. misses? 

      Thank you for raising this interesting point. We performed this analysis and found that target stimuli are more discriminable from non-targets in hits compared to misses. This has been added as a new Figure 3A.  
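      As one possible way to quantify such discriminability (the metric here is an assumption, not taken from the analysis above), the area under the ROC curve can be computed directly as the probability that a randomly chosen target response exceeds a randomly chosen non-target response. The sketch assumes a single scalar response per stimulus presentation, such as a spike count; the function name and data are hypothetical.

```python
def roc_auc(target_responses, nontarget_responses):
    """Area under the ROC curve via pairwise comparisons (ties count as one half).

    0.5 means target and non-target responses are indistinguishable;
    1.0 means every target response exceeds every non-target response.
    """
    score = 0.0
    for t in target_responses:
        for n in nontarget_responses:
            if t > n:
                score += 1.0
            elif t == n:
                score += 0.5
    return score / (len(target_responses) * len(nontarget_responses))


# Hypothetical spike counts; discriminability is compared between hit and miss trials.
hits_auc = roc_auc([12, 14, 15, 13], [9, 10, 11, 12])
misses_auc = roc_auc([10, 11, 12, 9], [9, 10, 11, 12])
print(hits_auc, misses_auc)
```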

      (8) How many trials were performed per session? Did miss probability tend to increase over time over the session? If so, could this slow change in hit probability account for some of the observed neural and behavioral correlations with perceptual decisions? 

      Monkeys initiated a median of 905 trials (range 651 to 1086). This has been added to the manuscript (Line 106). Approximately 1/8 of those trials were at perceptual threshold. Hit probability at threshold does not change substantially over the course of the session. We now report this in new Figure 1-figure supplement 3I (error bars show standard deviation). 

      (9) Did miss probability depend on the time of the change within the trial? If so, do any of the behavioral/neural metrics share a similar within-trial time course? 

      Change times were not significantly different across hit and miss trials (p=0.15, Wilcoxon rank sum test). We now report this in new Figure 1-figure supplement 3H.

      (10) "Deep layer neurons exhibit reduced low-frequency phase-locking in hit trials than in misses (Figure 5B), suggesting an improvement in pooled signal-to-noise among this neural population." - why does this metric suggest improved SNR? Is there any evidence for improved SNR in the data? Why just in deep layers? 

      Thank you for raising this question. We agree this statement is not fully supported by the data and have removed it.  

      (11) I may have missed this but what were the sizes of the Gabor stimuli? 

      This has been added to the methods section (Line 454). The Gaussian halfwidth was 2 degrees.  

      Reviewer #2 (Public Review):  

      In this manuscript, the authors conducted a study in which they measured eye movements, pupil diameter, and neural activity in V4 in monkeys engaged in a visual attention task. The task required the monkeys to report changes in the orientation of Gabor stimuli. The authors manipulated the difficulty of the trials by varying the degree of orientation change and focused their analysis on trials of intermediate difficulty where the monkeys' hit rate was approximately 50%. Their key findings include the following: 1) Hit trials were preceded by larger pupil diameter, reflecting higher arousal, and by more stable eye positions; 2) V4 neurons exhibited larger visual responses in hit trials; 3) Superficial and deep layers exhibited greater coherence in hit trials during both the pre-target stimulus period and the non-target stimulus presentation period. These findings have useful implications for the field, and the experiments and analyses presented in this manuscript validly support the authors' claims. 

      Strengths: 

      The experiments were well-designed and executed with meticulous control. The analyses of both behavioural and electrophysiological data align with the standards in the field. 

      We thank the reviewer for their enthusiasm about our study and their constructive comments which we address below.

      Weaknesses: 

      Many of the findings appear to be incremental compared to previous literature, including the authors' own work. While incremental findings are not necessarily a problem, the manuscript lacks clear statements about the extent to which the dataset, analysis, and findings overlap with the authors' prior research. For example, one of the main findings, which suggests that V4 neurons exhibit larger visual responses in hit trials (as shown in Fig. 3), appears to have been previously reported in their 2017 paper. Additionally, it seems that the entire Fig1-S1 may have been reused from the 2017 paper. These overlaps should have been explicitly acknowledged and correctly referenced. 

      While the raw data used in this paper overlaps entirely with Nandy et al. (2017), all the analyses and findings in this manuscript are new and have not been previously reported. Figure 1-figure supplement 1 is modified and reproduced from that paper only to allow readers to understand the recording methods used to collect the data without needing to go back to the previous paper. We have added an explicit acknowledgment of this to the figure caption.

      Previous studies have demonstrated that attention leads to decorrelation in V4 population activity. The authors should have discussed how and why the high coherence across layers observed in the current study can coexist with this decorrelation. 

      We have updated the discussion section (Lines 347-351) to further elaborate on this interpretation. 

      Furthermore, the manuscript does not explore potentially interesting aspects of the dataset. For instance, the authors could have investigated instances where monkeys made 'false' reports, such as executing saccades towards visual stimuli when no orientation change occurred. It would be valuable to provide the fraction of the monkeys' responses in a session, including false reports and correct rejections in catch trials, to allow for a broader analysis that considers the perceptual component of neural activity over pure sensory responses. 

      We appreciate this feedback. While we agree these are interesting directions, we decided to limit the scope of this study to only focus on trials at threshold with an orientation change, and are considering these directions for future studies. 

      Reviewer #2 (Recommendations For The Authors): 

      • Figure Design: Since eLife does not impose space limitations, it is advisable for the authors to avoid using very small font sizes. Consistency in font size throughout the figures is recommended. Some figures are challenging to discern, for example, the mean ± s.e.m. in Fig. 2B; in addition, the green and purple colours for superficial/deep layers are too transparent, making them too pale. 

      We have increased the size of some small fonts and improved font size consistency throughout the figures. We have changed the layer colors to improve legibility. 

      • Line 119: "trail" should be "trial". 

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The present study provides a phylogenetic analysis of the size of prefrontal areas in primates, aiming to investigate whether the relative sizes of the rostral prefrontal cortex (frontal pole) and the dorsolateral prefrontal cortex vary according to known ecological or social variables.

      I am very much in favor of the general approach taken in this study. Neuroimaging now allows us to obtain more detailed anatomical data in a much larger range of species than ever before and this study shows the questions that can be asked using these types of data. In general, the study is conducted with care, focusing on anatomical precision in definition of the cortical areas and using appropriate statistical techniques, such as PGLS.

      I have read the revised version of the manuscript with interest. I agree with the authors that a focus on ecological vs laboratory variables is a good one, although it might have been useful to reflect that in the title.

      I am happy to see that the authors included additional analyses using different definitions of FP and DLPFC in the supplementary material. As I said in my earlier review, the precise delineation of the areas will always be an issue of debate in studies like this, so showing the effects of different decisions is vital.

      We thank the reviewer for these positive remarks and for these very useful suggestions on the previous version of this article.

      I am sorry the authors are so dismissive of the idea of looking at models in which brain size and area size are directly compared, preferring instead to run separate models on brain size and area size. This seems to me a sensible suggestion.

      We agree with reviewer 1, and the comments of reviewer 3 also made clear to us why this is an important issue. We have therefore addressed it more thoroughly this time.

      First, we have added a new analysis, with whole brain volume included as a covariate in the model accounting for regional volumes, together with the socio-ecological variables of interest. As expected given the very strong correlation across all brain measures (>90%), the effects of all socio-ecological factors disappear for both FP and DLPFC volumes when ‘whole brain’ is included as a covariate. This is consistent with our previous analysis showing that the same combination of socio-ecological variables could account for the volume of FP, DLPFC and the whole brain. Nevertheless, the interpretation of these results remains difficult, because of the hidden assumptions underlying the analysis (see below).

      Second, we have clarified the theoretical reasons that led us to choose absolute rather than relative measures of brain volumes. In short, we understand the notion of specificity associated with relative measures, but 1) the interpretation of relative measures is confusing and 2) we have alternative ways to evaluate the specificity of the effects (which are complementary to the idea of adding whole brain volume as a covariate). 

      Our goal here was to evaluate the influence of socio-ecological factors on specific brain regions, based on their known cognitive functions in laboratory conditions (working memory for the DLPFC and metacognition for the frontal pole). Thus, the null hypothesis is that socio-ecological challenges supposed to mobilize working memory and metacognition do not affect the size of the brain regions associated with these functions (respectively DLPFC and FP). This is what our analysis tests, and from that perspective, it seems to us that direct measures are better: within a region (across species), volume provides a good index of neural count (since densities are conserved), which in turn indicates the amount of computational resources available to that region. This is not the case when using relative measures, or when using the whole brain as a covariate, since densities are heterogeneous across brain regions (e.g. Herculano-Houzel, 2011; 2017, but see below for further details on this).

      Quantitatively, the expected degree of specificity of the relation between brain regions and socio-ecological factors is difficult to evaluate, given that our predictions are based on the cognitive functions associated with DLPFC and FP, namely working memory and metacognition, and that each of these cognitive functions also involves other brain regions. We would actually predict that other brain regions associated with the same cognitive functions as DLPFC or FP also show a positive influence of the same socio-ecological variables. Given that the functional mapping of cognitive functions in the brain remains debated, it is extremely difficult to evaluate quantitatively how specific the influence of the socio-ecological factors on DLPFC and FP should be, compared to the rest of the brain, within the frame of our hypothesis.

      Critically, given that FP and DLPFC show a differential sensitivity to population density, a proxy for social complexity, and that this difference is in line with laboratory studies showing a stronger implication of the FP in social cognition, we believe that there is indeed some specificity in the relation between specific regions of the PFC and socioecological variables. Thus, our results as a whole seem to indicate that the relation between prefrontal cortex regions and socio-ecological variables shows a small but significant level of specificity. We hope that the addition of the new analysis and the corresponding modifications of the introduction and discussion section will clarify this point.

      Similarly, the debate about whether area volume and number of neurons can be equated across the regions is an important one, of which they are a bit dismissive.

      We are sorry that the reviewer found us a bit dismissive on this issue, and there may have been a misunderstanding.

      Based on the literature, it is clearly established that for a given brain region, area volume provides a good proxy for the number of neurons, and it is legitimate to generalize this relation across species if neuronal densities are conserved for the region of interest (see for example Herculano-Houzel 2011, 2017 for review). It seems to be the case across primates because cytoarchitectonic maps are conserved for FP and DLPFC, at least in humans and laboratory primates (Petrides et al, 2012; Sallet et al, 2013; Gabi et al, 2016; Amiez et al, 2019). But we make no claim about the difference in number of neurons between FP and DLPFC, and we never compared regional volumes across regions (we only compared the influence of socio-ecological factors on each regional volume), so their difference in cellular density is not relevant here. As long as the neuronal density is conserved across species but within a region (DLPFC or FP), the difference in volume for that region, across species, does provide a reliable proxy for the influence of the socioecological regressor of interest (across species) on the number of neurons in that region.

      Our claims are based on the strength of the relation between 1) cross-species variability in a set of socio-ecological variables and 2) cross-species variability in neural counts in each region of interest (FP or DLPFC). Since the effects of interest relate to inter-specific differences, within a region, our only assumption is that the neural densities are conserved across distinct species for a given brain region. Again (see previous paragraph), there is reasonable evidence for that in the literature. Given that assumption, regional volumes (across species, for a given brain region) provide a good proxy for the number of neurons. Thus, the influence of a given socio-ecological variable on the interspecific differences in the volume of a single brain region provides a reliable estimate of the influence of that socio-ecological variable on the number of neurons in that region (across species), and potentially of the importance of the cognitive function associated with that region in laboratory conditions. None of our conclusions are based on direct comparison of volumes across regions, and we only compared the influence of socioecological factors (beta weights, after normalization of the variables).

      Note that this is yet another reason for not using relative measures and not including whole brain as covariate in the regression model: Given that whole brain and any specific region have a clear difference in density, and that this difference is probably not conserved across species, relative measures (or covariate analysis) cannot be used as proxies for neuronal counts (e.g. Herculano-Houzel, 2011). In other words, using the whole brain to rescale individual brain regions relies upon the assumption that the ratios of volumes (specific region/whole brain) are equivalent to the ratios of neural counts, which is not valid given the differences in densities.
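      The argument above can be written compactly. Assuming the number of neurons N in a region equals its neuronal density ρ times its volume V (notation introduced here for illustration only), with r a region of interest, b the whole brain, and s and t two species:

```latex
% Ratio of a region to the whole brain within one species
\frac{N_r}{N_b} = \frac{\rho_r V_r}{\rho_b V_b} = \frac{\rho_r}{\rho_b}\,\frac{V_r}{V_b}

% The same region compared across species s and t
\frac{N_r^{(s)}}{N_r^{(t)}} = \frac{\rho_r^{(s)}}{\rho_r^{(t)}}\,\frac{V_r^{(s)}}{V_r^{(t)}}
  = \frac{V_r^{(s)}}{V_r^{(t)}} \quad \text{if } \rho_r^{(s)} = \rho_r^{(t)}
```

      The first ratio equals the volume ratio only if ρ_r = ρ_b, which fails when the region and the whole brain differ in density; the second requires only that density be conserved within a region across species, which is the assumption made here.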

      Nevertheless, I think this is an important study. I am happy that we are using imaging data to answer wider phylogenetic questions. Combining detailed anatomy, big data and phylogenetic statistical frameworks is an important approach.

      We really thank the reviewer for these positive remarks, and we hope that this study will indeed stimulate others using a similar approach.

      Reviewer #2 (Public Review):

      In the manuscript entitled "Linking the evolution of two prefrontal brain regions to social and foraging challenges in primates" the authors measure the volume of the frontal pole (FP, related to metacognition) and the dorsolateral prefrontal cortex (DLPFC, related to working memory) in 16 primate species to evaluate the influence of socio-ecological factors on the size of these cortical regions. The authors select 11 socio-ecological variables and use a phylogenetic generalized least squares (PGLS) approach to evaluate the joint influence of these socio-ecological variables on the neuro-anatomical variability of FP and DLPFC across the 16 selected primate species; in this way, the authors take into account the phylogenetic relations across primate species in their attempt to discover the influence of socio-ecological variables on FP and DLPFC evolution.

      The authors run their studies on brains collected from 1920 to 1970 and preserved in formalin solution. They also obtained data from the Muséum national d'Histoire naturelle in Paris and from the Allen Brain Institute in California. The main findings show that the volume of the FP, the DLPFC, and the Rest of the Brain (ROB) across the 16 selected primate species is related to three socio-ecological variables: body mass, daily traveled distance, and population density. The authors conclude that metacognition and working memory are critical for foraging in primates and that FP volume is more sensitive to social constraints than DLPFC volume.

      The topic addressed in the present manuscript is relevant for understanding human brain evolution from the point of view of primate research, which, unfortunately, is a shrinking field in neuroscience. But the experimental design has two major weak points: the absence of lissencephalic primates among the selected species and the delimitation of FP and DLPFC. Also, a general theoretical and experimental frame linking evolution (phylogeny) and development (ontogeny) is lacking.

      We are sorry that the reviewer still believes that these two points are major weaknesses.

      - We have added a point on lissencephalic species in the discussion. In short, we acknowledge that our work may not be applied to lissencephalic species because they cannot be studied with our method, but on the other hand, based on laboratory data there is no evidence showing that the functional organization of the DLPFC and FP in lissencephalic primates is radically different from that of other primates (Dias et al, 1996; Roberts et al, 2007; Dureux et al, 2023; Wong et al, 2023). Therefore, there is no a priori reason to believe that not including lissencephalic primates prevents us from drawing conclusions that are valid for primates in general. Moreover, as explained in the discussion, including lissencephalic primates would require using invasive functional studies, only possible in laboratory conditions, which would not be compatible with the number of species (>15) necessary for phylogenetic studies (in particular PGLS approaches). Finally, as pointed out by the reviewer, our study is also relevant for understanding human brain evolution, and as such, including lissencephalic species should not be critical to this understanding.

      - In response to the remarks of reviewer 1 on the first version of the manuscript, we had included a new analysis in the previous version of the manuscript, to evaluate the validity of our functional maps given another set of boundaries between FP and DLPFC. But one should keep in mind that our objective here is not to provide a definitive definition of what the regions usually referred to as DLPFC and FP should be from an anatomical point of view. Rather, as our study aims at taking into account the phylogenetic relations across primate species, we chose landmarks that enable a comparison of the volume of cortex involved in metacognition (FP) and working memory (DLPFC) across species. We have also updated the discussion accordingly.

      We agree that this is a difficult point, and we have always acknowledged that it is a clear limitation of our study. In the light of the functional imaging literature in humans and non-human primates, as well as the neurophysiological data in macaques, defining the functional boundary between FP and DLPFC remains a challenging issue even in very well controlled laboratory conditions. As mentioned by reviewer 1, “the precise delineation of the areas will always be an issue of debate in studies like this, so showing the effects of different decisions is vital”. Again, an additional analysis using different boundaries for FP and DLPFC was included in the supplementary material to address that issue. We are not aware of solid evidence showing that the boundaries we chose for DLPFC vs FP are wrong, and we believe that the comparison between the two sets of measures, together with the discussion on this topic, should be sufficient for the reader to assess both the strength and the limits of our conclusions. That being said, if the reviewer has any reference in mind showing better ways to delineate the functional boundary between FP and DLPFC in primates, we would be happy to include it in our manuscript.

      - The question of development, although important per se, is neither part of our hypothesis nor central to the field of comparative cognition in primates. Indeed, major studies in the field do not mention development (e.g. Byrne, 2000; Kaas, 2012; Barton, 2012). De Casien et al (2022) even showed that developmental constraints are largely irrelevant (see Claim 4 of their article): “The functional constraints hypothesis […] predicts more complex, ‘mosaic’ patterns of change at the network level, since brain structure should evolve adaptively and in response to changing environments. It also suggests that ‘concerted’ patterns of brain evolution do not represent conclusive evidence for developmental constraints, since allometric relationships between developmentally linked or unlinked brain areas may result from selection to maintain functional connectivity. This is supported by recent computational modeling work [81], which also suggests that the value of mosaic or concerted patterns may fluctuate through time in a variable environment and that developmental coupling may not be a strong evolutionary constraint. Hence, the concept of concerted evolution can be decoupled from that of developmental constraints.”

      Finally, when studies on brain evolution and cognition mention development, it is generally to discuss energetic constraints rather than developmental mechanisms per se (Heldstab et al, 2022; Smaers et al, 2021; Preuss & Wise, 2021; Dunbar & Shultz, 2017; MacLean et al, 2012; Mars et al, 2018, 2021). Therefore, development does not seem to be a critical issue, either for our article or for the field.

      Reviewer #3 (Public Review):

      This is an interesting manuscript that addresses a longstanding debate in evolutionary biology - whether social or ecological factors are primarily responsible for the evolution of the large human brain. To address this, the authors examine the relationship between the size of two prefrontal regions involved in metacognition and working memory (DLPFC and FP) and socioecological variables across 16 primate species. I recommend major revisions to this manuscript due to: 1) a lack of clarity surrounding model construction; and 2) an inappropriate treatment of the relative importance of different predictors (due to a lack of scaling/normalization of predictor variables prior to analysis).

      We thank the reviewer for their remarks and for clarifying their criticism regarding the use of relative measures. We are sorry to have missed the importance of this point in the first place. We also thank the reviewer for the cited references, which were very interesting and which we have included in the discussion. As Reviewer 1 shared these concerns, we wrote a detailed response above explaining how we addressed the issue.

      First, we ran a supplementary analysis in which whole brain volume was added as a covariate, together with the socio-ecological variables, to account for the volume of FP or DLPFC. As expected given the very high correlation across all three brain measures, none of the socio-ecological variables remained significant. We have added a long paragraph to the discussion to tackle this issue. In short, we agree with the reviewer that the specificity of the effects (on a given brain region vs the rest of the brain) is a critical issue, and we acknowledge that, since this is standard practice in the field, it was necessary to address it and run this extra analysis. But we also believe that specificity can be assessed by other means: given the differential influence of ‘population density’ on FP and DLPFC, in line with laboratory data, we believe that some of the effects we describe do show specificity. We also prefer absolute measures to relative measures, because they provide a better estimate of the capacity for the corresponding cognitive operation, and because standard allometric rules (i.e., body size or whole brain scaling) may not apply to the scaling and evolution of FP and DLPFC in primates. Indeed, since we use these measures as proxies for functions (metacognition for FP and working memory for DLPFC), other parts of the brain are expected to show the same effect, because these functions are supported by entire networks that include not only our regions of interest but also other cortical areas in the parietal lobe. Thus, the extent to which the relation with socio-ecological variables should be stronger in the regions of interest than in the whole brain depends on the extent to which other regions are involved in the same cognitive functions, and this is clearly beyond the scope of this study.
More importantly, volumetric measures are taken as proxies for the number of neurons, but this is only valid when comparing the same brain region across species, not when comparing across brain regions, since neural densities are not conserved across regions. Thus, using relative measures (scaling with whole brain volume) would only work if densities were conserved across brain regions, which is not the case. From that perspective, the interpretation of absolute measures seems more straightforward, and we hope that the specificity of the effects can be evaluated by comparing the three measures (FP, DLPFC and whole brain) as well as through the analysis suggested by the reviewer. We hope that the additional analysis and the updated discussion cover this question, and that the reader will have all the information necessary to evaluate the level of specificity and the extent to which our findings can be interpreted.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In my previous review of the present manuscript, I pointed out the fact that defining parts, modules, or regions of the primate cerebral cortex based on macroscopic landmarks across primate species is problematic because it prevents comparisons between gyrencephalic and lissencephalic primate species. The authors have rephrased several paragraphs in their manuscript to acknowledge that their findings do apply to gyrencephalic primates.

      I also said that "Contemporary developmental biology has shown that the selection of morphological brain features happens within severe developmental constraints. Thus, the authors need a hypothesis linking the evolutionary expansion of FP and DLPFC during development. Otherwise, the claims from the mosaic brain and modularity lack fundamental support". I insisted that the authors should clarify their concept of homology of cerebral cortex parts, modules, or regions across species (in the present manuscript, the frontal pole and the dorsolateral prefrontal cortex). Those are not trivial questions, because any phylogenetic explanation of brain region expansion in contemporary phylogenetic and evolutionary biology must be rooted in evolutionary developmental biology. In this regard, the authors could have discussed their findings in the frame of contemporary studies of cerebral cortex evolution and development, but instead they have rejected my criticism, just saying that it is "not relevant here" or "clearly beyond the scope of this paper".

      The question of development, although important per se, is neither part of our hypothesis nor central to the field of comparative cognition in primates. Indeed, the major studies in the field do not mention development, and some have even shown that developmental constraints were not relevant (see De Casien et al., 2022 and details in our response to the public review). When studies on brain evolution and cognition mention development, it is generally to discuss energetic constraints rather than developmental mechanisms per se (Heldstab et al, 2022; Smaers et al, 2021; Preuss & Wise, 2021; Dunbar & Shultz, 2017; MacLean et al, 2012; Mars et al, 2018, 2021).

      If the other reviewers agree, the authors are free to publish in eLife their correlations in a vacuum of evolutionary developmental biology interpretation. I just disagree. Explanations of neural circuit evolution in primates and other mammalian species should aim for standards like the review at this link: https://royalsocietypublishing.org/doi/full/10.1098/rstb.2020.0522

      In this article, Paul Cisek (a brilliant neurophysiologist) speculates on potential evolutionary mechanisms for some primate brain functions, but there is surprisingly little reference to the existing literature on primate evolution and cognition. There is virtually no mention of studies that involve a large enough number of species to address evolutionary processes, compare with fossils, or evaluate specific socio-ecological evolutionary constraints. Most of the cited literature refers to laboratory studies on the brain anatomy of a handful of species, and their relevance for evolution remains to be evaluated. These ideas are very interesting and could provide an original perspective on evolution, but they are mostly based on speculation from laboratory studies rather than on extensive comparative studies. This paper is interesting for understanding developmental mechanisms and their constraints on neurophysiological processes in laboratory conditions, but we do not think that it fits within the framework of our paper, as it goes far beyond our main topic.

      Reviewer #3 (Recommendations For The Authors):

      Yes, I am suggesting that the authors also include analyses with brain size (rather than body size) as a covariate to evaluate the effects of other variables in the model over and above the effect on brain size. In a very simplified theoretical scenario: two species have the same body sizes, but species A has a larger brain and therefore a larger FP. In this case, species A has a larger FP because of brain allometric patterns, and models including body size as a covariate would link FP size and socioecological variables characteristic of species A (and others like it). However, perhaps the FP of species A is actually smaller than expected for its brain size, while the FP of species B is larger than expected for its brain size.
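      The reviewer's simplified scenario can be made concrete with a small numerical sketch. It computes one common notion of "relative" region size: the residual of log10 region volume from an ordinary least-squares fit on log10 whole-brain volume. All species names and volumes below are hypothetical illustrations, not data from the study, and the fit ignores phylogeny (a PGLS model, as used in the manuscript's comparative analyses, would additionally account for shared ancestry). Including brain volume as a covariate in a multiple regression is related to, but not identical with, regressing on these residuals.

```python
import math


def fit_loglog(brain, region):
    """Ordinary least-squares fit of log10(region) on log10(brain).

    Returns (slope, intercept). No phylogenetic correction is applied.
    """
    x = [math.log10(v) for v in brain]
    y = [math.log10(v) for v in region]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def relative_sizes(species, brain, region):
    """Residuals in log10 units: > 0 means larger than expected for brain size."""
    slope, intercept = fit_loglog(brain, region)
    return {
        name: math.log10(r) - (intercept + slope * math.log10(b))
        for name, b, r in zip(species, brain, region)
    }


# Hypothetical volumes (arbitrary units), chosen only to mirror the scenario:
# species A has the larger brain and the larger absolute FP.
species = ["ref1", "ref2", "ref3", "A", "B"]
brain = [10.0, 100.0, 1000.0, 400.0, 200.0]
fp = [0.1, 1.0, 10.0, 3.0, 2.5]

rel = relative_sizes(species, brain, fp)
for name in ("A", "B"):
    i = species.index(name)
    print(f"{name}: absolute FP = {fp[i]}, relative FP = {rel[name]:+.3f}")
```

      With these illustrative numbers, species A has the larger absolute FP but a negative residual, whereas species B has a positive residual, so an analysis on absolute volumes and one on brain-relative volumes would associate socio-ecological variables with different species.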

      As explained in our response to the public review, we ran this analysis, and we agree with the reviewer’s point from a practical standpoint: it is important to know the extent to which the relation with a set of socio-ecological variables is specific to the region of interest, or less specific and present for other brain regions. Again, we apologize for not having understood this earlier, and we acknowledge that, since it is standard practice in the field, it needs to be addressed thoroughly.

      We understand the scaling intuition and the need for a reference point for volumetric measures, but here the volume of each brain region is taken as a proxy for its number of neurons, and therefore for the region’s computational capacities. Since neural densities for a given brain region (FP or DLPFC) seem to be well conserved across species, comparing regional volumes across species provides a good proxy for the contrast (across species) in neuron counts for that region. All we predicted was that, for a given brain region associated with a given cognitive operation, the volume (number of neurons) would be greater in species for which the socio-ecological constraints potentially involving that specific cognitive operation were greater. We do not understand how or why the rest of the brain would change this interpretation (beyond, of course, the question of specificity discussed just above). Moreover, using whole brain volume as a scaling measure is problematic, because whole brain neural density is very different from the density of these prefrontal regions (see above for further details). Again, we acknowledge that allometric patterns exist, and we understand how they can be interpreted, but we do not see how they could prove or disprove our hypothesis (that brain regions involved in specific cognitive operations are influenced by a specific set of socio-ecological variables). When volumes are used as a proxy for computational capacities, the theoretical implications of scaling procedures might be problematic. For example, scaling implies that the computational capacities of a given brain region depend on the size of the rest of the brain: all other things being equal, the computational capacity of a given brain region, taken as its number of neurons, should decrease when the size of the rest of the brain increases. To our knowledge, there is no evidence for this in the literature.
Clearly, these are very challenging issues, and our position was to use absolute measures because they do not rely on hidden assumptions regarding allometric relations and their consequences for cognition.

      But since we understand that scaling is a reference in the field, we have not only completed the corresponding analysis (including whole brain volume as a covariate, together with the socio-ecological variables) but also expanded the discussion to address this issue in detail. We hope that, with this new analysis and the comparison of effects between the non-scaled measures of FP, DLPFC and the whole brain, the reader will be able to judge the specificity of the effect.

      Models including brain (instead of body) size would instead link FP size and socioecological variables characteristic of species B (and others like it). This approach is supported by a large body of literature linking comparative variation in the relative size of specific brain regions (i.e., relative to brain size) to behavioral variation across species - e.g., relative size of visual/olfactory brain areas and diurnality/nocturnality in primates (Barton et al. 1995), relative size of the hippocampus and food caching in birds (Krebs et al. 1989).

      Barton, R., Purvis, A., & Harvey, P. H. (1995). Evolutionary radiation of visual and olfactory brain systems in primates, bats and insectivores. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 348(1326), 381-392.

      Krebs, J. R., Sherry, D. F., Healy, S. D., Perry, V. H., & Vaccarino, A. L. (1989). Hippocampal specialization of food-storing birds. Proceedings of the National Academy of Sciences, 86(4), 1388-1392. 

      We are grateful to the reviewer for mentioning these very interesting articles, and more generally for helping us to understand this issue and clarify the related discussion. Again, we understand the scaling principle, but the fact that these methods provide interesting results does not make other approaches (such as ours) wrong or irrelevant. Since we have used both our original approach and the standard version requested by the reviewer, the reader should be able to get a clear picture of the measures and of their theoretical implications. We sincerely hope that the present version of the paper will be satisfactory, not only because it is clearer, but also because it might stimulate further discussion on this complex question.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents useful findings on several phages from deep-sea isolates of Lentisphaerae strains WC36 and zth2 that further our understanding of deep-sea microbial life. The manuscript's primary claim is that phage isolates augment polysaccharide use in Pseudomonas bacteria via auxiliary metabolic genes (AMGs). However, the evidence is incomplete and does not support the primary claims. Namely, no data are presented to rule out phage contamination in the polysaccharide stock solution, AMGs are potentially misidentified, and evidence of successful infection is missing.

      We thank the Editor and Reviewers for their positive and constructive comments, which helped us improve the quality of our manuscript entitled “Deep-sea bacteriophages facilitate host utilization of polysaccharides” (paper#eLife-RP-RA-2023-92345). The comments are valuable; we have studied them carefully and made the corresponding revisions. We removed some uncertain results and strengthened other parts of the manuscript, which we believe has improved the accuracy and impact of the revised version. Revised portions are marked in blue in the modified manuscript. Please find our detailed responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: This manuscript describes the identification and isolation of several phages from deep-sea isolates of Lentisphaerae strains WC36 and zth2. The authors observe induction of several putative chronic phages with the introduction of additional polysaccharides to the media. The authors suggest that two of the recovered phage genomes encode AMGs associated with polysaccharide use. The authors also suggest that adding the purified phage to cultures of Pseudomonas stutzeri 273 increased the growth of this bacterium due to augmented polysaccharide use genes from the phage. While the findings are of interest and relevance to the field, it is my opinion that several of the analyses fall short of supporting the key assertions presented.

      Thank you for your comments. We removed some uncertain results and strengthened other parts of the manuscript, which we believe has improved the accuracy and impact of the revised version. Please find our detailed responses below.

      Strengths: Interesting isolates of deep-sea Lentisphaerae strains, which will undoubtedly further our understanding of deep-sea microbial life.

      Thanks for your positive comments.  

      Weaknesses:

      (1) Many of the findings are consistent with a phage contamination in the polysaccharide stock solution. 

      Thank you for your comments. We are confident that the phages are specifically derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution, for the following reasons: (1) the polysaccharide stock solution was strictly sterilized to remove any phage contamination; (2) we performed multiple TEM checks of rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or with 10 g/L starch alone (Supplementary Fig. 1B) and observed no phage-like structures, which confirmed that the polysaccharides (laminarin/starch) we used were not contaminated with phage-like structures; in addition, we observed the polysaccharides (laminarin/starch) directly by TEM and did not find any phage-like structures (Supplementary Fig. 2); (3) starch alone could not promote the growth of Pseudomonas stutzeri 273, whereas starch supplemented together with the extracted Phages-WC36 effectively facilitated the growth of Pseudomonas stutzeri 273 (Author response image 1). These results clearly indicate that the phages were derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution.

      Author response image 1.

      Growth curve and status of Pseudomonas stutzeri 273 cultivated in basal medium, basal medium supplemented with 20 μl/mL Phages-WC36, basal medium supplemented with 5 g/L starch, basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36. 

       

      (2) The genes presented as AMGs are largely well known and studied phage genes which play a role in infection cycles.

      Thank you for your comments. Indeed, these genes may be common only in virulent phages and have never been reported in chronic phages. In virulent phages, these genes typically act as lysozymes, facilitating the release of virions from the host cell upon lysis or the injection of viral DNA upon infection. However, chronic phages do not lyse the host. Therefore, the persistence of these genes in chronic phages may be due to their ability to assist the host in metabolizing polysaccharides. Finally, following your suggestions, we have toned down the role of the AMGs and now refer to them as “potential” AMGs. Detailed information is given below.

      (3) The evidence that the isolated phage can infect Pseudomonas stutzeri 273 is lacking, putting into question the dependent results.

      Thank you for your comments. We selected many marine strains (Pseudomonadota, Planctomycetes, Verrucomicrobia, Fusobacteria, and Tenericutes isolates) to investigate whether Phages-WC36 could assist them in degrading and utilizing polysaccharides, and found that Phages-WC36 could only promote the growth of strain 273. Filamentous phages have been reported to recognize and bind to host pili, which causes the pili to retract and brings the filamentous phages closer to, and possibly through, the outer membrane of the host cells. Other chronic phages might be released without lysing the host by being enclosed in a lipid membrane and exported from the host cells in a nonlytic manner. Thus, these chronic phages may have a wider host range. However, we were unable to further elucidate the infection mechanism because of technical limitations. Therefore, following your suggestions, we have deleted this section in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      I have previously reviewed this manuscript as a submission to another journal in 2022. My recommendations here mirror those of my prior suggestions, now with further added details.

      Thank you for your great efforts in reviewing our manuscript and for your valuable suggestions on both the previous and the current versions.

      Specific comments:

      Comment 1: Line 32. Rephrase to "polysaccharides cause the induction of multiple temperate phages infecting two strains of Lentisphaerae (WC36 and zth2) from the deep sea."

      Thanks for your positive suggestion. We have modified this description as “Here, we found for the first time that polysaccharides induced the production of multiple temperate phages infecting two deep-sea Lentisphaerae strains (WC36 and zth2).” in the revised manuscript (Lines 31-33). 

      Comment 2: Line 66. "Chronic" infections are not "lysogenic" as described here, suggesting the former is a subcategory of the latter. If you are going to introduce lifecycles, you need a brief sentence distinguishing "chronic" from "lysogenic".

      Thanks for your positive suggestion. We added this sentence as “Currently, more and more attention has been paid to chronic life cycles where bacterial growth continues despite phage reproduction (Hoffmann Berling and Maze, 1964), which was different from the lysogenic life cycle that could possibly lyse the host under some specific conditions.” in the revised manuscript (Lines 66-69).

      Comment 3: Line 72. Please avoid generalized statements like "a hand-full" (or "plenty" line 85). Try to be at least somewhat quantitative regarding how many chronic phages are known. This is a fairly common strategy among archaeal viruses. 

      Thanks for your suggestion. Given that some filamentous phages also have a chronic life cycle that is not explicitly reported, we cannot accurately estimate their numbers. According to your suggestions, we have modified these descriptions as “however, to our best knowledge, only few phages have been described for prokaryotes in the pure isolates up to date (Roux et al., 2019; Alarcón-Schumacher et al., 2022; Liu et al., 2022).” in the revised manuscript (Lines 73-75). In addition, the number of chronic phages in the biosphere cannot be accurately estimated, according to the latest report (Chevallereau et al., 2022), which showed that “a large fraction of phages in the biosphere are produced through chronic life cycles”. Therefore, we have modified this description as “Therefore, a large percentage of phages in nature are proposed to replicate through chronic life cycles” in the revised manuscript (Lines 87-88). 

      Comment 4: Line 93. While Breitbart 2012 is a good paper to cite here, there have been several much more advanced analyses of the ocean virome. https://doi.org/10.1016/j.cell.2019.03.040 is one example, but there are several others. A deeper literature review is required in this section.

      Thanks for your valuable suggestions. We have added some literatures and modified this description as “A majority of these viruses are bacteriophages, which exist widely in oceans and affect the life activities of microbes (Breitbart, 2012; Roux et al., 2016; Gregory et al., 2019; Dominguez-Huerta et al., 2022).” in the revised manuscript (Lines 94-97). 

      References related to this response:

      Roux, S., Brum, J.R., Dutilh, B.E., Sunagawa, S., Duhaime, M.B., Loy, A., Poulos, B.T., Solonenko, N., Lara, E., Poulain, J., et al. (2016) Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537:689-693. 

      Gregory, A.C., Zayed, A.A., Conceição-Neto, N., Temperton, B., Bolduc, B., Alberti, A., Ardyna, M., Arkhipova, K., Carmichael, M., Cruaud, C., et al. (2019) Marine DNA Viral Macro- and Microdiversity from Pole to Pole. Cell 177:1109-1123.e1114. 

      Dominguez-Huerta, G., Zayed, A.A., Wainaina, J.M., Guo, J., Tian, F., Pratama, A.A., Bolduc, B., Mohssen, M., Zablocki, O., Pelletier, E., et al. (2022) Diversity and ecological footprint of Global Ocean RNA viruses. Science 376:1202-1208.

      Comment 5: Line 137. I see the phage upregulation in Figure 1; however, in the text and figure it would be good to also elaborate on what the background expression generally looks like. Perhaps a transcriptomic read normalization and recruitment to the genome, with a display of the coverage map highlighting the prophage, would be helpful. Are the polysaccharides directly influencing phage induction, or is there some potential for another cascading effect?

      Thank you for your comments. We have listed the expression levels of all phage-associated genes under different conditions in Supplementary Table 1, which shows that their background expression was very low. The values in Fig. 1C are log2-transformed fold changes in gene expression of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin relative to the rich medium alone.

      In addition, our RT-qPCR results (Fig. 1D) also confirmed that these genes encoding phage-associated proteins were significantly upregulated when 10 g/L laminarin was added in the rich medium. According to your suggestions, we have modified this description as “In addition to the up-regulation of genes related to glycan transport and degradation, when 10 g/L laminarin was added in the rich medium, the most upregulated genes were phage-associated (e. g. phage integrase, phage portal protein) (Fig. 1C and Supplementary Table 1), which were expressed at the background level in the rich medium alone.” in the revised manuscript (Lines 136-140). Based on the present results, we speculate that polysaccharides might directly induce phage production, which needs to be verified by a large number of experiments in the future.
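      For readers less familiar with log2-scaled expression values, a log2 fold change compares expression in the laminarin-supplemented condition with expression in rich medium alone: +1 means doubled, -1 means halved, and 0 means unchanged. The sketch below is a minimal illustration under stated assumptions: the helper name is hypothetical, the inputs are assumed to be already-normalized expression values, and the pseudocount of 1.0 is an assumption used to avoid division by zero for genes at background level. The text does not state which normalization or pseudocount was used for Fig. 1C.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Hypothetical helper: log2((treated + pseudocount) / (control + pseudocount)).

    Inputs are assumed to be normalized, non-negative expression values.
    The pseudocount keeps genes with zero expression from causing log(0)
    or division by zero.
    """
    if treated < 0 or control < 0:
        raise ValueError("expression values must be non-negative")
    return math.log2((treated + pseudocount) / (control + pseudocount))


# A gene near background in rich medium alone (7) rising to 63 with laminarin:
print(log2_fold_change(63, 7))  # 3.0, i.e. an eightfold increase
```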

      Comment 6: Line 179. We need some assurance that phage was not introduced by your laminarin or starch supplement. Perhaps a check on the TEM/sequencing check of supplement itself would be helpful? This may be what is meant on Line 188 "without culturing bacterial cells" however this is not clearly worded if that is the case. Additional note, further reading reinforces this as a key concern. Many of the subsequent results are consistent with a contaminated starch stock. 

      Thank you for your comments. We are confident that the phages are specifically derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution, for the following reasons. (1) We performed multiple TEM checks of rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or with 10 g/L starch alone (Supplementary Fig. 1B) and observed no phage-like structures, which confirmed that the polysaccharides (laminarin/starch) we used are not contaminated with phage-like structures. In addition, we observed the polysaccharides (laminarin/starch) directly by TEM and did not find any phage-like structures (Supplementary Fig. 2). Following your suggestions, we have modified this description as “We also tested and confirmed that there were not any phage-like structures in rich medium supplemented with 10 g/L laminarin alone (Supplementary Fig. 1A) or in 10 g/L starch alone (Supplementary Fig. 1B), ruling out the possibility of phage contamination from the polysaccharides (laminarin/ starch).” in the revised manuscript (Lines 158-162) and “Meanwhile, we also checked the polysaccharides (laminarin/ starch) in rich medium directly by TEM and did not find any phage-like structures (Supplementary Fig. 2).” in the revised manuscript (Lines 178-180). (2) The polysaccharide stock solution was strictly sterilized to remove any phage contamination. (3) Starch alone could not promote the growth of Pseudomonas stutzeri 273, whereas starch supplemented together with the extracted Phages-WC36 effectively facilitated its growth (Author response image 2). These results clearly indicate that the phages were derived from the Lentisphaerae strain WC36 and not from the polysaccharide stock solution.

      In addition, given that polysaccharides are a critical energy source for most microorganisms, we asked whether polysaccharides also induce the production of bacteriophages in other deep-sea bacteria. To this end, we cultured deep-sea representatives from four other phyla (Chloroflexi, Tenericutes, Proteobacteria, and Actinobacteria) in medium supplemented with laminarin/starch and checked the supernatant of the cell suspensions by TEM as described above. We did not find any phage-like structures in these cell suspensions (Author response image 3), which further confirms that there was no phage contamination in the polysaccharides.

      Author response image 2.

      Growth curve and status of Pseudomonas stutzeri 273 cultivated in basal medium, basal medium supplemented with 20 μl/mL Phages-WC36, basal medium supplemented with 5 g/L starch, basal medium supplemented with 5 g/L starch and 20 μl/mL Phages-WC36.   

      Author response image 3.

      TEM observation of the supernatant of cell suspensions of a Chloroflexi strain, a Tenericutes strain, a Proteobacteria strain and an Actinobacteria strain cultivated in rich medium supplemented with 10 g/L laminarin and 10 g/L starch. No phage-like particles could be observed.

      Comment 7: Line 223. Correct generalized wording "long time". 

      Thanks for your comments. We have changed “after for a long time” to “after 30 days” in the revised manuscript (Line 197).

      Comment 8: Line 229. Please more explicitly describe what these numbers are (counts of virion like structures - filamentous and hexagonal respectively?), the units (per µL?), and how these were derived. The word "around" should be replaced with mean and standard deviation values for each count from replicates, without which these are not meaningful.

      Thank you for your comments. The average numbers of filamentous and hexagonal phages per microliter (µL) in each condition were calculated from ten randomly chosen TEM images. Following your suggestions, we have modified this description as “Specifically, the average number per microliter of filamentous phages (9.7, 29 or 65.3) extracted from the supernatant of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin for 5, 10 or 30 days was higher than that cultured in rich medium supplemented with 5 g/L laminarin (4.3, 13.7 or 35.3) (Fig. 3B). The average number per microliter of hexagonal phages (9, 30, 46.7) extracted from the supernatant of strain WC36 cultured in rich medium supplemented with 10 g/L laminarin for 5, 10 or 30 days was higher than that cultured in rich medium supplemented with 5 g/L laminarin (4, 11.3 or 17.7) (Fig. 3C).” in the revised manuscript (Lines 203-210).
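      The reviewer's request for means with standard deviations over replicate images can be met with a few lines of standard-library Python. The sketch below is illustrative only: the helper name is hypothetical, it assumes each per-image count has already been converted to particles per microliter (the conversion from a TEM field of view to a volume is not described in the text), and the ten counts in the example are made up, not values from the study.

```python
import statistics


def summarize_counts(counts_per_ul: list[float]) -> tuple[float, float]:
    """Hypothetical helper: (mean, sample standard deviation) of per-image counts.

    Each entry is assumed to come from one randomly chosen TEM image and to be
    expressed already as particles per microliter. At least two images are
    needed to estimate a standard deviation.
    """
    if len(counts_per_ul) < 2:
        raise ValueError("need at least two images to estimate a standard deviation")
    return statistics.mean(counts_per_ul), statistics.stdev(counts_per_ul)


# Made-up counts from ten images of one condition (not study data).
counts = [8, 11, 9, 12, 10, 7, 13, 9, 11, 10]
mean, sd = summarize_counts(counts)
print(f"{mean:.1f} ± {sd:.1f} particles/µL (n = {len(counts)} images)")
```

      `statistics.stdev` uses the n - 1 denominator (sample standard deviation), which is the usual choice when the images are a random sample of the grid.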

      Comment 9: Line 242. This section should be included in the discussion of Figure 2 - around line 194.

      Thanks. According to your suggestion, we have moved this section to the discussion corresponding to Figure 2 (Lines 183-191).

      Comment 10: Figure 3. Stay consistent in the types of figures generated per strain. Figure 3A should be a growth curve.

      Thanks for your comments. Figure 3A is in fact a growth curve; the Figure 3A legend in this manuscript reads “(A) Growth curve of strain WC36 cultivated in either rich medium alone or rich medium supplemented with 5 g/L or 10 g/L laminarin for 30 days.”

      Comment 11: Line 312. Move the discussion of AMGs to after the discussion of the phage genome identification.

      Thanks for your valuable comments. According to your suggestions, we have moved the discussion of AMGs to after the discussion of the phage genome identification.

      Comment 12: Line 312. It would be informative to sequence in-bulk each of your treatments as opposed to just sequencing the viral isolates (starch and no host included) to see what viruses can be identified in each. ABySS is also not a common assembler for viral analysis. Is there literature to support it as a sufficient tool in assembling viral genomes? What sequencing depths were obtained in your samples?

      Thanks for your comments. In previous studies, we sequenced the starch and laminarin alone (no host included) and did not detect any phage-related sequences. ABySS is described in the following publications (Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Genome Res. 2017 May;27(5):768-777; Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009 Jun;19(6):1117-23.), and it has been used to assemble small genomes, including a viral genome and mitochondrial genomes (Guo Y, Jiang T. First Report of Sugarcane Mosaic Virus Infecting Goose Grass in Shandong Province, China. Plant Dis. 2024 Mar 21. doi: 10.1094/PDIS-11-23-2514-PDN; Tang M, Chen Z, Grover CE, Wang Y, Li S, Liu G, Ma Z, Wendel JF, Hua J. Rapid evolutionary divergence of Gossypium barbadense and G. hirsutum mitochondrial genomes. BMC Genomics. 2015 Oct 12;16:770.). The sequencing depths of the phages of strains WC36 and zth2 were 350x and 365x, respectively.
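      For orientation, mean sequencing depth is conventionally estimated as the total number of sequenced bases divided by the assembly length. The sketch below assumes reads of uniform length (e.g., 150 bp per read for PE150, counting each mate separately) and ignores filtered or unmapped reads. The function name and example numbers are hypothetical and do not reproduce the 350x and 365x values reported above.

```python
def mean_depth(n_reads: int, read_length: int, assembly_length: int) -> float:
    """Estimate mean depth as total sequenced bases divided by assembly length.

    n_reads counts individual reads, so each mate of a PE150 pair counts once.
    """
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    return n_reads * read_length / assembly_length

if __name__ == "__main__":
    # Hypothetical example: 10,000 reads of 150 bp over a 5,000 bp contig.
    print(mean_depth(10_000, 150, 5_000))  # 300.0
```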

      Comment 13: Line 323. Replace "eventually" with more detail about what was done to derive the genomes. Were these the only four sequences identified as viral?

      Thanks for your comments. We used ABySS (http://www.bcgsc.ca/platform/bioinfo/software/abyss) to assemble the genomes with multiple k-mer sizes. VIBRANT v1.2.1 (Kieft et al., 2020), DRAM-v (Shaffer et al., 2020), VirSorter v1.0.5 (with categories 1 (“pretty sure”) and 2 (“quite sure”)) (Roux et al., 2015) and VirFinder v1.1 (with statistically significant viral prediction: score > 0.9 and P-value < 0.05) (Ren et al., 2017) were run with default parameters to identify viral genomes among the assembled sequences by searching against both the cultured and uncultured viral sequences in the NCBI RefSeq database (http://blast.ncbi.nlm.nih.gov/) and the IMG/VR database (Camargo et al., 2023). GapCloser (https://sourceforge.net/projects/soapdenovo2/files/GapCloser/) was then applied to close the remaining internal gaps and correct single-base polymorphisms in the final assemblies. All detailed procedures are described in the supplementary information. Only these four sequences received high viral-prediction scores, although none of them is a complete genome; other, shorter viral sequences with lower scores were excluded.
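      The score thresholds quoted above can be expressed as simple filters. The sketch below is a minimal illustration, not the authors' pipeline: the `ContigPrediction` record is a hypothetical container for values already parsed from each tool's output, and the rule that a contig is kept when either VirFinder or VirSorter calls it viral is an assumption, since the response does not state how the tools' calls were combined.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ContigPrediction:
    """Hypothetical per-contig record holding values parsed from tool outputs."""
    name: str
    virfinder_score: Optional[float] = None
    virfinder_pvalue: Optional[float] = None
    # VirSorter 1.x category: 1-3 for free phage contigs, 4-6 for prophages.
    virsorter_category: Optional[int] = None

def passes_virfinder(p: ContigPrediction, min_score: float = 0.9, max_p: float = 0.05) -> bool:
    """VirFinder call: score > 0.9 and P-value < 0.05, as stated in the response."""
    if p.virfinder_score is None or p.virfinder_pvalue is None:
        return False
    return p.virfinder_score > min_score and p.virfinder_pvalue < max_p

def passes_virsorter(p: ContigPrediction, categories=(1, 2)) -> bool:
    """VirSorter call: category 1 ("pretty sure") or 2 ("quite sure")."""
    return p.virsorter_category in categories

def candidate_viral_contigs(preds: List[ContigPrediction]) -> List[str]:
    # Assumption: keep a contig if either tool calls it viral.
    return [p.name for p in preds if passes_virfinder(p) or passes_virsorter(p)]
```

      A stricter variant would require agreement between tools; which rule is appropriate depends on how many false positives the downstream manual curation can absorb.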

      Comment 14: Line 328. We need some details about the host genomes here. How were these derived? What is their completeness/contamination? What is their size? If the bins are poor, these would not serve as a reliable comparison to identify integrated phage.

      Thanks for your comments. For genomic sequencing, strains WC36 and zth2 were grown in liquid rich medium supplemented with 5 g/L laminarin and starch and harvested after one week of incubation at 28 °C. Genomic DNA was isolated using the PowerSoil DNA isolation kit (Mo Bio Laboratories Inc., Carlsbad, CA). The genomes were then sequenced on both the Illumina NovaSeq PE150 (San Diego, USA) and Nanopore PromethION (Oxford, UK) platforms at Beijing Novogene Bioinformatics Technology Co., Ltd. Library construction, sequencing, and assembly were performed as previously described (Zheng et al., 2021). We used seven databases to predict gene functions: Pfam (Protein Families Database, http://pfam.xfam.org/), GO (Gene Ontology, http://geneontology.org/) (Ashburner et al., 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) (Kanehisa et al., 2004), COG (Clusters of Orthologous Groups, http://www.ncbi.nlm.nih.gov/COG/) (Galperin et al., 2015), NR (Non-Redundant Protein Database), TCDB (Transporter Classification Database), and Swiss-Prot (http://www.ebi.ac.uk/uniprot/) (Bairoch and Apweiler, 2000). A whole-genome BLAST search (E-value < 1e-5, minimum alignment length percentage > 40%) was performed against these seven databases.

      CheckM v1.2.2 estimated the completeness of the genomes of strains WC36 and zth2 at 100%. The genomes of strains WC36 and zth2 are 3,660,783 bp and 3,198,720 bp in size, respectively. The complete genome sequences of strains WC36 and zth2 presented in this study have been deposited in GenBank under accession numbers CP085689 and CP071032, respectively.

      Moreover, to verify the absence of microbial contamination in the phage sequencing results, we used the BWA-MEM algorithm (version 0.7.15) to map reads from host whole-genome sequencing (WGS) to these phage sequences. None of the raw reads from the host strains (WC36 and zth2) mapped to the phage sequences (Author response image 4, shown below). We also evaluated the assemblies underlying the host consensus genomes: clean reads were mapped to the complete bacterial genome sequences using Bowtie 2 (version 2.5.0), BWA (version 0.7.8) and SAMtools (version 0.1.18). The total mismatch rates for strains WC36 and zth2 were approximately 0% and 0.03%, respectively (Author response table 1, shown below). In addition, we collected cells of strains WC36 and zth2 and sent them to another company for whole-genome sequencing (named WC36G and ZTH, GenBank accession numbers CP151801 and CP119760, respectively). The completeness of the genomes of strains WC36G and ZTH was also 100%, and their sizes were 3,660,783 bp and 3,198,714 bp, respectively. The raw reads of strains WC36G and ZTH also did not map to the phage sequences. Therefore, we conclude that these bacteriophage genomes lie entirely outside the host chromosomes.

      Author response image 4.

      The read mapping from WGS to phage sequences.

      Author response table 1.

      Sequencing depth and coverage statistics.
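      The read-mapping checks described above reduce to two numbers per alignment file: the fraction of reads that map, and a mismatch rate over aligned bases. The sketch below computes both from plain SAM text using only the standard library. Two assumptions should be noted: secondary and supplementary alignments are skipped so that each read counts once, and the mismatch rate is approximated as the `NM` edit distance (which also counts inserted and deleted bases) divided by aligned bases, which may differ from the definition used by the sequencing provider. The function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED_OPS = set("M=X")  # CIGAR operations that consume both read and reference

def aligned_bases(cigar: str) -> int:
    """Number of read bases aligned to the reference (M, = and X operations)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in ALIGNED_OPS)

def mapping_summary(sam_lines):
    """Summarize primary alignments from SAM text lines.

    Returns the number of reads, the fraction mapped, and an approximate
    mismatch rate (sum of NM edit distances / aligned bases).
    """
    total = mapped = aligned = edits = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        total += 1
        if flag & 0x4:  # read unmapped
            continue
        mapped += 1
        aligned += aligned_bases(fields[5])
        for tag in fields[11:]:  # optional fields follow the 11 mandatory ones
            if tag.startswith("NM:i:"):
                edits += int(tag[5:])
    return {
        "reads": total,
        "mapped_fraction": mapped / total if total else 0.0,
        "mismatch_rate": edits / aligned if aligned else 0.0,
    }
```

      Applied to host reads aligned against the phage contigs, a `mapped_fraction` of 0 corresponds to the "no host reads map" result; applied to host reads against the host genome, `mismatch_rate` gives a figure comparable to the one in Author response table 1.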

      References related to this response:

      Zheng, R., Liu, R., Shan, Y., Cai, R., Liu, G., and Sun, C. (2021b) Characterization of the first cultured free-living representative of Candidatus Izemoplasma uncovers its unique biology ISME J 15:2676-2691. 

      Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium Nat Genet 25:25-29. 

      Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., and Hattori, M. (2004) The KEGG resource for deciphering the genome Nucleic Acids Res 32:D277-280. 

      Galperin, M.Y., Makarova, K.S., Wolf, Y.I., and Koonin, E.V. (2015) Expanded microbial genome coverage and improved protein family annotation in the COG database Nucleic Acids Res 43:D261-269. 

      Bairoch, A., and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 Nucleic Acids Res 28:45-48.

      Comment 15: Line 333. This also needs some details. What evidence do you have that these are not chromosomal? If not chromosomal where can they be found? Sequencing efforts should also be able to yield extrachromosomal elements such as plasmids etc... If you were to sequence your purified isolate cultures from the rich media alone and include all assemblies (not just those binned for example) as a reference, would you be able to recruit viral reads? The way this reads suggests that Chevallereau et al., worked specifically with these phage, which is not the case - please rephrase.

      Thanks for your comments. We carefully compared the bacteriophage genomes with those of the corresponding hosts (strains WC36 and zth2) using BLASTN in Galaxy Version 2.6.0 (https://galaxy.pasteur.fr/) (Afgan et al., 2018), and we used BWA-MEM to map reads from host whole-genome sequencing (WGS) to these bacteriophages. Both analyses showed that the bacteriophage genomes lie entirely outside the host chromosomes. We therefore hypothesize that the phage genomes may exist in the host in a plasmid-like form.

      Comment 16: Line 335. More to the point here that we need confirmation that these phages were not introduced in the polysaccharide treatment

      Thanks for your comments. Please find our answers for this concern in the responses for comment 1 of “weakness” part and comment 6 of “Recommendations For The Authors” part.

      Comment 17: Line 342. Lacking significant detail here. Phylogeny based on what gene(s), how were the alignments computed/refined, what model used etc..?

      Thanks for your comments. Following your suggestion, all the related information is now given in the “Materials and methods” section of this manuscript. The maximum likelihood phylogenetic tree of Phage-WC36-2 and Phage-zth2-2 was constructed based on the terminase large subunit protein (terL). The protein sequences used to construct the phylogenetic trees were all obtained from the NCBI databases. All sequences were aligned with MAFFT version 7 (Katoh et al., 2019) and manually corrected. The phylogenetic trees were constructed on the W-IQ-TREE web server (http://iqtree.cibiv.univie.ac.at) with the “GTR+F+I+G4” model (Trifinopoulos et al., 2016). Finally, the tree was edited with the online tool Interactive Tree of Life (iTOL v5) (Letunic and Bork, 2021).

      Comment 18: Line 346. How are you specifically defining AMGs in this study? Most of these are well-known and studied phage genes with specific life cycle functions and could not be considered as polysaccharide processing AMGs even though in host cells many do play a role in polysaccharide processing systems. A substantially deeper literature review is needed in this section, which would ultimately eliminate most of these from the potential AMG pools. Further, the simple HMM/BLASTp evalues are not sufficient to support the functional annotation of these genes. At a minimum, catalytic/conserved regions should be identified, secondary structures compared, and phylogenetic analysis (where possible) developed etc... My recommendation is to eliminate this section entirely from the manuscript. 

      Categorically:

      - Glycoside hydrolase (various families), glucosaminidases, and transglycosylase are all very common to phage and operate generally as a lysins, facilitating the release of virions from the host cell upon lysis, or injection of viral DNA upon infection https://doi.org/10.3389/fmicb.2016.00745 (and citations therein) https://doi.org/10.1016/j.cmi.2023.10.018 etc... In order to confirm these as distinct AMGs we would need a very detailed analysis indicating that these are not phage infection cycle/host recognition related, however I strongly suspect that under such interrogation, these would prove to be as such.

      -TonB related systems including ExbB are well studied among phages as part of the trans-location step in infection. These could not be considered as AMGs. https://doi.org/10.1128/JB.00428-19. Other TonB dependent receptors play a role in host recognition.

      -Several phage acetyltransferases play a role in suppressing host RNA polymerase in order to reserve host cell resources for virion production, including polysaccharide production. https://doi.org/10.3390/v12090976. Further it has been shown that the E. coli gene neuO (O-acetyltransferase) is a homologue of lambdoid phage tail fiber genes https://doi.org/10.1073/pnas.0407428102. I suspect the latter is also the case here and this is a tail fiber gene.

      Thanks for your valuable comments. Following your suggestions, we have reanalyzed these AMGs and made some modifications (the revised Fig. 5A, shown below). Genes encoding proteins associated with polysaccharide transport and degradation may be common only in virulent phages and have not been reported in chronic phages. In virulent phages, these genes typically act as lysozymes, facilitating the release of virions from the host cell upon lysis or the injection of viral DNA upon infection; chronic phages, by contrast, do not lyse the host. Filamentous phages have been reported to recognize and bind to host pili, which causes the pili to retract and brings the phages close to, and possibly through, the outer membrane of the host cell (Riechmann et al., 1997; Sun et al., 1987). Other chronic phages might be released without lysing the host by being enclosed in lipid membranes and exported in a nonlytic manner. It has recently been reported that tailless Caudoviricetes phage particles are enclosed in lipid membranes and released from host cells in a nonlytic manner (Liu et al., 2022), and that prophage induction contributes to the production of membrane vesicles by Lacticaseibacillus casei BL23 during cell growth (da Silva Barreira et al., 2022). Therefore, these genes may persist in chronic phages because they help the host metabolize polysaccharides.

      Finally, following your suggestions, we have toned down our claims about the role of these AMGs and now refer to them as “potential” AMGs.

      References related to this response:

      Riechmann, L., and Holliger, P. (1997) The C-terminal domain of TolA is the coreceptor for filamentous phage infection of E. coli Cell 90:351-360.

      Sun, T.P., and Webster, R.E. (1987) Nucleotide sequence of a gene cluster involved in entry of E colicins and single-stranded DNA of infecting filamentous bacteriophages into Escherichia coli J Bacteriol 169:2667-2674.

      Liu, Y., Alexeeva, S., Bachmann, H., Guerra Martínez, J.A., Yeremenko, N., Abee, T., et al. (2022) Chronic release of tailless phage particles from Lactococcus lactis Appl Environ Microbiol 88:e0148321.

      da Silva Barreira, D., Lapaquette, P., Novion Ducassou, J., Couté, Y., Guzzo, J., and Rieu, A. (2022) Spontaneous prophage induction contributes to the production of membrane vesicles by the gram-positive bacterium Lacticaseibacillus casei BL23 mBio 13:e0237522.

      Comment 19: Line 354. To make this statement that these genes are missing from the host, we would need to know that these genomes are complete.

      Thanks for your comments. CheckM v1.2.2 estimated the completeness of the genomes of strains WC36 and zth2 at 100%. The genomes of strains WC36 and zth2 are 3,660,783 bp and 3,198,720 bp in size, respectively. The complete genome sequences of strains WC36 and zth2 presented in this study have been deposited in GenBank under accession numbers CP085689 and CP071032, respectively. In addition, we collected cells of strains WC36 and zth2 and sent them to another company for whole-genome sequencing (named WC36G and ZTH, GenBank accession numbers CP151801 and CP119760, respectively). The completeness of the genomes of strains WC36G and ZTH was also 100%, and their sizes were 3,660,783 bp and 3,198,714 bp, respectively. Therefore, the genomes of strains WC36 and zth2 are complete and circular.

      Comment 20: Figure 5. Please see https://peerj.com/articles/11447/ and https://doi.org/10.1093/nar/gkaa621 for a detailed discussion on vetting AMGs. Several of these should be eliminated according to the standards set in the field. More specifically, and by anecdotal comparison with other inoviridae genomes, for Phage-WC36-1 and Phage-zth2-1, I am not convinced that the transcriptional regulator and glycoside hydrolase are a part of the phage genome. The phage genome probably ends at the strand switch.

      Thanks for your comments. Following your suggestions, we carefully studied these two articles and revised the genome boundaries of Phage-WC36-1 and Phage-zth2-1 by comparison with other Inoviridae genomes. As you suggested, the transcriptional regulator and glycoside hydrolase are not part of the phage genomes.

      The revised Fig. 5A is shown below.

      References related to this response:

      Shaffer, M., Borton, M.A., McGivern, B.B., Zayed, A.A., La Rosa, S.L., Solden, L.M., Liu, P., Narrowe, A.B., Rodríguez-Ramos, J., Bolduc, B., et al. (2020) DRAM for distilling microbial metabolism to automate the curation of microbiome function Nucleic Acids Res 48:8883-8900.

      Pratama, A.A., Bolduc, B., Zayed, A.A., Zhong, Z.P., Guo, J., Vik, D.R., Gazitúa, M.C., Wainaina, J.M., Roux, S., and Sullivan, M.B. (2021) Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation PeerJ 9:e11447

      Comment 21: Line 380. This section needs to start with detailed evidence that this phage can even infect this particular strain. Added note, upon further reading the serial dilution cultures are not sufficient to prove these phage infect this Pseudomonas. We need at a minimum a one-step growth curve and wet mount microscopy. It is much more likely that some carry over contaminant is invading the culture and influencing OD600. With the given evidence, I am not at all convinced that these phages have anything to do with Pseudomonas polysaccharide use and I recommend either drastically revising this section or eliminating it entirely.

      Line 386-389. Could this be because you are observing your added phage in the starch enriched media while no phage were introduced with the "other types of media" so none would be observed? This could have nothing to do with infection dynamics. Further, this would also be consistent with your starch solution being contaminated by phage.

      Line 399. Again consistent with the starch media being contaminated.

      Line 401-408. This is more likely to do with the augmentation of the media with an additional carbon source and not involving the phage. 

      Line 410. I am not convinced that these viruses infect the Pseudomonas strain. Extensive further evidence of infection is needed to make these assertions.  Figure 6A. We need confirmation that the isolate culture remains pure and there are no other contaminants introduced with the phage.

      Thanks for your comments. As shown above, the polysaccharides (laminarin and starch) were not contaminated with phages. We tested many marine strains (Pseudomonadota, Planctomycetes, Verrucomicrobia, Fusobacteria, and Tenericutes isolates) to investigate whether Phages-WC36 could help them degrade and utilize polysaccharides, and found that Phages-WC36 promoted the growth of strain 273 only. Filamentous and hexagonal phages were detected in the supernatant of strain 273 cultured in basal medium supplemented with 5 g/L starch and 20 μL/mL Phages-WC36. After three passages of serial cultivation in basal medium supplemented with 5 g/L starch, filamentous and hexagonal phages were still present in the starch-supplemented medium but not in the basal medium, which may indicate that Phages-WC36 can infect strain 273 and that starch is an important inducer. In addition, the Phages-WC36 used in the growth assay of strain 273 were purified multiple times and finally suspended in SM buffer (0.01% gelatin, 50 mM Tris-HCl, 100 mM NaCl and 10 mM MgSO4), so the phage preparations should not contain extracellular enzymes or nutrients. We also included three control groups in the growth assay of strain 273: basal medium, basal medium supplemented with Phages-WC36, and basal medium supplemented with starch. If the Phages-WC36 preparation had contained extracellular enzymes and/or nutrients, strain 273 would also have grown well in basal medium supplemented only with Phages-WC36. The poor growth of strain 273 in this medium therefore further indicates that the phage preparation contained no such enzymes or nutrients.

      Finally, chronic phages might be released without lysing the host by being enclosed in lipid membranes and exported in a nonlytic manner, so these chronic phages may have a wider host range. However, we were unable to elucidate the infection mechanism further in this paper. Therefore, following your suggestions, we have deleted this section entirely.

      Comment 27: Line 460. Details about how these genomes were reconstructed is needed here.  

      Thanks for your comments. Following your suggestions, we have added detailed information about genome sequencing, annotation, and analysis as follows: “Genome sequencing, annotation, and analysis of strains WC36 and zth2. For genomic sequencing, strains WC36 and zth2 were grown in liquid rich medium supplemented with 5 g/L laminarin and starch and harvested after one week of incubation at 28 °C. Genomic DNA was isolated using the PowerSoil DNA isolation kit (Mo Bio Laboratories Inc., Carlsbad, CA). The genomes were then sequenced on both the Illumina NovaSeq PE150 (San Diego, USA) and Nanopore PromethION (Oxford, UK) platforms at Beijing Novogene Bioinformatics Technology Co., Ltd. Library construction, sequencing, and assembly were performed as previously described (Zheng et al., 2021b). We used seven databases to predict gene functions: Pfam (Protein Families Database, http://pfam.xfam.org/), GO (Gene Ontology, http://geneontology.org/) (Ashburner et al., 2000), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.genome.jp/kegg/) (Kanehisa et al., 2004), COG (Clusters of Orthologous Groups, http://www.ncbi.nlm.nih.gov/COG/) (Galperin et al., 2015), NR (Non-Redundant Protein Database), TCDB (Transporter Classification Database), and Swiss-Prot (http://www.ebi.ac.uk/uniprot/) (Bairoch and Apweiler, 2000). A whole-genome BLAST search (E-value < 1e-5, minimum alignment length percentage > 40%) was performed against these seven databases.” in the revised manuscript (Lines 333-351).
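      The BLAST cut-offs quoted above (E-value < 1e-5, alignment length > 40%) can be applied to standard tabular output. The sketch below assumes BLAST's default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and assumes that the 40% threshold is relative to query length, with query lengths supplied separately in the same units as the alignment. Both assumptions, and the function names, are illustrative rather than taken from the authors' pipeline.

```python
def parse_outfmt6(line: str) -> dict:
    """Parse one line of BLAST default tabular output (-outfmt 6, 12 columns)."""
    f = line.rstrip("\n").split("\t")
    return {
        "qseqid": f[0],
        "sseqid": f[1],
        "length": int(f[3]),   # alignment length
        "evalue": float(f[10]),
        "bitscore": float(f[11]),
    }

def keep_hit(hit: dict, query_lengths: dict,
             max_evalue: float = 1e-5, min_aln_frac: float = 0.40) -> bool:
    # Assumption: "alignment length percentage" = alignment length / query length.
    qlen = query_lengths[hit["qseqid"]]
    return hit["evalue"] < max_evalue and hit["length"] / qlen > min_aln_frac

def filter_hits(lines, query_lengths, **thresholds):
    """Return parsed hits that pass both the E-value and coverage cut-offs."""
    hits = (parse_outfmt6(l) for l in lines if l.strip() and not l.startswith("#"))
    return [h for h in hits if keep_hit(h, query_lengths, **thresholds)]
```

      If query lengths are not at hand, BLAST can report them directly by adding `qlen` to a custom `-outfmt "6 ..."` column list, which avoids a separate lookup table.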

      Comment 28: Line 462. Accession list of other taxa in the supplement would help here.  

      Thanks for your comments. The accession numbers of these strains are shown next to the strain names in Figure 1A. Following your suggestion, we have also added an accession list of these taxa (Supplementary Table 6) to the revised manuscript.

      Comment 29: Line 463. Is there any literature to support that these are phylogenetically informative genes for Inoviridae?  

      Thanks for your comments. Previous studies (Zeng et al., 2021; Evseev et al., 2023) support the use of these genes as phylogenetically informative markers for Inoviridae. We have cited these studies in the revised manuscript.

      References related to this response:

      Zeng, J., Wang, Y., Zhang, J., Yang, S., and Zhang, W. (2021) Multiple novel filamentous phages detected in the cloacal swab samples of birds using viral metagenomics approach Virol J 18:240

      Evseev, P., Bocharova, J., Shagin, D., and Chebotar, I. (2023) Analysis of Pseudomonas aeruginosa isolates from patients with cystic fibrosis revealed novel groups of filamentous bacteriophages. Viruses 15: 2215

      Reviewer #2 (Public Review):

      Summary: This paper investigates virus-host interactions in deep-sea bacteriophage systems which employ a seemingly mutualistic approach to viral replication in which the virus aids host cell polysaccharide import and utilization via metabolic reprogramming. The hypothesis being tested is supported with solid and convincing evidence and the findings are potentially generalizable with implications for our understanding of polysaccharide-mediated virus-host interactions and carbon cycles in marine ecosystems more broadly.

      Thanks for your positive comments.

      Strengths: This paper synthesizes sequencing and phylogenetic analyses of two Lentisphaerae bacteria and three phage genomes; electron microscopy imaging of bacterial/phage particles; differential gene expression analyses; differential growth curve analyses, and differential phage proliferation assays to extract insights into whether laminarin and starch can induce both host growth and phage proliferation. The data presented convincingly demonstrate that both host culture density and phage proliferation increase as a result of having host, phage, and polysaccharide carbon source together in culture.

      Thanks for your positive comments.  

      Weaknesses (suggestions for improvement): 

      (1) The article would be strengthened by the following additional experiment: providing the phage proteins hypothesized to be aiding host cell growth (red genes from Figure 5...TonB system energizer ExbB, glycosidases, etc) individually or in combination on plasmids rather than within the context of the actual phage itself to see if such additional genes are necessary and sufficient to realize the boosts in host cell growth/saturation levels observed in the presence of the phages tested.

      Thanks for your valuable comments. Expressing these polysaccharide-degradation proteins individually or in combination from plasmids would be an excellent way to test their effects in the host cell. However, we have not yet been able to establish a genetic manipulation and expression system for the strictly anaerobic strain WC36, which has hindered further investigation of the functions of these proteins. Our lab is working to build such a system for strain WC36, and we will test your idea in the future.

      (2) The paper would also benefit from additional experiments focused on determining how the polysaccharide processing, transport, and metabolism genes are being used by the phages to either directly increase viral infection/replication or else to indirectly do so by supporting the growth of the host in a more mutualistic manner (i.e. by improving their ability to import, degrade, and metabolize polysaccharides).  

      Thanks for your valuable comments. Indeed, because the chronic phage genome is not within the host chromosome, it is very difficult to determine exactly how chronic phages assist the host. We are currently trying to construct a genetic manipulation system for the strictly anaerobic host WC36, which should allow us to reveal this auxiliary mechanism in the future. In addition, in line with Reviewer #1's suggestions, the revised manuscript focuses on showing that polysaccharides induce deep-sea bacteria to release chronic phages, and most of the content on phages assisting host polysaccharide metabolism has been deleted.

      (3) The introduction would benefit from a discussion of what is known regarding phage and/or viral entry pathways that utilize carbohydrate anchors during host entry. The discussion could also be improved by linking the work presented to the concept of "selfishness" in bacterial systems (see for instance Giljan, G., Brown, S., Lloyd, C.C. et al. Selfish bacteria are active throughout the water column of the ocean. ISME COMMUN. 3, 11 (2023) https://doi.org/10.1038/s43705-023-00219-7). The bacteria under study are gram negative and it was recently demonstrated (https://www.nature.com/articles/ismej201726) that "selfish" bacteria sequester metabolizable polysaccharides in their periplasm to advantage. It is plausible that the phages may be hijacking this "selfishness" mechanism to improve infectivity and ENTRY rather than helping their hosts to grow and proliferate so they can reap the benefits of simply having more hosts to infect. The current work does not clearly distinguish between these two distinct mechanistic possibilities. The paper would be strengthened by at least a more detailed discussion of this possibility as well as the author's rationale for interpreting their data as they do to favor the "mutualistic" interpretation. In the same light, the paper would benefit from a more careful choice of words which can also help to make such a distinction more clear/evident/intentional. As currently written the authors seem to be actively avoiding giving insights with respect to this question.

      Thanks for your valuable comments. Following your suggestions, we have added the following discussion: “Moreover, it was recently demonstrated that selfish bacteria, which are common throughout the water column of the ocean, can bind, partially hydrolyze, and transport polysaccharides into the periplasmic space without loss of hydrolysis products (Reintjes et al., 2017; Giljan et al., 2023). Based on our results, we hypothesize that these chronic phages might enter the host through this “selfishness” mechanism while assisting the host in metabolizing polysaccharides, and thus do not lyse the host. Alternatively, these chronic phages might hijack this “selfishness” mechanism to improve their infectivity and entry, rather than helping their hosts to grow and proliferate, so that they reap the benefits of simply having more hosts to infect. In the future, we need to construct a genetic manipulation system for the strictly anaerobic host strain WC36 to reveal the relationship between chronic phages and their host in detail.” in the revised manuscript (Lines 305-316).

      References related to this response:

      Reintjes, G., Arnosti, C., Fuchs, B.M., and Amann, R. (2017) An alternative polysaccharide uptake mechanism of marine bacteria ISME J 11:1640-1650

      Giljan, G., Brown, S., Lloyd, C.C., Ghobrial, S., Amann, R., and Arnosti, C. (2023) Selfish bacteria are active throughout the water column of the ocean ISME Commun 3:11

      (4) Finally, I would be interested to know if the author’s sequencing datasets might be used to inform the question raised above by using bacterial immunity systems such as CRISPR/Cas9. For example, if the phage systems studied are truly beneficial/mutualistic for the bacteria then it’s less likely that there would be evidence of targeted immunity against that particular phage that has the beneficial genes that support polysaccharide metabolism.

      Thanks for your comments. Following your suggestion, we carefully analyzed the genome of strain WC36 and found no CRISPR/Cas9-related genes. Given that the number of chronic phages increased with prolonged culture time, we speculate that the host may lack targeted immunity against these chronic phages.

      Reviewer #2 (Recommendations For The Authors):

      There are some minor grammatical errors and unclear statements (lines 99-100, 107-109, 163, 222, 223, 249-250, 254) which should also be fixed before final publication. 

      Thanks for your valuable comments. We have fixed these minor grammatical errors and unclear statements in the revised manuscript.

      Lines 99-100: we have modified this description as “For instance, AMGs of marine bacteriophages have been predicted to be involved in photosynthesis (Mann et al., 2003), nitrogen cycling (Ahlgren et al., 2019; Gazitúa et al., 2021), sulfur cycling (Anantharaman et al., 2014; Roux et al., 2016), phosphorus cycling (Zeng and Chisholm, 2012), nucleotide metabolism (Sullivan et al., 2005; Dwivedi et al., 2013; Enav et al., 2014), and almost all central carbon metabolisms in host cells (Hurwitz et al., 2013).” in the revised manuscript (Lines 100-105).

      Lines 107-109: we have modified this description as “However, because the vast majority of deep-sea microbes cannot be cultivated in the laboratory, most bacteriophages could not be isolated.” in the revised manuscript (Lines 110-111).

      Line 163: we have modified this description as “Based on the growth curve of strain WC36, we found that the growth rate of strictly anaerobic strain WC36 was relatively slow.” in the revised manuscript (Lines 149-151).

      Lines 222-223: we have modified this description as “Regardless of whether laminarin was present, the bacterial cells kept their cell shape intact, indicating they were still healthy after 30 days.” in the revised manuscript (Lines 195-197).

      Lines 249-250: we have modified this description as “However, the entry and exit of the hexagonal phages into the WC36 cells were not observed.” in the revised manuscript (Lines 190-191).

      Line 254: we have modified this description as “To explore whether the production of bacteriophages induced by polysaccharides is an isolated case, we further checked the effect of polysaccharides on another cultured deep-sea Lentisphaerae strain zth2.” in the revised manuscript (Lines 213-215).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      Thank you for your review and pointing out multiple things to be discussed and clarified! Below, we go through the various limitations you pointed out and refer to the places where we have tried to address them.

      (1) It's important to keep in mind that this work involves simplified models of the motor system, and the terms 'motor cortex' and 'models of motor cortex' are often used interchangeably, which may mislead some readers. Similarly, the introduction fails in many cases to state what model system is being discussed (e.g. line 14, line 29, line 31), even though these span humans, monkeys, mice, and simulations, which all differ in crucial ways that cannot always be lumped together.

      That is a good point. We have clarified this in the text (Introduction and Discussion), to highlight the fact that our model isn’t necessarily meant to capture only M1. We have also updated the introduction to make it clearer in which species the experiments that motivate our investigation were performed.

      (2) At multiple points in the manuscript, thalamic inputs during movement (in mice) are used as a motivation for examining the role of preparation. However, there are other more salient motivations, such as delayed sensory feedback from the limb and vision arriving in the motor cortex, as well as ongoing control signals from other areas such as the premotor cortex.

      Yes – the motivation for thalamic inputs came from the fact that those have specifically been shown to be necessary for accurate movement generation in mice. However, it is true that the inputs in our model are meant to capture any signals external to the dynamical system modeled, and as such are likely to represent a mixture of sensory signals, and feedback from other areas. We have clarified this in the Discussion, and have added this additional motivation in the Introduction.

      (3) Describing the main task in this work as a delayed reaching task is not justified without caveats (by the authors' own admission: line 687), since each network is optimized with a fixed delay period length. Although this is mentioned to the reader, it's not clear enough that the dynamics observed during the delay period will not resemble those in the motor cortex for typical delayed reaching tasks.

      Yes, we completely agree that the terminology might be confusing. While the task we are modeling is a delayed reaching task, it does differ from the usual setting since the network has knowledge of the delay period, and that is indeed a caveat of the model. We have added a brief paragraph just after the description of the optimal control objective to highlight this limitation.

      We have also performed additional simulations using two different variants of a model-predictive control approach that allow us to relax the assumption that the go-cue time is known in advance. We show that these modifications of the optimal controller yield results that remain consistent with our main conclusions, and can in fact in some settings lead to preparatory activity plateaus during the preparation epoch, as often found in monkey M1 (e.g., in Elsayed et al. 2016). We have modified the Discussion to explain these results and their limitations, which are summarized in a new Supplementary Figure (S9).

      (4) A number of simplifications in the model may have crucial consequences for interpretation.

      a) Even following the toy examples in Figure 4, all the models in Figure 5 are linear, which may limit the generalisability of the findings.

      While we agree that linear models may be too simplistic, many prior analyses of M1 data suggest that they are often good enough to capture key aspects of M1 dynamics; for example, the generative model underlying jPCA is linear, and Sussillo et al. (2015) showed that the internal activity of nonlinear RNN models trained to reproduce EMG data aligned best with M1 activity when heavily regularized; in this regime, the RNN dynamics were close to linear. Moreover, the linearity assumption is convenient from a modeling viewpoint: the optimal control problem is more easily solved for linear network dynamics, and the optimal trajectories are more consistent across networks. Indeed, we had originally attempted to perform the analyses of Figure 5 in the nonlinear setting, but found that while the results were overall similar to what we report in the linear regime, iLQR was occasionally trapped in local minima, resulting in more variable results, especially for inhibition-stabilized networks at the strongly connected end of the spectrum. Finally, Figure 5 is primarily meant to explore to what extent motor preparation can be predicted from basic linear control-theoretic properties of the Jacobian of the dynamics; in this regard, it made sense to work with linear RNNs (for which the Jacobian is constant).

      b) Crucially, there is no delayed sensory feedback in the model from the plant. Although this simplification is in some ways a strength, this decision allows networks to avoid having to deal with delayed feedback, which is a known component of closed-loop motor control and of motor cortex inputs and will have a large impact on the control policy.

      This comment resonates well with Reviewer 3's remark regarding the autonomous nature (or not) of M1 during movement. Rather than thinking of our RNN models as anatomically confined models of M1 alone, we think of them as models of the dynamics which M1 implements possibly as part of a broader network involving “inter-area loops and (at some latency) sensory feedback”, and whose state appears to be near-fully decodable from M1 activity alone. We have added a paragraph of Discussion on this important point.

      (5) A key feature determining the usefulness of preparation is the direction of the readout dimension. However, all readouts had a similar structure (random Gaussian initialization). Therefore, it would be useful to have more discussion regarding how the structure of the output connectivity would affect preparation, since the motor cortex certainly does not follow this output scheme.

      We agree with this limitation of our model — indeed one key message of Figure 4 is that the degree of reliance on preparatory inputs depends strongly on how the dynamics align with the readout. However, this strong dependence is somewhat specific to low-dimensional models; in higher-dimensional models (most of our paper), one expects that any random readout matrix C will pick out activity dimensions in the RNN that are sufficiently aligned with the most controllable directions of the dynamics to encourage preparation.

      We did consider optimizing C as well (which requires differentiating through the iLQR optimizer; this is possible but very costly), but the question inevitably arises of what exactly C should be optimized for, and under what constraints (e.g., fixed norm or not). One possibility is to optimize C with respect to the same control objective that the control inputs are optimized for, and constrain its norm (otherwise, inputs to the M1 model, and its internal activity, could become arbitrarily small as C can grow to compensate). We performed this experiment (new Supplementary Figure S7) and obtained a similar preparation index; there was one notable difference, namely that the optimized readout modes led to greater observability compared to a random readout; thus, the same amount of “muscle energy” required for a given movement could now be produced by a smaller initial condition. In turn, this led to smaller control inputs, consistent with a lower control cost overall.

      Although we could have optimized C systematically, we reasoned that (i) it is computationally expensive, and (ii) the way M1 affects downstream effectors is presumably “optimized” for much richer motor tasks than simple 2D reaching, such that optimizing C for a fixed set of simple reaches could lead to misleading conclusions. We therefore decided to stick with random readouts.
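      The link between observability and the size of the initial condition can be illustrated with a toy calculation, assuming autonomous discrete-time dynamics x_{k+1} = A x_k during movement (no inputs), a linear readout y_k = C x_k, and the summed squared readout over T steps as a stand-in for "muscle energy". Under these assumptions, the readout energy produced from x0 is x0^T Q_T x0, where Q_T is the finite-horizon observability Gramian, so the smallest initial condition yielding energy E has norm sqrt(E / λ_max(Q_T)); a readout with larger Gramian eigenvalues (greater observability) therefore needs a smaller initial condition. The dynamics, dimensions, and function names below are hypothetical and do not reproduce the models in the manuscript.

```python
import numpy as np


def finite_horizon_observability_gramian(A, C, T):
    """Q_T = sum_{k=0}^{T-1} (A^k)^T C^T C A^k for x_{k+1} = A x_k, y_k = C x_k."""
    n = A.shape[0]
    Q = np.zeros((n, n))
    Ak = np.eye(n)
    for _ in range(T):
        CAk = C @ Ak
        Q += CAk.T @ CAk
        Ak = A @ Ak
    return Q


def output_energy(A, C, x0, T):
    """Sum of squared readouts y_k = C x_k over T steps of autonomous dynamics."""
    x, energy = x0, 0.0
    for _ in range(T):
        y = C @ x
        energy += y @ y
        x = A @ x
    return energy


rng = np.random.default_rng(1)
n, p, T = 50, 2, 100  # hypothetical: neurons, readout channels, time steps
# Toy stable dynamics: a random orthogonal matrix scaled to spectral radius 0.95.
A = 0.95 * np.linalg.qr(rng.standard_normal((n, n)))[0]
C = rng.standard_normal((p, n)) / np.sqrt(n)  # random Gaussian readout

Q = finite_horizon_observability_gramian(A, C, T)
eigvals, eigvecs = np.linalg.eigh(Q)  # eigenvalues in ascending order

target_energy = 1.0
# Smallest initial condition producing the target readout energy:
# along the top eigenvector of Q, with squared norm target_energy / lambda_max.
x0_best = eigvecs[:, -1] * np.sqrt(target_energy / eigvals[-1])
print(np.linalg.norm(x0_best), output_energy(A, C, x0_best, T))
```

      Any other initial condition of the same norm produces at most the target energy, which is why, in this simplified picture, a readout with larger Gramian eigenvalues lets the same output be generated from a smaller preparatory state.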

      Additional comments:

      (1) The choice of cost function seems very important. Is it? For example, penalising the square of u(t) may produce very different results than penalising the absolute value.

      Yes, the choice of cost function does affect the results, at least qualitatively. The absolute value of the inputs is a challenging cost to use, as iLQR relies on a local quadratic approximation of the cost function. However, we have included additional experiments in which we penalized the squared derivative of the inputs (Supplementary Figure S8; see also our response to Reviewer 3's suggestion on this topic), and we do see differences in the qualitative behavior of the model (though the main takeaway, i.e. the reliance on preparation, continues to hold). This is now referred to and discussed in the Discussion section.

      (2) In future work it would be useful to consider the role of spinal networks, which are known to contribute to preparation in some cases (e.g. Prut and Fetz, 1999).

      (3) The control signal magnitude is penalised, but not the output torque magnitude, which highlights the fact that control in the model is quite different from muscle control, where co-contraction would be a possibility and therefore a penalty of muscle activation would be necessary. Future work should consider the role of these differences in control policy.

      Thank you for pointing us to this reference! Regarding both of these concerns, we agree that the model could be greatly improved and made more realistic in future work (another avenue for this would be to consider a more realistic biophysical model, e.g. using the MotorNet library). We hope that the current Discussion, which highlights the various limitations of our modeling choices, makes it clear that a lot of these choices could easily be modified depending on the specific assumptions/investigation being performed.

      Reviewer 2:

      Thank you for your positive review! We very much agree with the limitations you pointed out, some of which overlapped with the comments of the other reviewers. We have done our best to address them through additional discussion and new supplementary figures. We briefly highlight below where those changes can be found.

      (1) Though the optimal control theory framework is ideal to determine inputs that minimize output error while regularizing the input norm, it cannot easily account for other types of objectives, especially those that may lead to a complex optimization landscape. For instance, the reusability of parts of the circuit, sparse use of additional neurons when learning many movements, and ease of planning (especially under uncertainty about when to start the movement), may be alternative or additional reasons that could help explain the preparatory activity observed in the brain. It is interesting to note that inputs that optimize the objective chosen by the authors arguably lead to a trade-off in terms of other desirable objectives. Specifically, the inputs the authors derive are time-dependent, so a recurrent network would be needed to produce them, and it may not be easy to interpolate between them to drive new movement variants. In addition, these inputs depend on the desired time of output and therefore make it difficult to plan, e.g. in circumstances when timing should be decided depending on sensory signals. Finally, these inputs are specific to the full movement chain that will unfold, so they do not permit reuse of the inputs, e.g. in movement sequences of different orders.

      Yes, that is a good point! We have incorporated further Discussion related to this point. We have additionally included a new example in which we regularize the temporal complexity of the inputs (see also our response to Reviewer 3's suggestion on this topic), which leads to more slowly varying inputs, and may indeed represent a more realistic constraint and lead to simpler inputs that can more easily be interpolated between. We also agree that uncertainty about the upcoming go cue may play an important role in the strategy adopted by the animals. While we have not performed an extensive investigation of the topic, we have included a Supplementary Figure (S9) in which we used Model Predictive Control to investigate the effect of planning under uncertainty about the go cue arrival time. We hope that this will give the reader a better sense of what sort of model extensions are possible within our framework.

      (2) Relatedly, if the motor circuits were to balance different types of objectives, the activity and inputs occurring before each movement may be broken down into different categories that may each specialize in one objective. For instance, previous work (Kaufman et al. eNeuron 2016, Inagaki et al., Cell 2022, Zimnik and Churchland, Nature Neuroscience 2021) has suggested that inputs occurring before the movement could be broken down into preparatory inputs 'stricto sensu' - relating to the planned characteristics of the movement - and a trigger signal, relating to the transition from planning to execution - irrespective of whether the movement is internally timed or triggered by an external event. The current work does not address which type(s) of early input may be labeled as 'preparatory' or may be thought of as a part of 'planning' computations.

      Yes, our model does indeed treat inputs in a very general way, and does not distinguish between the different types of processes they may be composed of. This is partly because we do not explicitly model where the inputs come from, such that our inputs likely encompass multiple processes. We have added discussion related to this point.

      (3) While the authors rightly point out some similarities between the inputs that they derive and observed preparatory activity in the brain, notably during motor sequences, there are also some differences. For instance, while both the derived inputs and the data show two peaks during sequences, the data reproduced from Zimnik and Churchland show preparatory inputs with a very asymmetric shape that plummets before the start of the next movement, whereas the derived inputs have larger amplitude during the movement period - especially for the second movement of the sequence. In addition, the data show trigger-like signals before each of the two reaches. Finally, while the data show a very high correlation between the pattern of preparatory activity of the second reach in the double reach and compound reach conditions, the derived inputs appear to be more different between the two conditions. Note that the data would be consistent with separate planning of the two reaches even in the compound reach condition, as well as the re-use of the preparatory input between the compound and double reach conditions. Therefore, different motor sequence datasets - notably, those that would show even more coarticulation between submovements - may be more promising to find a tight match between the data and the authors' inputs. Further analyses in these datasets could help determine whether the coarticulation could be due to simple filtering by the circuits and muscles downstream of M1, planning of movements with adjusted curvature to mitigate the work performed by the muscles while permitting some amount of re-use across different sequences, or - as suggested by the authors - inputs fully tailored to one specific movement sequence that maximize accuracy and minimize the M1 input magnitude.

      Regarding the exact shape of the occupancy plots, it is important to note that some of the more qualitative aspects (e.g., the relative height of the two peaks) will change if we change the parameters of the cost function. Right now, we have chosen the parameters to ensure that both reaches would be performed at roughly the same speed (as a way to very loosely constrain the parameters based on the observed behavior). However, small changes to the hyperparameters can lead to changes in the model output (e.g., one of the two consecutive reaches being performed using greater acceleration than the other), and since our biophysical model is fairly simple, changes in the behavior are directly reflected in the network activity. Essentially, this means that while the double occupancy is a consistent feature of the model, the exact shape of the peaks is more sensitive to hyperparameters, and we do not wish to draw any strong conclusions from it, given the simplicity of the biophysical model. However, we do agree that our model exhibits some differences from the data. As discussed above, we have included additional discussion regarding the potential existence of separate inputs for planning vs triggering the movement in the context of single reaches.

      Overall, we are excited about the suggestions made by the Reviewer here about using our approach to analyze other motor sequence datasets, but we think that in order to do this properly, one would need to adopt a more realistic musculo-skeletal model (such as one provided by MotorNet).

      (4) Though iLQR is a powerful optimization method to find inputs optimizing the author's cost function, it also has some limitations. First, given that it relies on a linearization of the dynamics at each timestep, it has a limited ability to leverage potential advantages of nonlinearities in the dynamics. Second, the iLQR algorithm is not a biologically plausible learning rule and therefore it might be difficult for the brain to learn to produce the inputs that it finds. It remains unclear whether using alternative algorithms with different limitations - for instance, using variants of BPTT to train a separate RNN to produce the inputs in question - could impact some of the results.

      We agree that our choice of iLQR has limitations: while it offers the advantage of convergence guarantees, it does indeed restrict the choice of cost function and dynamics that we can use. We have now included extensive discussion of how the modeling choices affect our results.

      We do not view the lack of biological plausibility of iLQR as an issue, as the results are agnostic to the algorithm used for optimization. However, we agree that any structure imposed on the inputs (e.g., by enforcing them to be the output of a self-contained dynamical system) would likely alter the results. A potentially interesting extension of our model would be to do just what the reviewer suggested, and try to learn a network that can generate the optimal inputs. However, this is outside the scope of our investigation, as it would then lead to new questions (e.g., what brain region would that other RNN represent?).

      (5) Under the objective considered by the authors, the amount of input occurring before the movement might be impacted by the presence of online sensory signals for closed-loop control. It is therefore an open question whether the objective and network characteristics suggested by the authors could also explain the presence of preparatory activity before e.g. grasping movements that are thought to be more sensory-driven (Meirhaeghe et al., Cell Reports 2023).

      It is true that we aren’t currently modeling sensory signals explicitly. However, some of the optimal inputs we infer may be capturing upstream information, which could encompass some sensory information. This is currently unclear, and would likely depend on how exactly the model is specified. We have added new discussion to emphasize that our dynamics should not be understood as representing M1 alone, but more general circuits whose state can be decoded from M1.

      Reviewer #2 (Recommendations For The Authors):

      Additionally, thank you for pointing out various typos in the manuscript, we have fixed those!

      Reviewer 3:

      Thank you very much for your review, which makes a lot of very insightful points, and raises several interesting questions. In summary, we very much agree with the limitations you pointed out. In particular, the choice of input cost is something we had previously discussed, but we had found it challenging to decide on what a reasonable cost for “complexity” could be. Following your comment, we have however added a first attempt at penalizing “temporal complexity”, which shows promising behavior. We have only included those additional analyses as supplementary figures, and we have included new discussion, which hopefully highlights what we meant by the different model components, and how the model behavior may change as we vary some of our choices. We hope this can be informative for future models that may use a similar approach. Below, we highlight the changes that we have made to address your comments.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons.

      First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      That is a very good point, and it mirrors several concerns that we had in the past. While we did focus on the input norm for the sake of simplicity, and because it represents a very natural way to regularize our control solutions, we agree that a “complexity cost” may be better suited to models of brain circuits. We have addressed this in a supplementary investigation. We chose to focus on a cost that penalizes the temporal complexity of the inputs, as ||u(t+1) - u(t)||^2. Note that this required augmenting the state of the model, making the computations quite a bit slower; while it is doable if we only penalize the first temporal derivative, it would not scale well to higher orders.
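      The state augmentation mentioned above can be sketched as follows, under stated assumptions: discrete-time linear dynamics x_{t+1} = A x_t + B u_t, a zero input before the trial (u_{-1} = 0), and hypothetical helper names (`augment_for_input_smoothness`, `rollout`) chosen for illustration; this is not the implementation used in the manuscript. Appending the previous input to the state and treating the increment w_t = u_t - u_{t-1} as the new control turns the penalty ||u(t+1) - u(t)||^2 into an ordinary quadratic control cost on w_t, at the price of m extra state variables.

```python
import numpy as np


def augment_for_input_smoothness(A, B):
    """Build (A_aug, B_aug) for the augmented state z_t = [x_t; u_{t-1}]
    driven by the new control w_t = u_t - u_{t-1}."""
    n, m = B.shape
    A_aug = np.block([[A, B],
                      [np.zeros((m, n)), np.eye(m)]])
    B_aug = np.vstack([B, np.eye(m)])
    return A_aug, B_aug


def rollout(A, B, x0, controls):
    """Simulate x_{t+1} = A x_t + B u_t for a sequence of controls."""
    xs = [x0]
    for u in controls:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)


rng = np.random.default_rng(0)
n, m, T = 6, 3, 20  # hypothetical: state dim, input dim, time steps
A = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]  # stable toy dynamics
B = rng.standard_normal((n, m))
x0 = rng.standard_normal(n)
U = rng.standard_normal((T, m))  # original inputs u_0 .. u_{T-1}

# Input increments w_t = u_t - u_{t-1}, assuming u_{-1} = 0.
W = np.diff(U, axis=0, prepend=np.zeros((1, m)))

A_aug, B_aug = augment_for_input_smoothness(A, B)
z0 = np.concatenate([x0, np.zeros(m)])  # z_0 = [x_0; u_{-1}]
Z = rollout(A_aug, B_aug, z0, W)

X = rollout(A, B, x0, U)
X_from_aug = Z[:, :n]  # first n components recover the original state
smoothness_cost = np.sum(W ** 2)  # sum_t ||u_t - u_{t-1}||^2
print(np.max(np.abs(X - X_from_aug)))
```

      Because the augmented state has n + m dimensions, penalizing a further derivative order in the same way would add another m state variables, which is consistent with the computations becoming slower. Note also that with u_{-1} = 0 the first increment additionally penalizes the magnitude of the initial input.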

      Interestingly, we did find that the activity in that setting was somewhat more realistic (see new Supplementary Figure S8), with more sustained inputs and plateauing activity. While we have kept the original model for most of the investigations, the somewhat more realistic nature of the results under that setting suggests that further exploration of penalties of that sort could represent a promising avenue to improve the model.

      We also found the idea of a cost that would ensure low-dimensionality of the inputs across conditions very interesting. However, it is challenging to investigate with iLQR as we perform the optimization separately for each condition; nevertheless, it could be investigated using a different optimizer.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      Yes, that is a very interesting analysis to perform, which we had not considered before! When investigating this, we found that the zero-delay strategy does not rely on preparation in the same way as is seen in the monkeys. This seems to reflect the fact that our “Go cue” corresponds to an “internal” go cue, which would likely come after the true “external” go cue, such that we would never actually be in the zero-delay setting. This is not something we had addressed (or really considered) before, although we had tried to use “delta prep” to refer to the duration of the preparatory period rather than necessarily the delay period. We have now included more discussion on this topic, as well as a new Supplementary Figure S10.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing (preparatory plateaus, a realistic condition-invariant signal tied to movement onset) under the present modeling assumptions.

      As described above, we have now included two additional figures. In the first one (S8, already discussed above), we used a temporal smoothness prior and indeed obtained slightly more realistic activity plateaus. In a second supplementary figure (S9), we also considered using model predictive control (MPC) to optimize the inputs when the go cue arrival time is uncertain. There, we found that removing the assumption that the delay period is known came with new challenges: in particular, it requires the specification of a “mental model” of when the Go cue will arrive. While it is reasonable to expect that monkeys will have a prior over the go cue arrival time that is shaped by the design of the experiment, some assumptions must be made about the utility functions used to weigh this prior. For instance, if we imagine that monkeys carry a model of the possible arrival time of the go cue that is updated online, they could nonetheless act differently based on this information, for instance by preparing so as to be ready either for the earliest possible go cue or for the average go cue. This will likely depend on the exact task design and reward/penalty structure. Here, we added simulations of those two cases (making simplifying assumptions to make the problem tractable/solvable using model predictive control), and found that the “earliest preparation” strategy gives rise to more realistic plateauing activity, while the model where planning is done for the “most likely go time” does not. We suspect that more realistic activity patterns could be obtained by, e.g., combining this framework with the temporal smoothness cost. However, the main point we wished to make with this new supplementary figure is that it is possible to model the task in a slightly more realistic way (although here it comes at the cost of additional model assumptions). We have now added more discussion related to those points.

      Note that we have kept our analyses of these new models to a minimum, as the main takeaway we wish to convey from them is that most components of the model could be modified or made more realistic. This would affect the qualitative behavior of the system and its match to data but, in the examples we have considered so far, does not appear to change the general strategy of networks relying on preparation.
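      To make the two planning rules concrete, here is a minimal sketch under stated assumptions; it is not the MPC implementation used for Supplementary Figure S9. It assumes a discrete prior over a few candidate go-cue times (hypothetical values, in ms, sorted ascending) and treats the online update as conditioning on the cue not having arrived yet. The function names `posterior_go_times` and `planning_target` are illustrative only. Under the “earliest” rule the inputs are planned for the first go time that is still possible; under the “most likely” rule they are planned for the mode of the updated prior.

```python
import numpy as np


def posterior_go_times(prior, times, t_now):
    """Condition a discrete prior over go-cue times on 'no cue yet at t_now'.

    prior: hypothetical probabilities for each candidate go time.
    times: candidate go-cue times in ms, sorted in ascending order.
    """
    prior = np.asarray(prior, dtype=float)
    times = np.asarray(times)
    still_possible = times > t_now  # cues at or before t_now would already have been seen
    post = np.where(still_possible, prior, 0.0)
    total = post.sum()
    if total == 0:
        raise ValueError("no candidate go time remains after t_now")
    return post / total


def planning_target(prior, times, t_now, strategy):
    """Return the go time that the inputs would be optimized for at time t_now."""
    times = np.asarray(times)
    post = posterior_go_times(prior, times, t_now)
    if strategy == "earliest":
        # be ready for the earliest go time that is still possible
        return times[np.argmax(post > 0)]
    if strategy == "most_likely":
        # be ready for the single most probable remaining go time
        return times[np.argmax(post)]
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical example: three possible delays with an uneven prior.
times = np.array([300, 600, 900])
prior = np.array([0.2, 0.5, 0.3])
for t_now in (0, 400, 700):
    print(t_now,
          planning_target(prior, times, t_now, "earliest"),
          planning_target(prior, times, t_now, "most_likely"))
```

      Calling `planning_target` again at each time step corresponds to the online update of the prior described above; the two rules differ only in which remaining go time they commit to.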

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

      Yes, that is a good point, thank you for making it so clearly! Indeed, as in previous work, we do not think of our “M1 dynamics” as being internal to M1; they may instead include sensory feedback and inter-area loops, which we summarize in the connectivity, chosen so that the dynamics qualitatively resemble the data. We have now incorporated more discussion regarding what exactly the dynamics in our model represent.

      Round 2 of reviews

      Reviewer 3:

      My remaining comments largely pertain to some subtle (but to me important) nuances at a few locations in the text. These should be easy for the authors to address, in whatever way they see fit.

      Specific comments:

      (1) The authors state the following on line 56: "For preparatory processes to avoid triggering premature movement, any pre-movement activity in the motor and dorsal pre-motor (PMd) cortices must carefully exclude those pyramidal tract neurons."

      This constraint is overly restrictive. PT neurons absolutely can change their activity during preparation in principle (and appear to do so in practice). The key constraint is looser: those changes should have no net effect on the muscles. E.g., if d is the vector of changes in PT neuron firing rates, and b is the vector of weights, then the constraint is that b'd = 0. d = 0 is one good way of doing this, but only one. Half the d's could go up and half could go down. Or they all go up, but half the b's are negative. Put differently, there is no reason the null space has to be upstream of the PT neurons. It could be partly, or entirely, downstream. In the end, this doesn't change the point the authors are making. It is still the case that d has to be structured to avoid causing muscle activity, which raises exactly the point the authors care about: why risk this unless preparation brings benefits? However, this point can be made with a more accurate motivation. This matters, because people often think that a null-space is a tricky thing to engineer, when really it is quite natural. With enough neurons, preparing in the null space is quite simple.

      That is a good point – we have now reformulated this sentence to instead say “to avoid triggering premature movement, any pre-movement activity in the motor and dorsal premotor (PMd) cortices must engage the pyramidal tract neurons in a way that ensures their activity patterns will not lead to any movement”.
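      The reviewer's constraint can be checked numerically. The sketch below assumes a single downstream output with a hypothetical weight vector `b` over four PT neurons (all values illustrative, not taken from the study). It shows that both cases the reviewer lists (half the rates going up and half going down, or all rates going up with half the weights negative) satisfy b'd = 0 with d ≠ 0, and that any change d can be made output-null by removing its component along b. With several outputs, b would become a matrix and the same idea applies to its null space.

```python
import numpy as np


def net_output_change(b, d):
    """Net change in downstream drive, b'd, for weights b and rate changes d."""
    return float(np.dot(b, d))


def project_to_null_space(b, d):
    """Remove the component of d along b so that b'd_null = 0."""
    b = np.asarray(b, dtype=float)
    d = np.asarray(d, dtype=float)
    return d - (b @ d) / (b @ b) * b


# Half the PT neurons increase and half decrease; all weights positive.
print(net_output_change([1, 1, 1, 1], [1, -1, 1, -1]))  # 0.0
# All PT neurons increase, but half the weights are negative.
print(net_output_change([1, -1, 1, -1], [1, 1, 1, 1]))  # 0.0

# An arbitrary (hypothetical) change can be made output-null by projection.
b = np.array([0.5, -0.2, 0.3, 0.1])
d_null = project_to_null_space(b, [1.0, 2.0, -1.0, 0.5])
print(d_null, net_output_change(b, d_null))
```

      The projected change is still nonzero, so the PT neurons do change their firing while the net drive to the muscle stays (numerically) at zero, which is the looser constraint the reviewer describes.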

      (2) Line 167: 'near-autonomous internal dynamics in M1'.

      It would be good if such statements, early in the paper, could be modified to reflect the fact that the dynamics observed in M1 may depend on recurrence that is NOT purely internal to M1. A better phrase might be 'near-autonomous dynamics that can be observed in M1'. A similar point applies on line 13. This issue is handled very thoughtfully in the Discussion, starting on line 713. Obviously it is not sensible to also add multiple sentences making the same point early on. However, it is still worth phrasing things carefully, otherwise the reader may have the wrong impression up until the Discussion (i.e. they may think that both the authors, and prior studies, believe that all the relevant dynamics are internal to M1). If possible, it might also be worth adding one sentence, somewhere early, to keep readers from falling into this hole (and then being stuck there till the Discussion digs them out).

      That is a good point: we have now edited the text after line 170 to make it clear that the underlying dynamics may not be confined to M1, and have referenced the later discussion there.

      (3) The authors make the point, starting on line 815, that transient (but strong) preparatory activity empirically occurs without a delay. They note that their model will do this but only if 'no delay' means 'no external delay'. For their model to prepare, there still needs to be an internal delay between when the first inputs arrive and when movement generating inputs arrive.

      This is not only a reasonable assumption, but is something that does indeed occur empirically. This can be seen in Figure 8c of Lara et al. Similarly, Kaufman et al. 2016 noted that "the sudden change in the CIS [the movement triggering event] occurred well after (~150 ms) the visual go cue... (~60 ms latency)" Behavioral experiments have also argued that internal movement-triggering events tend to be quite sluggish relative to the earliest they could be, causing RTs to be longer than they should be (Haith et al. Independence of Movement Preparation and Movement Initiation). Given this empirical support, the authors might wish to add a sentence indicating that the data tend to justify their assumption that the internal delay (separating the earliest response to sensory events from the events that actually cause movement to begin) never shrinks to zero.

      While on this topic, the Haith and Krakauer paper mentioned above is good to cite because it does ponder the question of whether preparation is really necessary. By showing that they could get RTs to shrink considerably before behavior became inaccurate, they showed that people normally (when not pressured) use more preparation time than they really need. Given Lara et al., we know that preparation does always occur, but Haith and Krakauer were quite right that it can be very brief. This helped, along with neural results, to change our view of preparation from something more cognitive that had to occur to something more mechanical that was simply a good network strategy, which is indeed the authors' current point. Working a discussion of this into the current paper may or may not make sense, but if there is a place where it is easy to cite, it would be appropriate.

      This is a nice suggestion, and we thank the reviewer for pointing us to the Haith and Krakauer paper. We have now added this reference and extended the paragraph following line 815 to briefly discuss the possible decoupling between preparation and movement initiation that is shown in the Haith paper, emphasizing how this may affect the interpretation of the internal delay and comparisons with behavioral experiments.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      (1) Peptides were synthesized with fluorescein isothiocyanate (FITC) and Tat tag, and then PEGylated with methoxy PEG Succinimidyl Succinate.

      I have two concerns about the peptide design. First, FITC was intended "for monitoring" (line 129) but was never used in the manuscript. Second, PEGylation targets the two lysine side chains on the Tat, which would alter its penetration properties.

      (1) We conducted an analysis of the cellular trafficking of FITC-tagged peptides following their entry into cells.

      Author response image 1.

      However, we did not include it in the main text because it is a basic result.

      (2) As can be seen in the image above, FITC signal was detected in the cells after treatment with the PEGylated peptides, suggesting that PEGylation does not affect their ability to penetrate cells.

      (2) "Superdex 200 increase 10/300 GL column" (line 437) was used to isolate mono/di PEGylated PDZ and separate them from the residual PEG and PDZ peptide. "m-PEG-succinimidyl succinate with an average molecular weight of 5000 Da" (lines 133 and 134).

      To my knowledge, the Superdex 200 increase 10/300 GL column is not suitable and is unlikely to produce traces shown in Figure 1B.

      As the Superdex 200 increase 10/300 GL column features a fractionation range of 10,000 to 600,000 Da, we used it to separate the PEGylated products, DiPEGylated PDZ (approx. 15 kDa) and MonoPEGylated PDZ (approx. 10 kDa), from the residuals (PDZ and PEG), demonstrating successful isolation of the PEGylated products (Figure 1C). Considering that the molecular weights of PDZ and PEG are approximately 4.1 kDa and 5.0 kDa, respectively, the late-eluting peaks from SEC likely represent a mixed absorbance of PDZ and PEG at 215 nm.

      However, as the reviewer pointed out, it could be unreasonable to annotate peaks representing PDZ and PEG, respectively, from mixed absorbance detected in a region (11-12 min) beyond the fractionation range.

      In our revised manuscript, therefore, the multiple peaks in the late-eluting region (11-12 min) are labeled collectively as 'Residuals'. For reference, the revised Figure 1B includes a chromatogram of pure PDZ-WT obtained under the same analytical conditions.

      Accordingly, Fig. 1B has been replaced with the new results.

      (3) "the in vivo survival effect of LPS and PDZ co-administration was examined in mice. The pretreatment with WT PDZ peptide significantly increased survival and rescued compared to LPS only; these effects were not observed with the mut PDZ peptide (Figure 2a)." (lines 159-160).

      Fig 2a is the weight curve only. The data is missing in the manuscript.

      We have added the survival curve to Fig. 2A.

      (4) In Table 1, the effect of peptide treatment on ALT and AST appears minor.

      In mice treated with LPS, blood levels of ALT and AST are elevated, but these levels decrease upon treatment with WT PDZ, whereas mut PDZ does not produce significant changes. Figure 3A shows inflammatory cells within the central vein, yet no substantial hepatotoxicity is observed during the 5-day treatment with LPS. According to UCLA Diagnostic Labs, the normal ranges of ALT and AST in C57BL/6 mice are 16 to 200 U/L and 46 to 221 U/L, respectively, and the values in all experiments fall within these ranges. In summary, a 5-day LPS treatment induces inflammation in the liver but is too short to induce hepatotoxicity, which explains the relatively low values.

      (5) MitoTracker Green FM shouldn't produce red images in Figure 6.

      We have replaced Figs. 6A and B with new images showing the green MitoTracker signal.

      (6) Figure 5 (Comparison of mRNA expression in PDZ-treated BEAS-2B cells) needs a clearer and more detailed description, both in the main text and in the figure legend. The current version is very hard to read.

      We have replaced Fig. 5A with a new version that is easier to interpret and have added a more detailed description of the results and a more detailed figure legend.

      Results Section in Figure 5:

      we performed RNA sequencing (RNA-seq) analysis. RNA-seq yielded the expression patterns of 24,424 genes for each comparison combination; among these, 51 genes overlapped across four gene categories, and we also assessed the similarity between comparison combinations (Figure 5a). Compared with the control group, the LPS-alone, WT PDZ+LPS, and mut PDZ+LPS groups all showed upregulation of these genes above their average values. When LPS alone was compared with WT PDZ+LPS, the genes were at or below their average values (i.e., downregulated). When LPS alone was compared with mut PDZ+LPS, about half of the genes were upregulated. Regarding the similarity between comparison combinations, the comparison combination with LPS…

      Figure 5 Legend Section:

      Figure 5. Comparison of mRNA expression in PDZ-treated BEAS-2B cells.

      BEAS-2B cells were treated with wild-type PDZ or mutant PDZ peptide for 24 h and then incubated with LPS for 2 h, after which RNA sequencing analysis was performed. (a) The heat map shows the general regulation pattern of about 51 inflammation-related genes that are differentially expressed when cells treated with WT PDZ or mut PDZ are exposed to LPS, an inflammatory stimulus. In all samples, RED = upregulated and BLUE = downregulated relative to the gene average. Each row represents a gene, and the columns represent the values of the control group treated only with LPS and of the WT PDZ and mut PDZ groups treated with LPS. Log values were converted into fold-change values. All genes were adjusted to have the same mean and standard deviation; the unit of change is the standard deviation from the mean, and the color value range is the same for each row. (b) Significant genes were selected using a gene category chart (fold change of 2.00 and normalized log2 value of 4.00). The pie chart above shows the distribution of four gene categories when comparing LPS versus control, WT PDZ+LPS/LPS, and mut PDZ+LPS/LPS. The bar graph below shows, for each gene category, the number of upregulated (RED) and downregulated (GREEN) genes. (c) The protein-protein interaction network constructed with the STRING database displays the genes shared among the WT PDZ+LPS/LPS, mut PDZ+LPS/LPS, and LPS comparisons. Nodes represent proteins associated with inflammation, and connecting lines denote interactions between two proteins. Different line thicknesses indicate the types of evidence used to predict the associations.
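      The row normalization described in (a) and the cutoffs in (b) can be illustrated with a short sketch. This is a hypothetical illustration, not the analysis pipeline used for Figure 5: it assumes a genes × samples matrix of log2 values and standardizes each row to zero mean and unit standard deviation, so that the color scale means the same thing in every row. How the two cutoffs in (b) are combined is not stated in the legend; as an assumption, `select_genes` keeps genes whose fold change is at least 2 in either direction and whose normalized log2 value reaches 4 in at least one of the two groups being compared. All names and values are illustrative.

```python
import numpy as np


def row_zscore(log_expr):
    """Standardize each gene (row) to mean 0 and SD 1 across samples."""
    log_expr = np.asarray(log_expr, dtype=float)
    mean = log_expr.mean(axis=1, keepdims=True)
    sd = log_expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes map to all zeros instead of dividing by 0
    return (log_expr - mean) / sd


def select_genes(log2_a, log2_b, fc_cutoff=2.0, expr_cutoff=4.0):
    """Mask of genes passing both cutoffs when comparing group a with group b.

    Assumption: the fold-change cutoff is applied in both directions and the
    log2 expression cutoff must be met in at least one of the two groups.
    """
    log2_a = np.asarray(log2_a, dtype=float)
    log2_b = np.asarray(log2_b, dtype=float)
    fold_change = 2.0 ** (log2_a - log2_b)
    passes_fc = (fold_change >= fc_cutoff) | (fold_change <= 1.0 / fc_cutoff)
    passes_expr = np.maximum(log2_a, log2_b) >= expr_cutoff
    return passes_fc & passes_expr


# Hypothetical log2 values for three genes (rows) in three samples (columns).
expr = np.array([[6.0, 4.0, 5.0],
                 [6.0, 5.5, 5.8],
                 [3.0, 1.0, 2.0]])
print(np.round(row_zscore(expr), 2))
print(select_genes(expr[:, 0], expr[:, 1]))  # compare sample 0 with sample 1
```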

      Reviewer #2:

      (1) In this paper, the authors demonstrated the anti-inflammatory effect of the PDZ peptide through inhibition of NF-kB signaling. Are there any results on proteins that bind the PDZ peptide (directly or indirectly) and can regulate the LPS-induced inflammatory signaling pathway? Elucidating the PDZ peptide's binding partners and the associated regulatory mechanisms would strengthen the authors' hypothesis about the anti-inflammatory effects of the PDZ peptide.

      As mentioned in the Discussion section, we believe it is crucial to identify proteins that directly interact with PDZ and regulate it. Because such direct interactions can modulate intracellular signaling pathways, we plan to express GST-PDZ, induce binding with cellular lysates, and then characterize the bound proteins using LC-MS/MS. We intend to pursue these experiments further and submit the findings for publication.

      (2) The authors presented interesting insights into the therapeutic role of the PDZ motif peptide of ZO-1. PDZ domains are protein-protein interaction modules found in a variety of species. It has been thought that many cellular and biological functions, especially those involving signal transduction complexes, are affected by PDZ-mediated interactions. What is the rationale for selecting the core sequence that regulates inflammation among the PDZ motifs of ZO-1 shown in Figure 1A?

      The rationale for selecting the core sequence that regulates inflammation among the PDZ motifs of ZO-1, as shown in Figure 1A, is grounded in the specific roles these motifs play in signal transduction pathways that are crucial for inflammatory processes. PDZ domains are recognized for their ability to function as scaffolding proteins that organize signal transduction complexes, crucial for modulating cellular and biological functions. The chosen core sequence is particularly important because it is conserved across ZO-1, ZO-2, and ZO-3, indicating a fundamental role in maintaining cellular integrity and signaling pathways. This conservation suggests that the sequence’s involvement in inflammatory regulation is not only significant in ZO-1 but also reflects a broader biological function across the ZO family.

      (3) In Figure 3, the authors showed the representative images of IHC, please add the quantification analysis of Iba1 expression and PAS-positive cells using Image J or other software. To help understand the figure, an indication is needed to distinguish specifically stained cells (for example, a dotted line or an arrow).

      We have added the semi-quantitative results to Figs. 3d, e, and f.

      Result section: The specific physiological mechanism by which WT PDZ peptide decreases LPS-induced systemic inflammation in mice and the signaling molecules involved remain unclear. These were confirmed by a semi-quantitative analysis of Iba-1 immunoreactivity and PAS staining in the liver, kidney, and lung, respectively (Figures 3d, e, and f). To examine whether WT PDZ peptide can alter LPS-induced tissue damage in the kidney, a cell toxicity assay was performed (Figure 3g). LPS induced cell damage in the kidney; however, WT PDZ peptide significantly alleviated the toxicity, whereas mut PDZ peptide did not. Because cytotoxicity caused by LPS is frequently due to ROS production in the kidney (Su et al., 2023; Qiongyue et al., 2022), ROS production in the mitochondria was investigated in renal mitochondria cells harvested from kidney tissue (Figure 3h)......

      Figure legend section: Indicated scale bars are 20 μm. (d, e, f) Semi-quantitative analysis of Iba-1-positive areas in the liver and kidney, and of PAS-positive cells in the lung, respectively. (g) After the kidneys were harvested, tissue lysates were used for the MTT assay. (h) After.....

      (4) In Figure 6G, H, the authors confirmed the change in expression of the M2 markers by PDZ peptide using the mouse monocyte cell line Raw264.7. It would be good to add an experiment on changes in M1 and M2 markers caused by PDZ peptides in human monocyte cells (for example, THP-1).

      We thank you for your comments. To determine whether the PDZ peptide regulates M1/M2 polarization in human monocytes, we examined changes in M1 and M2 gene expression in THP-1 cells. Wild-type PDZ significantly suppressed the expression of M1 marker genes (hIL-1β, hIL-6, hIL-8, hTNF-α) while increasing the expression of M2 marker genes (hIL-4, hIL-10, hMRC-1), whereas mutant PDZ did not affect M1/M2 polarization. These results suggest that the PDZ peptide can suppress inflammation by regulating M1/M2 polarization of human monocytic cells. These results are provided for the reviewer's reference only and will not be included in the main text.

      Author response image 2.

      Minor point:

      The use of language is appropriate, with good writing skills. Nevertheless, a thorough proofread would eliminate small mistakes such as:

      • line 254, " mut PDZ+LPS/LPS (45.75%) " → " mut PDZ+LPS/LPS (47.75%) "

      • line 296, " Figure 6f " → " Figure 6h "

      We have corrected these points in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study presents a novel pipeline for the large-scale genomic prediction of members of the non-ribosomal peptide group of pyoverdines based on a dataset from nearly 2000 Pseudomonas genomes. The advance presented in this study is largely based on solid evidence, although some main claims are only incompletely supported. This study on bacterial siderophores has broad theoretical and practical implications beyond a singular subfield.

      Thank you for the supportive and encouraging words. We appreciate the editor’s and reviewers’ careful and professional assessment of this manuscript. The reviewers’ scrutiny has helped us to improve the presentation and discussion of our work. We have now carefully revised the manuscript following their instructive suggestions and comments. Please find below our detailed responses (marked in blue) to each of the comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript introduces a bioinformatic pipeline designed to enhance the structure prediction of pyoverdines, revealing an extensive and previously overlooked diversity in siderophores and receptors. Utilizing a combination of feature sequence and phylogenetic approaches, the method aims to address the challenging task of predicting structures based on dispersed gene clusters, particularly relevant for pyoverdines.

      Predicting structures based on gene clusters is still challenging, especially pyoverdines as the gene clusters are often spread to different locations in the genome. An improved method would indeed be highly useful, and the diversity of pyoverdine gene clusters and receptors identified is impressive.

      However, so far the method basically aligns the structural genes and domains involved in pyoverdine biosynthesis and then predicts A domain specificity to predict the encoded compounds. Both methods are not particularly new as they are included in other tools such as PRISM (10.1093/nar/gkx320) or Sandpuma (https://doi.org/10.1093/bioinformatics/btx400) among others. The study claims superiority in A domain prediction compared to existing tools, yet the support is currently limited, relying on a comparison solely with AntiSMASH. A more extensive and systematic comparison with other tools is needed.  

      Thanks for pointing this out. In the revised manuscript, we have included a comprehensive comparative analysis, in which we compared our pipeline to six commonly used methods: NP.searcher, PRISM4, AdenPredictor, SeMPI2, SANDPUMA, and antiSMASH5 (see Supplementary_table 6 for details, and lines 281-286). These approaches either consist of a single specific algorithm or integrate several methods. Our approach performs best (see table below), demonstrating a clear improvement over previous tools. The improvements are due to several methodological differences inherent to our approach. Additionally, while exploring existing prediction tools, we found that some had not been maintained for years. For instance, we were unable to access NRPSsp (www.nrpssp.com) and NRPSpredictor2 (http://nrps.informatik.uni-tuebingen.de/). Below, we briefly explain these differences, particularly in relation to PRISM and SANDPUMA, as highlighted by the reviewer.

      Author response table 1.

      PRISM annotates biosynthetic gene clusters (BGCs) and reconstructs the linear structures of NRPS synthetases, a function that depends on proper annotation of open reading frames. This pipeline can have difficulties in assembling the linear structure into a final product. In developing our approach, we found that the annotations of NRPS genes are frequently truncated because of sequencing errors and annotation issues. Our method fixes this problem by rescanning all possible reading frames of the BGC to rebuild complete pyoverdine synthetase genes.

      SANDPUMA and our approach are based on similar ideas (using the prediCAT algorithm) to predict A domain substrates, namely by using the closest annotated reference A domain. However, our method uses a self-adaptive feature extraction step to reduce the confounding influence of phylogeny. This small adjustment significantly improves the performance of our approach, which works well even with small training sets (101 experimentally validated A domains in our approach, as opposed to the 494 A domains from MIBiG used by SANDPUMA).

      Additionally, in contradiction to the authors' claims, the method's applicability seems constrained to well-known and widely distributed gene clusters. The absence of predictions for new amino acids raises concerns about its generalizability to NRPS beyond the studied cases.

      We thank the reviewers for this comment. We acknowledge that our method cannot directly predict new amino acids. Nevertheless, for several reasons we believe that our approach is not constrained and can be widely applied in the future.

      First, our method can identify A domains that select new unknown amino acid substrates. In fact, three of the four unresolved cases in our experimental verification analysis (Fig. 3d) represent new amino acids. Obviously, experimental verification is required to characterize the unknown substrate. Once verified, the new A domains and their substrates can expand the reference dataset, allowing targeted improvement of our phylogeny-focused prediction technique. We now discuss this aspect in lines 634-645.

      Second, although the overall substrate diversity in NRPS is high across the microbial kingdom, our analysis suggests that the number of amino acids used for a specific group of secondary metabolites quickly reaches a saturation point. The discovery rate of new amino acids was 1.7% for our experimental Pseudomonas data set (Fig. 3d), and 0.0% for the Burkholderiales data set. This suggests that as the database expands, the discovery rate of novel amino acid substrates is expected to drop rapidly.

      Third, we acknowledge that the inability to predict the substrates of unknown domains is a common limitation among all knowledge-guided learning algorithms, including ours. However, we have made significant improvements in prediction accuracy. As the database grows, we expect the rate of unknown substrates to decrease, and the prediction accuracy to increase.

      The manuscript lacks clarity on how the alignment of structural genes operates when dealing with multiple NRPS gene clusters on different genome contigs. How would the alignment of each BGC work?

      We thank the reviewers for this comment. Pyoverdine molecules consist of a conserved fluorescent chromophore (Flu) and a peptide chain (Pep), both synthesized by NRPS enzymes. In most instances (over 90%), Flu and Pep are produced by two separate biosynthetic gene clusters (BGCs). In these cases, we merge the two BGCs by placing Flu at the head and Pep at the tail. The remaining cases (less than 10%) fall into two scenarios: 1. Flu and Pep are located on the same BGC, so no BGC alignment is needed. 2. In very rare cases, Flu and Pep are synthesized by three BGCs. Here, Flu is still synthesized by one BGC, placed at the head, while Pep is produced by two BGCs. We place the Pep BGC containing the thioesterase (TE) domain at the tail and the Pep BGC lacking the TE domain in the middle (see lines 165-169).
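      The head-to-tail ordering rule described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline code: it assumes each BGC has already been annotated with two flags (whether it carries the chromophore genes and whether it contains a TE domain), and the names BGC, has_chromophore, has_te and order_bgcs are invented here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BGC:
    name: str
    has_chromophore: bool  # hypothetical flag: BGC encodes the Flu (chromophore) NRPS
    has_te: bool           # hypothetical flag: BGC contains a thioesterase (TE) domain


def order_bgcs(bgcs: List[BGC]) -> List[BGC]:
    """Order pyoverdine BGCs head-to-tail: Flu first, TE-containing Pep BGC last."""
    if len(bgcs) == 1:
        # Flu and Pep on the same BGC: nothing to align.
        return list(bgcs)
    flu = [b for b in bgcs if b.has_chromophore]
    if len(flu) != 1:
        raise ValueError("expected exactly one chromophore (Flu) BGC")
    pep = [b for b in bgcs if not b.has_chromophore]
    # Pep BGCs without a TE domain go in the middle; the TE-containing one closes the chain.
    middle = [b for b in pep if not b.has_te]
    tail = [b for b in pep if b.has_te]
    return flu + middle + tail


clusters = [BGC("pep_TE", False, True), BGC("flu", True, False), BGC("pep_mid", False, False)]
print([b.name for b in order_bgcs(clusters)])  # ['flu', 'pep_mid', 'pep_TE']
```

      Keying the tail on the TE domain reflects the rule stated in the response: chain release by the TE domain marks the final module of the peptide backbone, so it is a natural anchor for the last position.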

      Another critical concern is that a main challenge in NRPS structure prediction is not the backbone prediction but rather the prediction of tailoring reactions, which is not addressed in the manuscript at all, and this limitation extensively restricts the applicability of the method.

      While we thank the reviewer for this comment, we only partly agree with it. Peptide backbone prediction is still a significant challenge. This is clearly visible in our new analysis comparing the prediction accuracies of different pipelines, such as antiSMASH 5, PRISM 4, AdenPredictor, SeMPI 2, NP.searcher and SANDPUMA. Unresolved and incorrect substrate predictions are still common, which highlights the value of our contribution in developing a new approach with improved accuracy.

      However, we agree with the reviewer that our current algorithm does not predict tailoring reactions (now discussed on lines 680-685). Although tailoring reactions are important for predicting the final NRPS product structure, none of the other existing pipelines address this issue either, and it remains a challenge for future work. For our study, it is important to note that the specificity of pyoverdines is primarily determined by the backbone composition, whereas tailoring reactions seem to play a minor role.

      The manuscript presents a potentially highly useful bioinformatic pipeline for pyoverdine structure prediction, showcasing a commendable exploration of siderophore diversity. However, some of the claims made remain unsubstantiated. Overall, while the study holds promise, further validation and refinement are required to fulfill its potential impact on the field of bioinformatic structure prediction.

      Thank you for the supportive and encouraging words. We deeply appreciate your constructive comments and suggestions. 

      Reviewer #2 (Public Review):

      Pyoverdines, siderophores produced by many Pseudomonads, are one of the most diverse groups of specialized metabolites and are frequently used as model systems. Thousands of Pseudomonas genomes are available, but large-scale analyses of pyoverdines are hampered by the biosynthetic gene clusters (BGCs) being spread across multiple genomic loci and existing tools' inability to accurately predict amino acid substrates of the biosynthetic adenylation (A) domains. The authors present a bioinformatics pipeline that identifies pyoverdine BGCs and predicts the A domain substrates with high accuracy. They tackled a second challenging problem by developing an algorithm to differentiate between outer membrane receptor selectivity for pyoverdines versus other siderophores and substrates. The authors applied their dataset to thousands of Pseudomonas strains, producing the first comprehensive overview of pyoverdines and their receptors and predicting many new structural variants.

      The A domain substrate prediction is impressive, including the correction of entries in the MIBiG database. Their high accuracy came from a relatively small training dataset of A domains from 13 pyoverdine BGCs. The authors acknowledge that this small dataset does not include all substrates, and correctly point out that new sequence/structure pairs can be added to the training set to refine the prediction algorithm. 

      The authors could have been more comprehensive in compiling their training set data. For instance, the authors claim that histidine "had not been previously documented in pyoverdines", but the pyoverdine of the sequenced strain P. entomophila L48 incorporates His (10.1007/s10534-009-9247-y).

      Thank you for highlighting this issue. We agree that our statement that histidine had not been reported in pyoverdines was incorrect. We have reviewed the full text and made the necessary corrections.

      The primary reason for excluding the sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) from the training set is that their pyoverdine structures were not determined solely by experimental methods. In these studies, the pyoverdine structures were first predicted bioinformatically from the biosynthetic gene sequences, and the structural analysis experiments were then guided by these predicted structures. We found that this pre-prediction may have introduced biases into the downstream analyses. Specifically, for P. entomophila L48, we found inaccuracies in the annotation of certain domains (Author response image 1). For example, the third A domain of the peptide chain in the P. entomophila L48 pyoverdine was originally annotated as Dab-specific. On closer examination, however, it differs substantially from other Dab reference domains (top) and from the Dab domains of our experimentally validated strains (right) (left panel of Author response image 1). By analyzing the interface (I) domain (10.1073/pnas.1903161116) at its predicted site, we propose that this domain actually recognizes OHHis. The OHAsp domain reported for P. entomophila L48 is indeed close in sequence to the OHAsp reference domains (left panel), whereas the reported Ala domain is more similar to Ser domains (right panel). For these reasons, we did not include this strain, whose pyoverdine structure was determined with bioinformatic guidance, in the training set.

      Author response image 1.

      The workflow cannot differentiate between different variants of Asp and OHOrn, and it's not clear if this is a limitation of the workflow, the training data, or both. 

      Thanks for pointing this out. Differentiating between variants of the same amino acid is generally challenging for all algorithms existing to date. In this sense, it is a limitation of our workflow, but also of all other workflows. Nonetheless, we observed divergence in the feature sequences (using the A motif 4-5 region), which helped us to separate some (but not all) of the Asp and Orn variants. For example, the Asp variants separate clearly (left panel of Author response image 2). To be conservative, we only differentiated between OHAsp and Asp in our predictions, although differentiating between DOHAsp and OHAsp would also be possible. For the Orn variants, Orn was clearly separated from the OHOrn variants (right panel), whereas the subgroups of OHOrn variants were difficult to differentiate. We believe that no A domain prediction tool will be able to solve this issue. Instead, future approaches should include information on substrate-modifying enzymes.

      Author response image 2.

      The prediction workflow holds up well in Burkholderiales A domains, however, they fail to mention in the main text that they achieved these numbers by adding more A domains to their training set.

      We thank the reviewers for this comment. We apologize for not having mentioned the training data set in the main text; it is described in detail in the Methods section (lines 714-732). We now provide more details on the analysis procedure in the main text (lines 307-313). Importantly, we did not add more A domains to the pyoverdine training data set but built a new, independent data set for Burkholderiales. The aim was to mirror the analysis we performed for pyoverdines with a completely new data set, featuring 124 A domains for training and 178 A domains as the test set.

      To validate their predictions, they elucidated structures of several new pyoverdines, and their predictions performed well. However, the authors did not include their MS/MS data, making it impossible to validate their structures. In general, the biggest limitation of the submitted manuscript is the near-empty methods section, which does not include any experimental details for the 20 strains or details of the annotation pipeline (such as "Phydist" and "Syndist"). The source code also does not contain the requisite information to replicate the results or re-use the pipeline, such as the antiSMASH version and required flags. That said, skimming through the source code and data (kindly provided upon request) suggests that the workflow itself is sound and a clear improvement over existing tools for pyoverdine BGC annotation.

      Thank you for highlighting these issues. We agree that the Methods section is short. This is because the entire paper is structured as a step-by-step methodological introduction to our pipeline. We have now carefully revised the main text to add the information requested by the reviewer. Moreover, we have included a supplementary file with the MS/MS data of the experimentally analyzed pyoverdine structures. Finally, we include a link to a one-click online notebook, together with a more detailed explanation of the code, that can be used to replicate the annotation and substrate prediction results: https://drive.google.com/drive/folders/1JsfyPUGDTFo8BDDZk8JLSvKry8emzMhr?usp=drive_link

      Predicting outer membrane receptor specificity is likewise a challenging problem and the authors have made a promising achievement by finding specific gene regions that differentiate the pyoverdine receptor FpvA from FpvB and other receptor families. Their predictions were not tested experimentally, but the finding that only predicted FpvA receptors were proximate to the biosynthesis genes lends credence to the predictive power of the workflow. The authors find predicted pyoverdine receptors across an impressive 468 genera, an exciting finding for expanding the role of pyoverdines as public goods beyond Pseudomonas. However, whether or not these receptors can recognize pyoverdines (and if so, which structures!) remains to be investigated.

      Thank you for the supportive and encouraging words. The bioinformatic analysis and experimental testing of pyoverdine-receptor matching are complex and beyond the scope of this paper. We address this topic in a separate manuscript, in which we developed an experimentally verified co-evolution algorithm that matches pyoverdines to receptors. With this algorithm, we can identify self-receptors (i.e., receptors used to take up the self-produced pyoverdine) and therefore establish pyoverdine sharing and interaction networks across strains in communities.

      Please see DOI:10.1101/2023.11.05.565711 for details.

      In all, the authors have assembled a rich dataset that will enable large-scale comparative genomic analyses. This dataset could be used by a variety of researchers, including those studying natural product evolution, public good eco/evo dynamics, and NRPS engineering.

      Thank you for the supportive and encouraging words. We are grateful for the reviewers’ instructive suggestions and comments.

      Reviewer #3 (Public Review):

      Summary:

      Secondary metabolites are produced by numerous microorganisms and have important ecological functions. A major problem is that neither the function of a secondary metabolite enzyme nor the resulting metabolite can be precisely predicted from gene sequence data.

      In the current paper, the authors addressed this highly relevant question.

      The authors developed a bioinformatic pipeline to reconstruct the complete secondary metabolism pathway of pyoverdines, a class of iron-scavenging siderophores produced by Pseudomonas spp. These secondary metabolites are biosynthesized by a series of nonribosomal peptide synthetases and require a specific receptor (FpvA) for uptake. The authors combined knowledge-guided learning with phylogeny-based methods to predict with high accuracy encoding NRPSs, substrate specificity of A domains, pyoverdine derivatives, and receptors. After validation, the authors tested their pipeline with sequence data from 1664 phylogenetically distinct Pseudomonas strains and were able to determine 18,292 enzymatic A domains involved in pyoverdine synthesis, reliably predicted 97.8% of their substrates, identified 188 different pyoverdine molecule structures and 4547 FpvA receptor variants belonging to 94 distinct groups. All the results and predictions were clearly superior to predictions that are based on antiSMASH. Novel pyoverdine structures were elucidated experimentally by UHPLC-HR-MS/MS.

      To assess the extendibility of the pipeline, the authors chose Burkholderiales as a test case, which showed that the pipeline maintains a high prediction accuracy within Burkholderiales (83%), higher than that of antiSMASH (67%).

      Together, the authors concluded that supervised learning based on a few known compounds produced by species from the same genus probably outperforms generalized prediction algorithms trained on many products from a diverse set of microbes for NRPS substrate predictions. As a result, they also show that both pyoverdine and receptor diversity have been vastly underestimated.

      Strengths:

      The authors developed a very useful bioinformatic pipeline with high accuracy for secondary metabolites, at least for pyoverdines. The pipeline has several advantages over existing pipelines such as the extensively used antiSMASH program; for example, it can be applied to draft genomes and produces fewer erroneous gene predictions. The accuracy was impressively demonstrated by the discovery of novel pyoverdines whose structures were experimentally substantiated by UHPLC-HR-MS/MS.

      The manuscript is very well written, and the data and the description of the generation of pipelines are easy to follow.

      Weaknesses:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains.

      Thanks for your positive and encouraging comment. Regarding your only major comment, we think that the design concept of our pipeline has the potential to be applied to more complex non-ribosomal peptides. Currently, our method is tailored to accurately predict the structural composition of the Pseudomonas siderophore pyoverdine (see also response 3). A key point emphasized in our article is the importance of considering phylogeny when developing substrate prediction algorithms for A domains. Currently, the main challenge in advancing these algorithms is the limited availability of data on A domains and their corresponding substrates. However, as more reference data accumulate, we are confident that the design principles of our method will enable precise predictions of the structural compositions of all products synthesized by non-ribosomal peptide synthetases (see our discussion in lines 634-645).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I believe that the manuscript would benefit from focusing solely on the task of improving pyoverdine predictions. This aspect alone is significant, and robustly supporting this claim would strengthen the manuscript. The diversity analysis provided is valuable and would undoubtedly benefit the scientific community. However, additional systematic comparisons with other methods are necessary. Furthermore, clarification of certain terms, such as 'feature-based' (e.g., whether it refers to NRPS domains or CDS), would enhance clarity.

      Thank you for the supportive and encouraging words. We followed the reviewer’s suggestion and now provide the requested method comparison, see also response 2 for details. Furthermore, we have carefully checked the main text to clarify terms whenever needed. Specifically, we now define the terms “feature sequence” and “feature sequence distance” in lines 227-229.  

      Additionally, several minor points could be improved upon:

      In line 85, clarification is needed on how pyoverdine genes were identified.

      Thank you for your thorough review. In the introduction section, we provided a brief overview of our work, while the detailed methodology is outlined in the results section on lines 160-174.

      In line 382, it would be helpful to know the source of the sequences.

      We agree and have now carefully revised the manuscript following your suggestions (lines 403-405).

      Line 392 could be explained more clearly. Does it mean that the authors used an hmm search to search pHMMs against each reference sequence?

      Thanks for your comment. Yes, we used an hmm search to search pHMMs against each reference sequence. We have now revised the manuscript to improve explanations (lines 413-418).

      Reviewer #2 (Recommendations For The Authors):

      The authors state they "elucidated the chemical structure of the 20 pyoverdines using culture-based methods combined with UHPLC-HR-MS/MS", so I was alarmed to see that KR and LB already published several of those structures in the cited paper. I hope that this "double dipping" will be fixed in a revision process.

      Thank you for pointing this out. We agree that we did not explain clearly enough which steps were conducted in this study and which data were taken from a previous paper (https://doi.org/10.1007/s00216-022-03907-w). The genomes of the 20 strains used for the verification analysis (Fig. 3d) were sequenced as part of this study (accession now provided). Of the 20 pyoverdine structures, 14 were elucidated with UHPLC-HR-MS/MS in this study, while structural information for the remaining 6 was already available from the previous paper. We have now clarified these details in our manuscript (lines 276-280).

      Thank you for providing the source code and data, and I hope that the final non-redundant dataset will be uploaded to Zenodo or another repository. Please deposit the 20 newly sequenced genomes to GenBank or another public repository. Please also show the UHPLC-HR-MS/MS data, preferably in the form of raw data uploaded to GNPS.

      We have followed the reviewer’s advice and deposited our data:

      - The sequences of the 20 newly sequenced strains are available on ENA accession PRJEB76792.

      - The MS/MS plots of the 14 newly analyzed pyoverdines are shown in the Supplementary Materials.

      - We provide a one-click online notebook to allow readers to replicate the pyoverdine cluster annotation and substrate prediction of the 20 experimentally analyzed strains.

      I suggest adding "at least" or a similar qualifier when the 73 variants are mentioned unless the literature search was truly exhaustive. What were the criteria for inclusion of the 13 strains in Table S2? For instance, sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) were not included.

      Thank you for your comment. We have now carefully revised the manuscript following your suggestions (lines 291-295). Regarding the criteria for including the 13 strains in Table S2, we aimed to select the strains with the highest credibility for the training set. The primary reason for excluding the two strains from the training set is that their siderophore structures were analyzed in experiments guided by bioinformatic predictions. We wanted to avoid any biases that bioinformatic pre-predictions could introduce into downstream analyses (see Response 13 for details).

      OHAsp in pyoverdines has been reported to arise from hydroxylation of Asp after it's already been activated by the A domain (10.1073/pnas.1903161116). Was there a clear difference between A domains that lead to Asp and OHAsp? Conversely, acetylation and formylation of OHOrn occur before adenylation. Can your workflow be used to differentiate cOHOrn, fOHOrn, and AcOHOrn, which are currently difficult to predict through genome mining?

      Thank you for these considerations. We treated these aspects in our response 8.  

      Throughout, define non-proteinogenic AA substrate abbreviations (ex: Rsc, Dab).

      Revised as per suggestion (lines 329-333).

      Additional line comments:

      189: Mention PhyloPhlAn in the main text.

      Revised as per suggestion (line 189).

      191: Define these filtering/selection criteria.

      Thanks for your comment, we have added the criteria in the main text (line 196 and line 198). 

      309, 620: An A domain presumably loading histidine is present in sequenced strain P. entomophila L48 (10.1007/s10534-009-9247-y). Please also clarify that Val has previously been seen in a pyoverdine (it is in Table S1) albeit not sequenced.

      We have clarified these aspects as per suggestion (lines 314-315 and line 630).

      310: The pipeline can "highlight" new substrates, but not identify them.

      Revised as per suggestion (line 295).

      354: Please clarify "13 amino acid substrates form the core of all the 188 pyoverdine structures", considering that 279 A domain substrates couldn't be predicted.

      Thanks for your comments. We have now clarified this statement as follows: "our analysis found that 13 amino acids form the main structural substrates of all the 188 pyoverdine structures" (lines 360-363).

      630: "discovered" implies that there is experimental evidence. I suggest something like "here we predicted 151 putatively new variants".

      Revised as per suggestion (line 648).

      Reviewer #3 (Recommendations For The Authors):

      Weakness:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains

      Thanks for your comment. Please see our Responses 3 and 13 above, where we address this concern in detail. Moreover, we discuss the possibility of extending the approach to other groups of secondary metabolites in the Discussion. We believe that we provide a balanced view of the applicability of our approach and of the next steps to be taken.

      Please comment on this aspect.

      Minor:

      (1)  When you speak about "synthesis" it is rather biosynthesis. Synthesis is chemical synthesis.

      Please replace all instances of the word synthesis with biosynthesis.

      Revised as per suggestion.

      (2)  Line 188: synthetase is rather synthetases

      Revised as per suggestion (line 191).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Point-by-point reply in response to the Reviewer’s comments

      Reviewer #1

      Public review:

      [1] (a) Given that only a fraction of the FAPs express BDNF after injury, the authors need to demonstrate the specificity of the Prrx1-Cre for FAPs. This is particularly important because muscle stem cells also express GDNF receptors (Fig. 3C & D) and myogenic progenitors/satellite cells produce BDNF after nerve injury (Griesbeck et al., 1995 (PMID 8531223); Omura et al., 2005 (PMID 16221288)). (b) Moreover, as the authors point out, there are multipotent mesenchymal precursor cells in the nerve that migrate into the surrounding tissue following nerve injury and contribute to regeneration (Carr et al., PMID 30503141). Therefore, there are multiple possible sources of BDNF, highlighting the need to clearly demonstrate that FAP-derived BDNF is essential.

      - (a) As the Reviewer noted, both GDNF receptor expression and increased BDNF expression in response to nerve injury are detectable in FAPs as well as in muscle stem cells (MuSCs). Therefore, we agree with the Reviewer that demonstrating the specificity of Prrx1-Cre for FAPs is crucial to support our claim. In our previous publication (Kim et al., 2022), using Prrx1-Cre; Rosa-eYFP mice, we showed that while most CD31-CD45-Vcam1-Sca1+ FAPs are eYFP+, CD31-CD45-Vcam1+Sca1- MuSCs do not express eYFP (Liu et al., 2015; Kim et al., 2022) (Author response image 1). Additionally, genomic DNA PCR on mononuclear cells sorted from our Prrx1Cre; Bdnf fl/fl mice showed that recombination of the floxed Bdnf gene was detectable only in FAPs and CD31-CD45-Vcam1-Sca1- cells, but not in MuSCs (Author response image 2). This is consistent with a previous report showing Prrx1-Cre activity in FAPs, pericytes, vascular smooth muscle cells (vSMCs) and tenocytes (Leinroth et al., 2022), where pericytes, vSMCs and tenocytes fall within the CD31-CD45-Vcam1-Sca1- population (Giordani et al., 2019). Together, these results demonstrate that Prrx1-Cre is active in FAPs but not in MuSCs.

      Author response image 1.

      Expression of eYFP in muscle-resident, lineage-negative, live mononuclear cells isolated from Prrx1Cre; Rosa-eYFP mice. Supplemental Figure 3A from Kim et al., 2022. Lin-: lineage-negative (CD31-CD45-); Neg.: Vcam1-Sca1-.

      Author response image 2.

      Recombination of the floxed Bdnf gene in mononuclear cells sorted from muscles of Prrx1Cre; Bdnf fl/fl or Bdnf fl/fl mice. Genotypes and cell types sampled for each lane are specified. P4, P5, and P6 indicate the primers used for each PCR. Lin+: lineage (CD31/CD45)-positive; DN: CD31-CD45-Vcam1-Sca1-.

      - (b) We appreciate and agree with the Reviewer’s comment that additional experiments are needed to confirm that FAP-derived BDNF is indeed essential for nerve regeneration, considering other potential cellular sources of BDNF, such as nerve-resident mesenchymal precursor cells. One possible experiment would be to transplant wild-type FAPs into our Prrx1Cre; Bdnf fl/fl mice and test whether the delay in nerve regeneration and remyelination is rescued to the level of control mice. Unfortunately, since the genetic background of our Prrx1Cre; Bdnf fl/fl mice is a mixture of B6, 129S4, and BALB/c, the transplanted cells may be rejected by the immune system, which makes the experiment technically difficult. Another approach could involve a FAP-specific Cre mouse line, as we mentioned in the Discussion of our original manuscript. However, such a line does not yet exist due to the lack of a marker gene that is expressed specifically in FAPs but not in nerve-resident mesenchymal precursor cells. Overcoming these technical challenges and demonstrating the requirement of FAP-derived BDNF in nerve regeneration would significantly strengthen our report, but we regret that these methods are currently unavailable.

      [2] Similarly, the authors should provide some evidence that BDNF protein is produced by FAPs. All of their data for BDNF expression is based on mRNA expression and that appears to only be increased in a small subset of FAPs. Perhaps an immunostaining could be done to demonstrate up-regulation of BDNF in FAPs after injury.

      - We appreciate the Reviewer’s constructive comment. To demonstrate that BDNF protein is produced by FAPs upon nerve injury, we performed western blot analysis. FAPs were isolated either from sciatic nerve crush injury-affected muscles at 7 days post injury (dpi) or from the contralateral, uninjured muscles, and protein samples were prepared for SDS-PAGE and western blot using anti-BDNF, anti-PDGFRα and anti-GAPDH antibodies. While FAPs derived from both nerve injury-affected and uninjured muscles expressed PDGFRα, the mature form of BDNF protein was detected only in nerve injury-affected FAPs, showing that FAPs indeed express BDNF at the protein level after injury. We have added this new result as Figure 4F in the new Figure 4, with the experimental scheme as New Figure 4—figure supplement 1, and revised the Results section (lines 364-374) and the Materials and Methods section (lines 687-705) of our manuscript to describe the new results in detail.

      [3] The suggestion that Schwann cell-derived GDNF is responsible for upregulation of BDNF in the FAPs is indirect, based largely on the data showing that injection of GDNF into the muscle is sufficient to up-regulate BDNF (Fig. 4F & G). However, to more directly connect the 2 observations in a causal way, the authors should inject a Ret/GDNF antagonist, such as a Ret-Fc construct, then measure the BDNF levels.

      - We appreciate the Reviewer’s constructive comment, and we agree that testing the necessity of GDNF/RET signaling for BDNF upregulation is crucial to causally link the expression of the two neurotrophic factors. To antagonize GDNF/RET signaling, we injected anti-GDNF antibodies into the tibialis anterior and gastrocnemius muscles following sciatic nerve crush injury to block the activity of intramuscular GDNF protein. Although the differences were not statistically significant, we observed a tendency towards decreased Bdnf mRNA expression upon anti-GDNF injection compared to IgG controls. We have added this new result as New Figure 4—figure supplement 2 and revised our manuscript to include the details in both the Results section (lines 381-390) and the Materials and Methods section (lines 611-616). We have also changed the title of New Figure 4 (line 332) to encompass the new results. We are aware that further experiments, such as increasing the number of animals tested, increasing the antibody dosage or injection frequency, or using genetic models such as Plp1CreER; Gdnf fl/fl, should be carried out to validate our hypothesis with statistical significance. Unfortunately, due to limited time, resources, and research funds, we were unable to perform such additional experiments. We hope that the Reviewer understands these limitations.

      [4] (a) In assessing the regeneration after nerve crush, the authors focus on remyelination, for example, assessing CMAP and g-ratios. However, they should also quantify axon regeneration, which can be done distal to the crush injury at earlier time points, before the 6 weeks scored in their study. Evaluating axon regeneration, which occurs prior to remyelination, would be especially useful because BDNF can act on both Schwann cells, to promote myelination, and axons, enhancing survival and growth. (b) They could also evaluate the stability of the neuromuscular junctions, particularly if a denervation was done with the conditional knock outs, although that may be a bit beyond the scope of this study.

      - (a) As the Reviewer mentioned, BDNF is known to act on both Schwann cells and axons, where it promotes myelination and axonal growth, respectively (Oudega and Hagg, 1998; Zhang et al., 2000; Chan et al., 2001; Xiao et al., 2009; English et al., 2013). We fully agree with the Reviewer that quantification of axon regeneration, which could be achieved through immunostaining of the distal part of the sciatic nerve at earlier time points after injury, would shed light on whether FAP-derived BDNF also contributes to axon regeneration in addition to remyelination. Unfortunately, we could not perform such additional experiments within the limited time frame: preparing sufficient numbers of control and conditional knockout mice matching the age groups used in this study (3-4 months old), waiting an additional 2-4 weeks after nerve crush injury for sample collection, and performing the subsequent immunostaining and quantification could take almost 6 months in total. We hope that the Reviewer understands this limitation.

      - (b) We appreciate the Reviewer’s constructive comment. Although the number of animals used for neuromuscular junction (NMJ) analyses was not sufficient, we briefly examined the structure of NMJs at 4 weeks post nerve crush injury in control (Ctrl) and conditional knockout (cKO) mice as a preliminary experiment. No significant differences were observed between Ctrl and cKO mice in NMJ morphology and innervation (Author response image 3).

      Author response image 3.

      Structures of neuromuscular junctions from Ctrl vs cKO mice at 4 weeks post nerve crush injury. Whole-mount immunostaining was performed on extensor digitorum longus muscles affected by sciatic nerve crush injury. Samples were stained with α-bungarotoxin (green), neurofilament (red), and synaptophysin (blue). Scale bar: 50 μm.

      Returning to part (a) of the Reviewer’s comment, the data in Author response image 3, where innervation of acetylcholine receptor clusters by axons did not differ significantly between Ctrl and cKO mice, suggest that FAP-derived BDNF may not be critical for axonal growth after nerve injury. Although we acknowledge that additional experiments are required to draw a firm conclusion on this point, we could not perform them due to insufficient time and resources.

      We hope that the Reviewer understands our limitation.

      Recommendations for the authors:

      [1] In citing the ability of BDNF to promote Schwann cell myelination the authors should include Chan et al., 2001 (PMID 11717413) in addition to the Zhang et al, 2000 and Xiao et al, 2009 references.

      - We apologize for omitting the reference mentioned by the Reviewer. We have added it to our revised manuscript (lines 395, 425, and 517).

      Reviewer #2

      Public review:

      [1] Although I find the data the authors generated sufficient for their claims, I see them as relatively poor, and (a) a complementary analysis of protein expression, through immunostaining of the different genes mentioned for FAPs and Schwann cells, would strengthen the paper. The model is entirely supported by measurements of mRNA levels and by negative regulation of gene expression in specific cells. Additionally, (b) what happens to the structure of the neuromuscular junction after regeneration when GDNF or BDNF expression is reduced? (c) The finding that BDNF mRNA levels in FAPs decrease during aging is interesting; does gain of BDNF expression in FAPs revert the phenotype?

      - (a) We appreciate and agree with the Reviewer’s comment that validation of BDNF protein expression in FAPs and GDNF protein expression in Schwann cells upon nerve injury would strengthen this paper. Regarding GDNF protein expression in Schwann cells upon nerve injury, it has already been demonstrated by previous studies (Höke et al., 2002; Xu et al., 2013). For BDNF protein expression in FAPs upon nerve injury, we performed western blot analysis for validation, as mentioned in the response to Reviewer #1 Public review [2]. The results showed that while the mature form of BDNF protein could not be readily detected in FAPs isolated from uninjured muscles, it could be detected in FAPs isolated from sciatic nerve crush injury-affected muscles at 7 days post injury. We have added the new result as Figure 4F in the New Figure 4 with the experimental scheme as New Figure 4—figure supplement 1, and revised the Results section (lines 364-374) and the Materials and Methods section (lines 687-705) in our manuscript to include the new results in detail.

      - (b) Although the data are preliminary, we examined the structure of neuromuscular junctions (NMJs) in the extensor digitorum longus muscles of control and Prrx1Cre; Bdnf fl/fl mice at 4 weeks post injury, as mentioned in the response to Reviewer #1 Public review [4](b). We could not identify significant differences between control and Prrx1Cre; Bdnf fl/fl mice, in which BDNF expression is reduced specifically in Prrx1-expressing cells, including FAPs (Author response image 3). Since other cellular sources of BDNF, such as Schwann cells, exist, NMJ regeneration may not have been affected as strongly as remyelination in our Prrx1Cre; Bdnf fl/fl mice. However, experiments with a sufficient number of mice and more observation time points are required to test this hypothesis statistically. Unfortunately, preparing samples for such analyses would take more than four months, as we would need to produce sufficient numbers of control and Prrx1Cre; Bdnf fl/fl mice matching the age groups used in this study. We hope that the Reviewer understands our limitation.

      Regarding NMJ structure after regeneration with reduced GDNF levels, genetic models such as Plp1CreER; Gdnffl/fl mice would be appropriate, analogous to the Prrx1Cre; Bdnffl/fl mice we used in this study to reduce FAP-derived BDNF. Unfortunately, we do not have Gdnffl mice, and obtaining them to produce Plp1CreER; Gdnffl/fl mice and performing the additional experiment would take too much time for the current revision. In a future study, we will try to perform this experiment after obtaining the required mouse line. We hope that the Reviewer understands our limitation.

      - (c) We thank the Reviewer for highlighting this point. In this paper, we have shown that BDNF expression upon nerve injury is decreased in aged FAPs compared to young adult FAPs, and suggested that this may be one of the causes of the delayed nerve regeneration phenotype in aged mice. It has previously been reported that intramuscular injection of BDNF accelerates nerve regeneration, whereas intramuscular injection of anti-BDNF antibodies delays the regeneration process (Zheng et al., 2016). This implies that intramuscular levels of active BDNF can significantly influence the speed of nerve regeneration. Therefore, gain of BDNF expression in aged FAPs may help reverse the delayed nerve regeneration phenotype in aged mice, since it would provide an additional supply of active intramuscular BDNF, which has previously been shown to accelerate nerve regeneration. Although experimental validation is required to support this claim, we could not obtain sufficient numbers of aged mice within the limited time frame. We hope that the Reviewer understands our limitation.

      Recommendations for the authors:

      [1] The authors should include the experimental design and several drawings in the leading figures indicating, for example, how remyelination after injury was quantified and how the response of regenerated sciatic nerve to a depolarizing stimulus was studied.

      - We apologize for any confusion caused by the insufficient information provided in the leading figures. Unfortunately, due to limited space, we could not add experimental designs or drawings to the leading figures. Instead, to address the Reviewer’s comment as far as possible, we have revised the legends of the leading figures so that they refer readers to the experimental designs and diagrams in the figure supplements.

      We hope that the Reviewer understands this limitation.

      Reviewer #3

      Public review:

      [1] In Fig. 1 and 2 authors provide data on scRNA seq and this is important information reporting the finding of RET and GFRa1 transcripts in the subpopulation of FAP cells. However, authors provide no data on the expression of RET and GFRa1 proteins in FAP cells.

      - Reply for this comment by the Reviewer is in the Recommendations for the authors section below ([2]), as the same comment is repeated.

      [2] Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its downstream signaling in FAP cells. There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation in FAP cells. The data showing that GDNF signaling induces the synthesis and secretion of BDNF are also not conclusive.

      - Reply for this comment by the Reviewer is in the Recommendations for the authors section below ([3]), as the same comment is repeated.

      Recommendations for the authors:

      [1] Although this is a novel study and contains very well-performed parts, the GDNF section is preliminary and requires additional experimentation. In the introduction, the authors describe FAPs well but do not even mention how GDNF signals. Moreover, the reader may get the impression that the Ras-MAPK pathway is the only, or at least the main, GDNF signaling pathway. In fact, in neurons the Akt and Src signaling pathways also play crucial roles.

      - We apologize for the missing content in the Introduction section of our manuscript and for any confusion caused by our misleading description of the GDNF signaling pathway. We have revised the Introduction to describe the GDNF signaling pathway, including other downstream pathways of GDNF that are known to play crucial roles, as mentioned by the Reviewer (lines 115-130). Additionally, we changed the wording in the Results section to avoid creating a misleading impression (lines 318-319).

      [2] In Fig. 1 and 2 authors provide data on scRNA seq and this is important information reporting the finding of RET and GFRa1 transcripts in the subpopulation of FAP cells. However, authors provide no data on the expression of RET and GFRa1 proteins in FAP cells.

      - We appreciate the Reviewer’s constructive comment. Although we fully agree with the Reviewer that the expression of RET and GFRα1 proteins in FAPs needs to be validated, we were unable to obtain the antibodies required for such experiments within the limited time frame of this revision. We hope that the Reviewer understands our limitation. Although we could not directly show expression of these GDNF receptors at the protein level in FAPs, intramuscular GDNF injection was sufficient to induce Bdnf expression in FAPs compared to PBS control in the absence of nerve damage. It is therefore likely that GDNF receptors are indeed expressed at the protein level in FAPs, since otherwise FAPs would not have been able to respond to the injected GDNF protein. Nevertheless, in a future study, we will try to validate the protein-level expression of GDNF receptors in FAPs, as the Reviewer suggested, to further support this study.

      [3] Another problem is the lack of information showing that GDNF secreted by Schwann cells can activate RET and its downstream signaling in FAP cells. The authors can monitor activation of the MAPK pathway by detecting phospho-Erk, and of the PI3 kinase-Akt pathway by measuring phospho-S6, using immunohistochemistry. We recommend using the following antibodies: pErk1/2 (1:300, Cell Signaling, Cat# 4370L RRID:AB_2297462), pS6 (1:300, Cell Signaling, Cat# 4858L RRID:AB_1031194). These experiments are crucial because RET and GFRa1 proteins may not be expressed at sufficient levels on the cell surface.

      - We sincerely appreciate the Reviewer’s constructive comment. In this study, we suggested that the GDNF-BDNF axis within FAPs signals through the MAPK pathway, based on bioinformatic analysis of our single-cell RNA-seq data and comparison of the results with previously known pathways. We fully agree that monitoring the activation of the MAPK pathway and the PI3K-Akt pathway by immunohistochemistry would experimentally demonstrate whether GDNF can activate those pathways within FAPs through GFRα1/RET activation. Unfortunately, we could not obtain the antibodies suggested by the Reviewer for this revision due to insufficient research funds and the limited time frame. We hope that the Reviewer understands our limitation. In future studies, we will try to validate the detailed molecular pathway that mediates the GDNF-BDNF axis in FAPs by incorporating the methodology suggested by the Reviewer, together with genetic models such as Plp1CreER; Gdnffl/fl, Prrx1Cre; Retfl/fl or Prrx1Cre; Gfra1fl/fl, to test whether Schwann cell-derived GDNF actually signals through its canonical receptor RET/GFRα1 expressed in FAPs to induce BDNF expression upon nerve injury.

      [4] (a) There is no direct experimental proof that GDNF activating GFRa1-RET signaling triggers BDNF upregulation in FAP cells. The authors can use GDNF-blocking antibodies or siRNA, or use RET or GFRa1 cKO mice to delete these receptors from FAP cells. (b) The data showing that GDNF signaling induces the synthesis and secretion of BDNF are also not conclusive. The authors should show that GDNF injection increases BDNF protein levels in FAPs. Obtaining sufficient material for ELISA detection of BDNF may be problematic; however, the authors can use BDNF antibodies from Icosagen for IHC.

      - (a) We appreciate the Reviewer’s critical comment. As mentioned in our reply to Reviewer #1 Public review [3], we used GDNF-blocking antibodies to reduce GDNF signaling within the tibialis anterior and gastrocnemius muscles by intramuscular injection after sciatic nerve crush injury, and included the result as a new figure supplement in our revised manuscript (New Figure 4—figure supplement 2), with details in both the Results section (lines 381-390) and the Materials and Methods section (lines 611-616). Although the results were not statistically significant, intramuscular injection of anti-GDNF antibodies showed a tendency toward reduced Bdnf expression in FAPs compared to IgG controls. As mentioned in our reply to Reviewer #1 Public review [3], and as suggested by the Reviewer, using cKO mice such as Plp1CreER; Gdnffl/fl, Prrx1Cre; Retfl/fl, or Prrx1Cre; Gfra1fl/fl mice would further validate the GDNF-BDNF axis suggested in this study, likely with statistical significance. Unfortunately, obtaining these genetic models within the limited time frame of the current revision is not feasible. We will try to adopt such models in a future study to validate the role of Schwann cell-derived GDNF in inducing BDNF expression in FAPs via activation of RET/GFRα1.

      - (b) We appreciate the Reviewer’s constructive comment. Although we fully agree that the experiment suggested by the Reviewer would validate the synthesis and secretion of BDNF protein in response to GDNF signaling in FAPs, we were not able to perform it because we lacked the research funds to obtain a sufficient amount of GDNF protein. We hope that the Reviewer understands our limitation. Still, combining New Figure 4H, in which GDNF injection induced Bdnf mRNA expression in FAPs, with New Figure 4F, in which western blot analysis demonstrated BDNF protein expression in FAPs in response to nerve injury, we anticipate that GDNF injection would increase BDNF protein levels in FAPs, although directly validating this statement would require the additional experiments mentioned by the Reviewer.

      References

      Chan JR, Cosgaya JM, Wu YJ, and Shooter EM (2001). Neurotrophins are key mediators of the myelination program in the peripheral nervous system. Proceedings of the National Academy of Sciences 98:14661-14668.

      English AW, Liu K, Nicolini JM, Mulligan AM, and Ye K (2013). Small-molecule trkB agonists promote axon regeneration in cut peripheral nerves. Proc Natl Acad Sci U S A 110:16217-22. doi:10.1073/pnas.1303646110

      Giordani L, He GJ, Negroni E, Sakai H, Law JY, Siu MM, Wan R, Corneau A, Tajbakhsh S, and Cheung TH (2019). High-dimensional single-cell cartography reveals novel skeletal muscle-resident cell populations. Molecular Cell 74:609-621. e6.

      Höke A, Gordon T, Zochodne D, and Sulaiman O (2002). A decline in glial cell line-derived neurotrophic factor expression is associated with impaired regeneration after long-term Schwann cell denervation. Experimental Neurology 173:77-85.

      Kim J-H, Kang J-S, Yoo K, Jeong J, Park I, Park JH, Rhee J, Jeon S, Jo Y-W, and Hann S-H (2022). Bap1/SMN axis in Dpp4+ skeletal muscle mesenchymal cells regulates the neuromuscular system. JCI Insight 7:

      Leinroth AP, Mirando AJ, Rouse D, Kobayashi Y, Tata PR, Rueckert HE, Liao Y, Long JT, Chakkalakal JV, and Hilton MJ (2022). Identification of distinct non-myogenic skeletal-muscle-resident mesenchymal cell populations. Cell Reports 39:

      Liu L, Cheung TH, Charville GW, and Rando TA (2015). Isolation of skeletal muscle stem cells by fluorescence-activated cell sorting. Nature protocols 10:1612-1624.

      Oudega M, and Hagg T (1998). Neurotrophins promote regeneration of sensory axons in the adult rat spinal cord. Brain Research 818:431-438. doi:10.1016/S0006-8993(98)01314-6

      Xiao J, Wong AW, Willingham MM, Kaasinen SK, Hendry IA, Howitt J, Putz U, Barrett GL, Kilpatrick TJ, and Murray SS (2009). BDNF exerts contrasting effects on peripheral myelination of NGF-dependent and BDNF-dependent DRG neurons. J Neurosci 29:4016-22. doi:10.1523/JNEUROSCI.3811-08.2009

      Xu P, Rosen KM, Hedstrom K, Rey O, Guha S, Hart C, and Corfas G (2013). Nerve injury induces glial cell line-derived neurotrophic factor (GDNF) expression in Schwann cells through purinergic signaling and the PKC-PKD pathway. Glia 61:1029-1040.

      Zhang JY, Luo XG, Xian CJ, Liu ZH, and Zhou XF (2000). Endogenous BDNF is required for myelination and regeneration of injured sciatic nerve in rodents. European Journal of Neuroscience 12:4171-4180. doi:10.1111/j.1460-9568.2000.01312.x

      Zheng J, Sun J, Lu X, Zhao P, Li K, and Li L (2016). BDNF promotes the axonal regrowth after sciatic nerve crush through intrinsic neuronal capability upregulation and distal portion protection. Neuroscience letters 621:1-8.

    1. Author Response:

      The following is the authors’ response to the previous reviews.

      We carefully read through the second-round reviews and the additional reviews. To us, the review process is somewhat unusual and very much dominated by referee 2, who aggressively insists that we mixed up the trigeminal nucleus and the inferior olive and that, as a consequence, our results are meaningless. We think the stance of referee 2 and the focus on one single issue (the alleged mix-up of the trigeminal nucleus and the inferior olive) is somewhat unfortunate and leaves out many of our findings, and we debated at length how to deal with further revisions. In the end, we decided once again to give priority to addressing the criticism of referee 2, because it is hard to go on with a heavily attacked paper without resolving the matter at stake. The following is a summary of what we did:

      Additional experimental work:

      (1) We checked if the peripherin-antibody indeed reliably identifies climbing fibers.

      To this end, we sectioned the elephant cerebellum and stained sections with the peripherin-antibody. We find that: (i) the cerebellar white matter is strongly reactive for peripherin-antibodies, (ii) cerebellar peripherin-antibody staining has an axonal appearance, (iii) cerebellar Purkinje cell somata appear to be ensheathed by peripherin-antibody staining, and (iv) peripherin-antibody reactivity gradually decreases from the Purkinje cell somata to the pia in the cerebellar molecular layer. This work is shown in our revised Figure 2. All four features align with the distribution of climbing fibers (which arrive through the white matter, are axons, ensheathe Purkinje cell somata, and innervate Purkinje cells proximally without reaching the pia). In line with previous work, which showed similar cerebellar staining patterns in several species (Errante et al. 1998), we conclude that elephant climbing fibers are strongly reactive for peripherin-antibodies.

      (2) We delineated the elephant olivo-cerebellar tract.

      The strong peripherin-antibody reactivity of elephant climbing fibers enabled us to delineate the elephant olivo-cerebellar tract. We find that the elephant olivo-cerebellar tract is a strongly peripherin-antibody-reactive, well-delineated fiber tract several millimeters wide and about a centimeter in height. The unstained olivo-cerebellar tract has a greyish appearance. In the anterior regions of the olivo-cerebellar tract, peripherin-antibody-reactive fibers run in the dorsolateral brainstem and approach the cerebellar peduncle, where the tract gradually diminishes in size, presumably because climbing fibers discharge into the peduncle. Indeed, peripherin-antibody-reactive fibers can be seen entering the cerebellar peduncle. Towards the posterior end of the peduncle, the olivo-cerebellar tract disappears (in the dorsal brainstem directly below the peduncle). We note that the olivo-cerebellar tract was referred to as the spinal trigeminal tract by Maseko et al. 2013. We think the tract in question cannot be the spinal trigeminal tract for two reasons: (i) This tract is the sole brainstem source of peripherin-positive climbing fibers entering the peduncle/the cerebellum; this is the defining characteristic of the olivo-cerebellar tract. (ii) The tract in question is much smaller than the trigeminal nerve, disappears posterior to where the trigeminal nerve enters the brainstem (see below), and has no continuity with the trigeminal nerve; continuity with the trigeminal nerve, however, is the defining characteristic of the spinal trigeminal tract.

      The anterior regions of the elephant olivo-cerebellar tract resemble the anterior regions of the olivo-cerebellar tract of other mammals in their dorsolateral position and their relation to the cerebellar peduncle. In its more posterior parts, the elephant olivo-cerebellar tract continues for a long distance (~1.5 cm) in roughly the same dorsolateral position and enters the serrated nucleus that we previously identified as the elephant inferior olive. The more posterior parts of the elephant olivo-cerebellar tract therefore differ from those of other mammals, in which the tract follows a ventromedial trajectory towards a ventromedially situated inferior olive. The implication of our delineation of the elephant olivo-cerebellar tract is that we correctly identified the elephant inferior olive.

      (3) An in-depth analysis of peripherin-antibody reactivity also indicates that the trigeminal nucleus receives no climbing fiber input.

      We also studied peripherin-antibody reactivity in and around the trigeminal nucleus. In the previous submission we had noted that the trigeminal nucleus is weakly positive for peripherin, but that the staining pattern is uniform rather than the axon-bundle pattern seen in the inferior olive of other mammals. To us, this observation already argued against the presence of climbing fibers in the trigeminal nucleus. We also noted that the myelin stripes of the trigeminal nucleus were peripherin-antibody-negative. In the context of our olivo-cerebellar tract tracing, we now also scrutinized the surroundings of the trigeminal nucleus for peripherin-antibody reactivity. We find that the ventral brainstem surrounding the trigeminal nucleus is devoid of peripherin-antibody reactivity. Accordingly, no climbing fibers (which we have shown to be strongly peripherin-antibody-positive; see point 1) arrive at the trigeminal nucleus. The absence of climbing fiber input indicates that previous work identifying this (trigeminal) nucleus as the inferior olive (Maseko et al. 2013) is unlikely to be correct.

      (4) We characterized the entry of the trigeminal nerve into the elephant brain.

      To better understand how trigeminal information enters the elephant’s brain, we characterized the entry of the trigeminal nerve. This analysis indicated to us that the trigeminal nerve is not continuous with the olivo-cerebellar tract (the spinal trigeminal tract of Maseko et al. 2013), as claimed by Maseko et al. 2013. We show some of this evidence in Author response image 1 below. We think the trigeminal nerve is discontinuous with the olivo-cerebellar tract because of the size discrepancy between the two structures. We first show this for the tracing data of Maseko et al. 2013. In their data, the trigeminal nerve (Author response image 1A, their plate Y) is 3-4 times the size of the olivo-cerebellar tract (the alleged spinal trigeminal tract; Author response image 1B, their plate Z). Note that most if not all trigeminal fibers are thought to continue from the nerve into the spinal trigeminal tract (see our rat data below). We plotted the size of the trigeminal nerve and of the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) from the Maseko et al. 2013 data (Author response image 1C) and found that the olivo-cerebellar tract has a fairly consistent cross-sectional area (46 ± 9 mm², mean ± SD). Statistical considerations and anatomical evidence suggest that the tracing of the trigeminal nerve into the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) is almost certainly wrong. The most anterior point of the alleged spinal trigeminal tract has a cross-sectional area of 51 mm², which differs by more than 15 standard deviations from the most posterior value for the trigeminal nerve (194 mm²). For this assignment to be correct, about three-quarters of the trigeminal nerve fibers would have to disappear spontaneously, something that does not happen in the brain.

      We made similar observations in the African elephant Bibi, where the trigeminal nerve (Author response image 1D) is much larger in diameter than the olivo-cerebellar tract (Author response image 1E). We could also show that the olivo-cerebellar tract disappears into the peduncle posterior to where the trigeminal nerve enters (Author response image 1F). Our data are very similar to those of Maseko et al., indicating that their outlining of structures was done correctly. What appears to have been oversimplified is the assignment of structures as continuous. We also quantified the diameter of the trigeminal nerve and the spinal trigeminal tract in rats (from the Paxinos & Watson atlas; Author response image 1G); as expected, the trigeminal nerve and spinal trigeminal tract diameters are essentially continuous.
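      As a rough check of the size argument, the arithmetic can be reproduced from the values quoted from Maseko et al. 2013. This is a minimal sketch, not part of the original analysis: it assumes the mm² values are cross-sectional areas and that area is roughly proportional to fiber number (similar packing density in nerve and tract), which is an assumption rather than a measurement.

```python
# Minimal arithmetic check of the trigeminal nerve vs. olivo-cerebellar tract size argument.
# Values (mm^2) are those quoted in the response, replotted from Maseko et al. 2013.
# ASSUMPTION: cross-sectional area scales with fiber number (similar packing density).

TRACT_SD = 9.0           # SD of the tract's area along its length (mean 46 mm^2)
TRACT_ANTERIOR = 51.0    # most anterior section of the alleged spinal trigeminal tract
NERVE_POSTERIOR = 194.0  # most posterior section of the trigeminal nerve


def sd_distance(value: float, reference: float, sd: float) -> float:
    """Number of standard deviations separating `value` from `reference`."""
    return abs(value - reference) / sd


def fraction_lost(before: float, after: float) -> float:
    """Fraction of cross-sectional area that would vanish between two sections."""
    return 1.0 - after / before


z = sd_distance(NERVE_POSTERIOR, TRACT_ANTERIOR, TRACT_SD)
loss = fraction_lost(NERVE_POSTERIOR, TRACT_ANTERIOR)
print(f"separation: {z:.1f} SD; area that would have to vanish: {loss:.0%}")
```

      Under these assumptions the script prints a separation of 15.9 SD and a loss of 74%, in line with the "more than 15 standard deviations" and "three-quarters" figures given in the text.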

      In our hands, the trigeminal nerve does not continue into a well-defined tract that could be traced after its entry. In this regard, it differs from both the olivo-cerebellar tract of the elephant and the spinal trigeminal tract of the rodent, which are both well delineated. We think the absence of a well-delineated spinal trigeminal tract in elephants might have contributed to the putative tracing error highlighted in Author response image 1A-C.

      We conclude that the size mismatch indicates that trigeminal fibers do not run in the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013).

      Author response image 1.

      The trigeminal nerve is discontinuous with the olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013)

      A, Trigeminal nerve (orange) in the brain of African elephant LAX as delineated by Maseko et al. 2013 (coronal section; their plate Y).

      B, Most anterior appearance of the spinal trigeminal tract of Maseko et al. 2013 (blue; coronal section; their plate Z). Note the much smaller diameter of the spinal trigeminal tract compared to the trigeminal nerve shown in A, which argues against the continuity of the two structures. Indeed, our peripherin-antibody staining showed that the spinal trigeminal tract of Maseko et al. corresponds to the olivo-cerebellar tract and is discontinuous with the trigeminal nerve.

      C, Plot of the trigeminal nerve and olivo-cerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013) diameters along the anterior-posterior axis. The trigeminal nerve is much larger in diameter than the olivo-cerebellar tract. A, B: measurements for which sections are shown in panels A and B, respectively. The olivo-cerebellar tract has a consistent diameter; data replotted from Maseko et al. 2013. At mm 25 the inferior olive appears.

      D, Trigeminal nerve entry in the brain of African elephant Bibi; our data, coronal section, the trigeminal nerve is outlined in orange, note the large diameter.

      E, Most anterior appearance of the olivo-cerebellar tract in the brain of African elephant Bibi; our data, coronal section, approximately 3 mm posterior to the section shown in D; the olivo-cerebellar tract is outlined in blue. Note the smaller diameter of the olivo-cerebellar tract compared to the trigeminal nerve, which argues against the continuity of the two structures.

      F, Plot of the trigeminal nerve and olivo-cerebellar tract diameter along the anterior-posterior axis. The nerve and olivo-cerebellar tract are discontinuous and the trigeminal nerve is much larger in diameter than the olivocerebellar tract (the spinal trigeminal tract according to Maseko et al. 2013); our data. D, E measurements, for which sections are shown in panels D and E respectively. At mm 27 the inferior olive appears.

      G, In the rat the trigeminal nerve is continuous in size with the spinal trigeminal tract. Data replotted from Paxinos and Watson.

      Reviewer 2 (Public Review):

      As indicated in my previous review of this manuscript (see above), it is my opinion that the authors have misidentified, and indeed switched, the inferior olivary nuclear complex (IO) and the trigeminal nuclear complex (Vsens). It is this specific point only that I will address in this second review, as this is the crucial aspect of this paper - if the identification of these nuclear complexes in the elephant brainstem by the authors is incorrect, the remainder of the paper does not have any scientific validity.

      Comment: We agree with the referee that it is most important to correctly identify the inferior olivary nuclear complex (IO) and the trigeminal nuclear complex.

      Change: We did additional experimental work to resolve this matter, as detailed at the beginning of our response. Specifically, we ascertained that elephant climbing fibers are strongly peripherin-positive. Based on the peripherin reactivity of elephant climbing fibers, we delineated the elephant olivo-cerebellar tract. We find that the olivo-cerebellar tract connects the structure we refer to as the inferior olive (the referee refers to this structure as the trigeminal nuclear complex) to the cerebellum. We also found that the trigeminal nucleus (the structure the referee refers to as the inferior olive) appears to receive no climbing fibers. We provide indications that the tracing of the trigeminal nerve into the olivo-cerebellar tract by Maseko et al. 2013 was erroneous (Author response image 1). These novel findings support our ideas but are very difficult to reconcile with the referee’s partitioning scheme.

      The authors, in their response to my initial review, claim that I "bend" the comparative evidence against them. They further claim that as all other mammalian species exhibit a "serrated" appearance of the inferior olive, and as the elephant does not exhibit this appearance, that what was previously identified as the inferior olive is actually the trigeminal nucleus and vice versa. 

      For convenience, I will refer to IOM and VsensM as the identification of these structures according to Maseko et al (2013) and other authors, and will use IOR and VsensR to refer to the identification put forward in the study under review.

      The IOM/VsensR certainly does not have a serrated appearance in elephants. Indeed, in the plates supplied by the authors in their response (Referee Fig. 2), the cytochrome oxidase image and the image from Maseko et al (2013) show a very similar appearance. There is no doubt that the authors are identifying structures that closely correspond to those provided by Maseko et al (2013). It is solely a contrast in what these nuclear complexes are called and the functional sequelae of the identification of these complexes (are they related to trunk sensation or to movement controlled by the cerebellum?) that is under debate.

      Elephants are part of the Afrotheria, thus the most relevant comparative data to resolve this issue will be the identification of these nuclei in other Afrotherian species. Below I provide images of these nuclear complexes, labelled in the standard nomenclature, across several Afrotherian species. 

      (A) Lesser hedgehog tenrec (Echinops telfairi) 

      Tenrec brains are the most intensively studied of the Afrotherian brains, with extensive neuroanatomical studies undertaken primarily by Heinz Künzle. Below I append images (coronal sections stained with cresyl violet) of the IO and Vsens (labelled in the standard mammalian manner) in the lesser hedgehog tenrec. It should be clear that the inferior olive is located in the ventral midline of the rostral medulla oblongata (just like the rat) and that this nucleus is not distinctly serrated. The Vsens is located in the lateral aspect of the medulla, skirted laterally by the spinal trigeminal tract (Sp5). These images and the labels indicating structures correlate precisely with those provided by Künzle (1997, 10.1016; see his Figure 1K,L). Thus, in the first case of a related species, there is no serrated appearance of the inferior olive, the location of the inferior olive is confirmed through connectivity with the superior colliculus (a standard connection in mammals) by Künzle (1997), and the location of Vsens is what is considered to be typical for mammals. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report. 

      Peer Review Image 1.

      (B) Giant otter shrew (Potamogale velox) 

      The otter shrews are close relatives of the Tenrecs. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see hints of the serration of the IO as defined by the authors, but we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      Peer Response Image 2.

      (C) Four-toed sengi (Petrodromus tetradactylus) 

      The sengis are close relatives of the Tenrecs and otter shrews, these three groups being part of the Afroinsectiphilia, a distinct branch of the Afrotheria. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see vague hints of the serration of the IO (as defined by the authors), and we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report. 

      Peer Response Image 3.

      (D) Rock hyrax (Procavia capensis) 

      The hyraxes, along with the sirenians and elephants, form the Paenungulata branch of the Afrotheria. Below I append images of cresyl violet (left column) and myelin (right column) stained coronal sections through the brainstem with the IO, Vsens and Sp5 labelled as per the standard mammalian anatomy. Here we see hints of the serration of the IO (as defined by the authors), but we also see evidence of a more "bulbous" appearance of subnuclei of the IO (particularly the principal nucleus), and we also see many myelin stripes across the IO. Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report. 

      Peer Review Image 4.

      (E) West Indian manatee (Trichechus manatus) 

      The sirenians are the closest extant relatives of the elephants in the Afrotheria. Below I append images of cresyl violet (top) and myelin (bottom) stained coronal sections (taken from the University of Wisconsin-Madison Brain Collection, https://brainmuseum.org; while quite low in magnification, they do reveal the structures under debate) through the brainstem with the IO, Vsens and Sp5 labelled as per standard mammalian anatomy. Here we see the serration of the IO (as defined by the authors). Vsens is located laterally and skirted by the Sp5. This is in agreement with the authors, as they propose that ONLY the elephants show the variations they report.

      Peer Review Image 5.

      These comparisons and the structural identification, with which the authors agree as they only distinguish the elephants from the other Afrotheria, demonstrate that the appearance of the IO can be quite variable across mammalian species, including those with a close phylogenetic affinity to the elephants. Not all mammal species possess a "serrated" appearance of the IO. Thus, it is more than just theoretically possible that the IO of the elephant appears as described prior to this study. 

      So what about elephants? Below I append a series of images from coronal sections through the African elephant brainstem stained for Nissl, myelin, and immunostained for calretinin. These sections are labelled according to standard mammalian nomenclature. In these complete sections of the elephant brainstem, we do not see a serrated appearance of the IOM (as described previously and in the current study by the authors). Rather the principal nucleus of the IOM appears to be bulbous in nature. In the current study, no image of myelin staining in the IOM/VsensR is provided by the authors. However, in the images I provide, we do see the reported myelin stripes in all stains - agreement between the authors and reviewer on this point. The higher magnification image to the bottom left of the plate shows one of the IOM/VsensR myelin stripes immunostained for calretinin, and within the myelin stripes axons immunopositive for calretinin are seen (labelled with an arrow). The climbing fibres of the elephant cerebellar cortex are similarly calretinin immunopositive (10.1159/000345565). In contrast, although not shown at high magnification, the fibres forming the Sp5 in the elephant (in the Maseko description, unnamed in the description of the authors) show no immunoreactivity to calretinin. 

      Peer Review Image 6.

      Comment: We appreciate the referee’s additional comments. We concede the possibility that some relatives of elephants have a less serrated inferior olive than most other mammals. We maintain, however, that the elephant inferior olive (our Figure 1J) has the serrated appearance seen in the vast majority of mammals.

      Change: None.

      Peripherin Immunostaining 

      In their revised manuscript the authors present immunostaining of peripherin in the elephant brainstem. This is an important addition (although it does replace the only staining of myelin provided by the authors, which is unusual as the word myelin is in the title of the paper), as peripherin is known to specifically label peripheral nerves. In addition, as pointed out by the authors, peripherin also immunostains climbing fibres (Errante et al., 1998). The understanding of this staining is important in determining the identification of the IO and Vsens in the elephant, although it is not ideal for this task as there is some ambiguity. Errante and colleagues (1998; Fig. 1) show that climbing fibres are peripherin-immunopositive in the rat. But what the authors do not evaluate is the extensive peripherin staining in the rat Sp5 in the same paper (Errante et al, 1998, Fig. 2). The image provided by the authors of their peripherin immunostaining (their new Figure 2) shows what I would call the Sp5 of the elephant to be strongly peripherin immunoreactive, just like the rat shown in Errante et al (1998), and moreover in the precise position of the rat Sp5! This makes sense, as this is where the axons subserving the "extraordinary" tactile sensitivity of the elephant trunk would be found (in the standard model of mammalian brainstem anatomy). Interestingly, the peripherin immunostaining in the elephant is clearly lamellated... this coincides precisely with the description of the trigeminal sensory nuclei in the elephant by Maseko et al (2013), as pointed out by the authors in their rebuttal. Errante et al (1998) also point out peripherin immunostaining in the inferior olive, but according to the authors this is only "weakly present" in the elephant IOM/VsensR. This latter point is crucial. 
Surely if the elephant has an extraordinary sensory innervation from the trunk, with 400 000 axons entering the brain, the VsensR/IOM should be highly peripherin-immunopositive, including the myelinated axon bundles?! In this sense, the authors argue against their own interpretation - either the elephant trunk is not a highly sensitive tactile organ, or the VsensR is not the trigeminal nuclei it is supposed to be. 

      Comment: We made sure that elephant climbing fibers are strongly peripherin-positive (our revised Figure 2). As we already noted in our previous ms, we see weak diffuse peripherin-reactivity in the trigeminal nucleus (the inferior olive according to the referee), but none of the peripherin-reactive axon bundles (i.e. climbing fibers) that are seen in the inferior olive of other species. We also see no peripherin-reactive axon bundles (i.e. the olivo-cerebellar tract) arriving in the trigeminal nucleus, as the tissue surrounding the trigeminal nucleus is devoid of peripherin-reactivity. Again, this finding is incompatible with the referee’s ideas. As far as we can tell, the trigeminal fibers are not reactive for peripherin in the elephant, i.e. we did not observe peripherin-reactivity very close to the nerve entry; unfortunately, however, we did not stain the nerve itself for peripherin. As the referee alludes to, the absence of peripherin-reactivity in the trigeminal tract is a difference between rodents and elephants.

      Change: Our novel Figure 2.

      Summary: 

      (1) Comparative data of species closely related to elephants (Afrotherians) demonstrates that not all mammals exhibit the "serrated" appearance of the principal nucleus of the inferior olive. 

      (2) The location of the IO and Vsens as reported in the current study (IOR and VsensR) would require a significant, and unprecedented, rearrangement of the brainstem in the elephants independently. I argue that the underlying molecular and genetic changes required to achieve this would be so extreme that it would lead to lethal phenotypes. Arguing that the "switcheroo" of the IO and Vsens does occur in the elephant (and no other mammals) and thus doesn't lead to lethal phenotypes is a circular argument that cannot be substantiated. 

      (3) Myelin stripes in the subnuclei of the inferior olivary nuclear complex are seen across all related mammals as shown above. Thus, the observation made in the elephant by the authors in what they call the VsensR is similar to that seen in the IO of related mammals, especially when the IO takes on a more bulbous appearance. These myelin stripes are the origin of the olivocerebellar pathway, and are indeed calretinin immunopositive in the elephant as I show. 

      (4) What the authors see aligns perfectly with what has been described previously, the only difference being the names by which the nuclear complexes are called. But identifying these nuclei is important, as any functional sequelae, as extensively discussed by the authors, are entirely dependent upon accurately identifying these nuclei. 

      (5) The peripherin immunostaining scores an own goal - if peripherin is marking peripheral nerves (as the authors and I believe it is), then why is the VsensR/IOM only "weakly positive" for this stain? This either means that the "extraordinary" tactile sensitivity of the elephant trunk is non-existent, or that the authors have misinterpreted this staining. That there is extensive staining in the fibre pathway dorsal and lateral to the IOR (which I call the spinal trigeminal tract) supports the idea that the authors have misinterpreted their peripherin immunostaining.

      (6) Evolutionary expediency. The authors argue that what they report is an expedient way in which to modify the organisation of the brainstem in the elephant to accommodate the "extraordinary" tactile sensitivity. I disagree. As pointed out in my first review, the elephant cerebellum is very large and composed of huge numbers of morphologically complex neurons. The inferior olivary nuclei in all mammals studied in detail to date give rise to the climbing fibres that terminate on the Purkinje cells of the cerebellar cortex. It is more parsimonious to argue that, in alignment with the expansion of the elephant cerebellum (for motor control of the trunk), the inferior olivary nuclei (specifically the principal nucleus) have had additional neurons added to accommodate this cerebellar expansion. Such an addition of neurons to the principal nucleus of the inferior olive could readily lead to the loss of the serrated appearance of the principal nucleus of the inferior olive, and would require far fewer modifications in the developmental genetic program that forms these nuclei. This type of quantitative change appears to be the primary way in which structures are altered in the mammalian brainstem. 

      Comment: We still disagree with the referee. We note that our conclusions rest on the analysis of 8 elephant brainstems, which we sectioned in three planes and stained with a variety of metabolic and antibody stains, and in which we assigned two structures (the inferior olive and the trigeminal nucleus). Most of the evidence cited by the referee stems from a single paper, in which 147 structures were identified based on the analysis of a single brainstem sectioned in one plane and stained with a limited set of antibodies. Our synopsis of the evidence is the following.

      (1) We agree with the referee that concerning brainstem position our scheme of a ventromedial trigeminal nucleus and a dorsolateral inferior olive deviates from the usual mammalian position of these nuclei (i.e. a dorsolateral trigeminal nucleus and a ventromedial inferior olive).

      (2) Cytoarchitectonics support our partitioning scheme. The compact cellular appearance of our ventromedial trigeminal nucleus is characteristic of trigeminal nuclei. The serrated appearance of our dorsolateral inferior olive is characteristic of the mammalian inferior olive; we acknowledge that the referee claims exceptions here. To our knowledge, nobody has described a mammalian trigeminal nucleus with a serrated appearance (which would apply to the elephant in case the trigeminal nucleus is situated dorsolaterally).

      (3) Metabolic staining (cytochrome-oxidase reactivity) supports our partitioning scheme. Specifically, our ventromedial trigeminal nucleus shows intense cytochrome-oxidase reactivity, as is seen in the trigeminal nuclei of trigeminal tactile experts.

      (4) Isomorphism. The myelin stripes on our ventromedial trigeminal nucleus are isomorphic to trunk wrinkles. Isomorphism is a characteristic of somatosensory brain structures (barrels, barrelettes, nose-stripes, etc.) and we know of no case where such isomorphism was misleading.

      (5) The large-scale organization of our ventromedial trigeminal nuclei in anterior-posterior repeats is characteristic of the mammalian trigeminal nuclei. To our knowledge, no such organization has ever been reported for the inferior olive.

      (6) Connectivity analysis supports our partitioning scheme. According to our delineation of the elephant olivo-cerebellar tract, our dorsolateral inferior olive is connected via peripherin-positive climbing fibers to the cerebellum. In contrast, our ventromedial trigeminal nucleus (the referee’s inferior olive) is not connected via climbing fibers to the cerebellum.

      Change: As discussed, we advanced further evidence in this revision. Our partitioning scheme (a ventromedial trigeminal nucleus and a dorsolateral inferior olive) is better supported by data and makes more sense than the referee’s suggestion (a dorsolateral trigeminal nucleus and a ventromedial inferior olive). It should be published.

      Reviewer #3 (Public Review):

      Summary: 

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identify large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit, which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes, based on their number and patterning, that they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified. 

      Comment: The referee provides a summary of our work. The referee also notes that the correct identification of the trigeminal nucleus is critical to the message of our paper.

      Change: In line with these assessments we focused our revision efforts on the issue of trigeminal nucleus identification, please see our introductory comments and our response to Referee 2.

      Strengths: 

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: We appreciate this positive assessment.

      Change: None

      Weaknesses: 

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections. 

      Comment: We understand these criticisms and the need for cautious interpretation. As we noted previously, we think that the eLife publishing scheme, where critical referee commentary is published along with our ms, will make this contribution particularly valuable.

      Change: Our additional efforts to secure the correct identification of the trigeminal nucleus.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data to different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings. 

      Comment: We understand why the referee asks for additional comparative data, which would make our study more meaningful. We note that we already published a quantitative comparison of African and Asian elephant facial nuclei (Kaufmann et al. 2022). The quantitative differences between African and Asian elephant facial nuclei are similar in magnitude to what we observed here for the trigeminal nucleus, i.e. African elephants have about 10-15% more facial nucleus neurons than Asian elephants. The referee also notes that data on overall elephant brain size might be important for interpreting our data. We agree with this sentiment and we are preparing a ms on African and Asian elephant brain size. We find, unexpectedly given the larger body size of African elephants, that African elephants have smaller brains than Asian elephants. The finding might imply that African elephants, which have more facial nucleus neurons and more trigeminal nucleus trunk module neurons, are neurally more specialized in trunk control than Asian elephants.

      Change: We are preparing a further ms on African and Asian elephant brain size, a first version of this work has been submitted.

      Reviewer #4 (Public Review): 

      Summary: 

      The authors report a novel isomorphism in which the folds of the elephant trunk are recognizably mapped onto the principal sensory trigeminal nucleus in the brainstem. Further, they identify the enlarged nucleus as being situated in this species in an unusual ventral midline position. 

      Comment: The referee summarizes our work.

      Change: None.

      Strengths: 

      The identity of the purported trigeminal nucleus and the isomorphic mapping with the trunk folds is supported by multiple lines of evidence: enhanced staining for cytochrome oxidase, an enzyme associated with high metabolic activity; dense vascularization, consistent with high metabolic activity; prominent myelinated bundles that partition the nucleus in a 1:1 mapping of the cutaneous folds in the trunk periphery; near absence of labeling for the anti-peripherin antibody, specific for climbing fibers, which can be seen as expected in the inferior olive; and a high density of glia.

      Comment: The referee again reviews some of our key findings.

      Change: None. 

      Weaknesses: 

      Despite the supporting evidence listed above, the identification of the gross anatomical bumps, conspicuous in the ventral midline, is problematic. This would be the standard location of the inferior olive, with the principal trigeminal nucleus occupying a more dorsal position. This presents an apparent contradiction which at a minimum needs further discussion. Major species-specific specializations and positional shifts are well-documented for cortical areas, but nuclear layouts in the brainstem have been considered as less malleable. 

      Comment: The referee notes that our discrepancy with referee 2 needs to be addressed with further evidence and discussion, given the unusual position of both the inferior olive and the trigeminal nucleus in our partitioning scheme and given that the mammalian brainstem tends to be positionally conservative. We agree with the referee. We note that, based on the immense size of the elephant trigeminal ganglion (50 g, half the size of a monkey brain), one would expect the elephant trigeminal nucleus to be exceptionally large.

      Change: We did additional experimental work to resolve this matter: (i) We ascertained that elephant climbing fibers are strongly peripherin-positive. (ii) Based on elephant climbing fiber peripherin-reactivity, we delineated the elephant olivo-cerebellar tract. We find that the olivo-cerebellar tract connects the structure we refer to as the inferior olive to the cerebellum. (iii) We also found that the trigeminal nucleus (the structure the referee refers to as the inferior olive) appears to receive no climbing fibers. (iv) We provide indications that the tracing of the trigeminal nerve into the olivo-cerebellar tract by Maseko et al. 2023 was erroneous (Referee-Figure 1). These novel findings support our ideas.

      Reviewer #5 (Public Review): 

      After reading the manuscript and the concerns raised by reviewer 2, I see both sides of the argument - the relative location of the trigeminal nucleus versus the inferior olive is quite different in elephants (and different from previous studies in elephants), but when there is a large disproportionate magnification of a behaviorally relevant body part at most levels of the nervous system (certainly in the cortex and thalamus), you can get major shifting in the location of different structures. In the case of the elephant, it looks like there may be a lot of shifting. Something that is compelling is that the number of modules separated by the myelin bands corresponds to the number of trunk folds, which is different in the different elephants. This sort of modular division based on body parts is a general principle of mammalian brain organization (demonstrated beautifully for the cuneate and gracile nuclei in primates, VP in most species, and S1 in a variety of mammals such as the star-nosed mole and duck-billed platypus). I don't think these relative changes in the brainstem would require major genetic programming - although some surely exists. Rodents and elephants have been independently evolving for over 60 million years, so there is a substantial amount of time for changes in each lineage to occur.

      I agree that the authors have identified the trigeminal nucleus correctly, although comparisons with more outgroups would be needed to confirm this (although I'm not suggesting that the authors do this). I also think the new figure (which shows previous divisions of the brainstem versus their own) allows the reader to consider these issues for themselves. When reviewing this paper, I actually took the time to go through atlases of other species and even look at some of my own data from highly derived species. Establishing homology across groups based only on relative location is tough, especially when there appear to be large shifts in the relative location of structures. My thoughts are that the authors did an extraordinary amount of work on obtaining, processing and analyzing this extremely valuable tissue. They document their work with images of the tissue and their arguments for their divisions are solid. I feel that they have earned the right to speculate - with qualifications - which they provide. 

      Comment: The referee summarizes our work and appears to be convinced by the line of our arguments. We are most grateful for this assessment. We add, again, that the skeptical assessment of referee 2 will be published as well and will give the interested reader the possibility to view another perspective on our work.

      Change: None. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors):

      With this manuscript being virtually identical to the previous version, it is possible that some of the definitive conclusions about having identified the elephant trigeminal nucleus and trunk representation should be moderated in a more nuanced manner, especially given the careful and experienced perspective from reviewers with first-hand knowledge of elephant neuroanatomy.

      Comment: We agree that both our first and second revisions were very much centered on the debate of the correct identification of the trigeminal nucleus and that our ms did not evolve as much in other regards. This being said we agree with Referee 2 that we needed to have this debate. We also think we advanced important novel data in this context (the delineation of elephant olivo-cerebellar tract through the peripherin-antibody).

      Changes: Our revised Figure 2. 

      The peripherin staining adds another level of argument to the authors having identified the trigeminal brainstem instead of the inferior olive, if differential expression of peripherin is strong enough to distinguish one structure from the other.

      Comment: We think we showed too little peripherin-antibody staining in our previous revision. We have now addressed this problem.

      Changes: Our revised Figure 2, i.e. the delineation of the elephant olivo-cerebellar tract through the peripherin-antibody.

      There are some minor corrections to be made with the addition of Fig. 2., including renumbering the figures in the manuscript (e.g., 406, 521). 

      I continue to appreciate this novel investigation of the elephant brainstem and find it an interesting and thorough study, with the use of classical and modern neuroanatomical methods.

      Comment: We are thankful for this positive assessment.

      Reviewer #2 (Recommendations For The Authors):

      I do realise the authors are very unhappy with me and the reviews I have submitted. I do apologise if feelings have been hurt, and I do understand the authors put in a lot of hard work and thought to develop what they have; however, it is unfortunate that the work and thoughts are not correct. Science is about the search for the truth and sometimes we get it wrong. This is part of the scientific process and why most journals adhere to strict review processes of scientific manuscripts. As I said previously, the authors can use their data to write a paper describing and quantifying Golgi staining of neurons in the principal olivary nucleus of the elephant that should be published in a specialised journal and contextualised in terms of the motor control of the trunk and the large cerebellum of the elephant. 

      Comment: We appreciate the referee’s kind words. Also, no hard feelings from our side, this is just a scientific debate. In our experience, neuroanatomical debates are resolved by evidence and we note that we provide evidence strengthening our identification of the trigeminal nucleus and inferior olive. As far as we can tell from this effort and the substantial evidence accumulated, the referee is wrong.

      Reviewer #4 (Recommendations For The Authors):

      As a new reviewer, I have benefited from reading the previous reviews and Author response, even while having several new comments to add. 

      (1) The identification of the inferior olive and trigeminal nuclei is obviously center stage. An enlargement of the trigeminal nuclei is not necessarily problematic, given the published reports on the dramatic enlargement of the trigeminal nerve (Purkart et al., 2022). At issue is the conspicuous relocation of the trigeminal nuclei that is being promoted by Reveyaz et al. Conspicuous rearrangements are not uncommon; for example, primary sensory cortical fields in different species (fig. 1 in H.H.A. Oelschläger for dolphins; S. De Vreese et al. (2023) for cetaceans; L. Krubitzer on various species, in the context of evolution). The difficult point here concerns what looks like a rather conspicuous gross anatomical rearrangement in the BRAINSTEM, the assumption being that the brainstem bauplan is going to be specifically conservative and refractory to gross anatomical rearrangement. 

      Comment: We agree with the referee that the brainstem rearrangements are unexpected. We also think that the correct identification of nuclei needs to be at the center of our revision efforts.

      Change: Our revision provided further evidence (delineation of the olivo-cerebellar tract, characterization of the trigeminal nerve entry) about the identity of the nuclei we studied.

      Why would a major nucleus shift to such a different location? And how? Can ex vivo DTI provide further support for the correct identification? Is there other "disruption" in the brainstem? What occupies the traditional position of the trigeminal nuclei? An atlas-equivalent coronal view of the entire brainstem would be informative. The authors have assembled multiple criteria to support their argument that the ventral "bumps" are in fact a translocated trigeminal principal nucleus: enhanced CO staining, enhanced vascularization, enhanced myelination (via Golgi stains and tomography), very scant labeling for a climbing fiber specific antibody (anti-peripherin) vs. dense staining of this in the alternative structure that they identify as IO, and a high density of glia. Admittedly, this should be sufficient, but the proposed translocation (in the BRAINSTEM) is sufficiently startling that this is arguably NOT sufficient.

      The terminology of "putative" is helpful, but a more cogent presentation of the results and more careful discussion might succeed in winning over at least some of a skeptical readership. 

      Comment: We do not know what led to the elephant brainstem rearrangements we propose. If the trigeminal nuclei had expanded isometrically in elephants from the ancestral pattern, one would have expected a brain with big lateral bumps, not the elephant brain with its big ventromedial bumps. We note, however, that the expansion of the elephant trigeminal nuclei very likely did not occur isometrically. Instead, the neural representation of the elephant nose expanded dramatically, and in rodents the nose is represented ventromedially in the brainstem face representation. Thus, we propose a ‘ventromedial outgrowth model’, according to which the elephant ventromedial trigeminal bumps result from a ventromedially directed outgrowth of the ancestral ventromedial nose representation.

      We advanced substantially more evidence to support our partitioning scheme, including the delineation of the olivo-cerebellar tract based on peripherin reactivity. We also identified problems in previous partitioning schemes, such as the claim that the trigeminal nerve continues into the ~4x smaller olivocerebellar tract (Referee-Figure 1C, D); we think such a flow of fibers (which is also at odds with peripherin-antibody reactivity and with the appearance of the nerve and the olivocerebellar tract) is highly unlikely, if not physically impossible. Given all this, we do not think that we overstate our case in our cautiously presented ms.

      Change: We added evidence on the identification of elephant trigeminal nuclei and inferior olive.

      (2) Role of myelin. While the photos of myelin are convincing, it would be nice to have further documentation. Gallyas? Would antibodies to MBP work? What is the myelin distribution in the "standard" trigeminal nuclei (human? macaque or chimpanzee?). What are alternative sources of the bundles? Regardless, I think it would be beneficial to de-emphasize this point about the role of myelin in demarcating compartments.

      I would in fact suggest an alternative (more neutral) title that might highlight instead the isomorphic feature; for example, "An isomorphic representation of Trunk folds in the Elephant Trigeminal Nucleus." The present title stresses myelin, but figure 1 already focuses on CO. Additionally, the folds are actually mentioned almost in passing until later in the manuscript. I recommend a short section on these at the beginning of the Results to serve as a useful framework.

      Here I'm inclined to agree with the Reviewer that the Authors' contention that the myelin stripes serve PRIMARILY to separate trunk-fold domains is not particularly compelling and arguably a distraction. The point can be made, but perhaps with less emphasis. After all, the fact that myelin has multiple roles is well-established, even if frequently overlooked. In addition, the Authors might make better use of an extensive relevant literature related to myelin as a compartmental marker; for example, results and discussion in D. Haenelt....N. Weiskopf (eLife, 2023), among others. Another example is the heavily myelinated stria of Gennari in primate visual cortex, consisting of intrinsic pyramidal cell axons, where the role of the myelination has still not been elucidated.

      Comment: (1) Documentation of myelin. We note that we show further identification of myelinated fibers with the fluorescent dye fluomyelin in Figure 4B. We also performed additional myelin stains, such as the gold-chloride myelin stain following the protocol of Schmued (Referee-Figure 2). In the end, nothing visualized the myelin stripes quite as well as the bright-field images shown in Figure 4A, and only these images allowed us to match myelin stripes to trunk folds. Hence, we focus our presentation on these images.

      (2) Title: We understand why the referee envisions an alternative title. That said, we would like to keep our current title, because we feel it highlights the major novelty we discovered.

      (3) We agree with many of the referee's other comments on myelin phenomenology. We had missed the Haenelt reference pointed out by the referee and think it is highly relevant to our paper.

      Change: 1. Referee Figure. 2. Inclusion of the Haenelt-reference.

      Author response image 2.

      Myelin stripes of the elephant trunk module visualized by Gold-chloride staining according to Schmued

      A, Low magnification micrograph of the trunk module of African elephant Indra stained with AuCl according to Schmued. The putative finger is to the left, proximal is to the right. Myelin stripes can easily be recognized. The white box indicates the area shown in B.

      B, high magnification micrograph of two myelin stripes. Individual gold-stained (black) axons organized in myelin stripes can be recognized.

      Schmued, L. C. (1990). A rapid, sensitive histochemical stain for myelin in frozen brain sections. Journal of Histochemistry & Cytochemistry, 38(5), 717-720.

      Are the "bumps" in any way "analogous" to the "brain warts" seen in entorhinal areas of some human brains (G. W. van Hoesen and A. Solodkin, 1993)?

      Comment: We think this is a similar phenomenon.

      Change: We included the van Hoesen and Solodkin (1993) reference in our discussion.

      At least slightly more background (i.e., a separate section or, if necessary, a supplement) would be helpful, going into more detail on the several subdivisions of the ION and whether these undergo major alterations in the elephant.

      Comment: The strength of the paper is the detailed delineation of the trunk module, based on myelin stripes and isomorphism. We don’t think we have strong evidence on ION subdivisions, because it appears the trigeminal tract cannot be easily traced in elephants. Accordingly, we find it difficult to add information here.

      Change: None.

      Is there evidence from the literature of other conspicuous gross anatomical translocations, in any species, especially in subcortical regions? 

      Comment: The best example that comes to mind is the star-nosed mole brainstem. There is a beautiful paper comparing the star-nosed mole brainstem to the normal mole brainstem (Catania et al., 2011). The principal trigeminal nucleus in the star-nosed mole is far more rostral and also more medial than in the mole; still, such rearrangements are minor compared to what we propose in elephants.

      Catania, Kenneth C., Duncan B. Leitch, and Danielle Gauthier. "A star in the brainstem reveals the first step of cortical magnification." PloS one 6.7 (2011): e22406.

      Change: None.

      (3) A major point concerns the isomorphism between the putative trigeminal nuclei and the trunk specialization. I think this can be much better presented, at least with more discussion and other examples. The Authors mention the rodent "barrels," but it seemed strange to me that they do not refer to their own results in pig (C. Ritter et al., 2023), nor to the work of Ken Catania, 2002 (star-nosed mole; "fingerprints in the brain"), or to others that might be appropriate. I concur with the Reviewer that there should be more comparative data.

      Comment: We agree.

      Change: We added a discussion of other isomorphisms, including the star-nosed mole, to our paper.

      (4) Textual organization could be improved. 

      The all-important Abstract is a longish, semi "run-on" paragraph. At a minimum this should be broken up. The last paragraph of the Introduction puts forth five issues, but these are only loosely followed in the Results section. I think clarity and good organization are of the utmost importance in this manuscript. I recommend that the Authors begin the Results with a section on the trunk folds (currently figure 5, and discussion), continue with the several points related to the identification of the trigeminal nuclei, and follow with a parallel description of the ION, with more parallel data on the putative trigeminal and IO structures (currently referee Table 1, but incorporate this into the text and add higher magnification of nucleus-specific cell types in the IO and trigeminal nuclei). Relevant comparative data should be included in the Discussion.

      Comment: 1. We agree with the referee that our abstract needed to be revised. 2. We also think that our ms was heavily altered by the insertion of the new Figure 2, which complemented Figure 1 from our first submission and is concerned with the identification of the inferior olive. From a standpoint of textual flow such changes were not ideal, but the revisions massively added to the certainty with which we identify the trigeminal nuclei. Thus, although we are not as content as we were with the flow, we think the ms advanced in the revision process and we would like to keep the Figure sequence as is. 3. We already noted above that we included additional comparative evidence.

      Change: 1. We revised our abstract. 2. We added comparative evidence.

      Reviewer #5 (Recommendations For The Authors): 

      The data is invaluable and provides insights into some of the largest mammals on the planet. 

      Comment: We are incredibly thankful for this positive assessment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review: 

      This study used ATAC-Seq to characterize chromatin accessibility during stages of GABAergic neuron development in induced pluripotent stem cells (iPSCs) derived from both Dravet Syndrome (DS) patients and healthy donors. The authors report accelerated GABAergic maturation to a point, followed by further differentiation into a perturbed chromatin profile, in the cells from patients. In a preliminary analysis, valproic acid, an anti-seizure medication commonly used in patients with DS, increased open chromatin in both patient and control iPSCs in a nonspecific manner, and to different degrees in cultures derived from different patients. These findings provide new information about DS-associated changes in chromatin, and provide further evidence for developmental abnormalities in interneurons with DS. 

      Strengths:

      This is a novel study that aims to investigate the epigenetic changes that occur in a sodium channel model of epilepsy; these changes are often ignored but may be an interesting area for future therapeutics. In general, the flow of the paper is good, and the figures are well-designed.

      Thank you for your positive feedback about our work.

      Weaknesses:

      The most substantial weakness relates to the observation that DS is often viewed as a monogenic form of epilepsy. It is directly linked to SCN1A gene haploinsufficiency (Yu et al, 2006; Ogiwara et al, 2007). The gene product is Nav1.1, the alpha subunit of voltage-gated sodium channel type I that regulates neuronal excitability. Yet, analysis was conducted at time points of GABAergic interneuron differentiation in which SCN1A is likely not expressed. The paper would be strengthened if SCN1A expression and Nav1.1 protein were examined across the experimental time course. If SCN1A is not yet expressed, this would complicate any explanation of how the observed epigenetic changes might arise. It also seems counterintuitive that the absence of a sodium channel can accelerate differentiation, when, a priori, one might expect the opposite (a 'less neuronal' signal). 

      Thanks, this is an important point! In our revised manuscript, we have incorporated data on the expression of SCN1A at d19 and d65 of GABAergic development in both the control and patient groups. First, we retrieved data from our previous RNA-Seq analysis showing SCN1A gene expression in our cells at both d19 and d65, and we have updated the text on SCN1A gene expression accordingly (Revised Supplementary Figure 1A, revised text Lines 108-109). Second, we confirmed the dynamics of SCN1A expression by real-time quantitative RT-PCR analysis at four time points of GABAergic development (d0, d19, d35 and d65). Notably, SCN1A expression was detected by qRT-PCR from d19 onward, and the expression increased with differentiation. We have now included this information in the revised manuscript (Revised Supplementary Figure 1B, revised text Line 112).

      Related to this, another important limitation of the study is that the controls are cells derived from healthy individuals and not from isogenic lines. The usage of isogenic lines is extremely relevant for every study in which iPSC-derived somatic cells are used to model a disease, but specifically in diseases like DS, in which the genetic background has an ascertained impact on disease phenotype (Cetica et al, 2017 and others). This serious limitation should be considered.

      Yes, we fully agree that isogenic, edited patient-derived iPSCs would have been the ideal controls. At an early stage we therefore invested considerable time and effort in generating isogenic lines from patient-derived iPSCs. However, editing of the SCN1A variants in patient-derived iPSCs proved unsuccessful despite several attempts and modifications, so we ultimately turned to iPSCs from healthy donors. This is now discussed together with other limitations of our study in the revised manuscript (end of the Discussion section, lines 499-506).

      In addition, the authors should provide data on variability across cell lines and differentiations to help convince the reader that the results can be attributed to genetic defects, rather than variability across individuals. 

      This is a valuable point. In the revised manuscript, we have now added plots and IF staining from individual samples to give the readers a complete picture on how they are distributed (Revised Supplementary Figure 1C, Revised Supplementary Figure 2, and Revised Supplementary Figure 4).

      In the revised manuscript, we explain in more detail the strategy used to compare the two groups (cases vs. controls). In our analysis, we first compared the dynamic changes of chromatin accessibility across differentiation cell line by cell line. We then extracted the changes common to the different cell lines at each time point (revised text lines 152-155 and 226-228). Using this strategy, we extracted the common changes confined to the control and patient groups, respectively. This approach avoids capturing variability between individuals.
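      The two-step strategy described above (per-line changes first, then only the changes shared by every line of a group) amounts to a set intersection. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: it assumes differential regions have already been called per cell line between two time points, and the function name `common_changes` and the region IDs are hypothetical. The cell-line names are those given elsewhere in this response.

```python
from functools import reduce

def common_changes(per_line_changes):
    """Return the regions that changed in every cell line of a group.

    per_line_changes maps a cell-line name to the set of (hypothetical)
    region IDs called as differentially accessible in that line between
    two time points. Regions that changed in only some lines are dropped.
    """
    if not per_line_changes:
        return set()
    return reduce(set.intersection, per_line_changes.values())

# Hypothetical region IDs for the three control lines between D0 and D19.
controls_d0_d19 = {
    "Ctl1B": {"r1", "r2", "r3"},
    "Ctl7C": {"r2", "r3", "r4"},
    "Ctl8F": {"r2", "r3", "r5"},
}
print(sorted(common_changes(controls_d0_d19)))  # ['r2', 'r3']
```

      Requiring a region to change in all lines trades sensitivity for robustness: line-specific changes (such as those from r1, r4 and r5 above) are discarded rather than averaged in.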

      Additionally, the authors acknowledge the variability of the differentiations and cell lines, which is commendable, and they attribute this to "possibly reflecting cell line specific and endogenous differences reported previously", but it could also have to do with cell death. This is a large confounding factor for ATAC-seq. Certainly, Sup Fig 1C shows lower FRiP scores, consistent with cell death, and there seems to be a lot of death in the representative images. Moreover, iGABA neurons are very difficult to keep alive, especially up to 65 days, without co-culturing with glia and/or glutamatergic neurons. The authors should comment on how much these factors may have influenced their results.

      With this point in mind, we re-examined the QC of our ATAC-Seq data across all samples. As shown in revised Supplementary Figure 2C and Supplementary Figure 4C, our cutoff for FRiP is 15%, and all samples have a FRiP above 15%, including those from the later time points (d35 and d65). We therefore feel confident that the quality of the ATAC-Seq data is sufficient for downstream analysis and data interpretation.
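      FRiP (fraction of reads in peaks) is simply the share of reads that fall inside called peaks, compared here against the 15% cutoff mentioned above. The sketch below is illustrative only: it assumes a single chromosome, reduces each read to one coordinate (for example a Tn5 cut site), and takes sorted, non-overlapping half-open peaks. Standard tools instead count overlaps of full read intervals with peaks, and the function names are hypothetical.

```python
from bisect import bisect_right

def frip(read_positions, peaks):
    """Fraction of reads in peaks (FRiP) for a single chromosome.

    read_positions: one integer coordinate per read (e.g. a Tn5 cut site).
    peaks: sorted, non-overlapping half-open intervals [start, end).
    """
    if not read_positions:
        return 0.0
    starts = [start for start, _ in peaks]
    in_peaks = 0
    for pos in read_positions:
        # Index of the last peak starting at or before pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions)

def passes_frip_qc(read_positions, peaks, cutoff=0.15):
    """Apply the 15% FRiP cutoff quoted in the response."""
    return frip(read_positions, peaks) > cutoff

peaks = [(100, 200), (500, 600)]
reads = [150, 250, 550, 600, 99]
print(frip(reads, peaks))            # 0.4
print(passes_frip_qc(reads, peaks))  # True
```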

      Regarding the differentiation protocol, we follow a directed protocol for differentiating iPSCs into interneurons. The protocol is described in detail by Maroof et al. (reference 34) and was slightly modified in our lab (described in reference 13). With our modified protocol, GABAergic cells are viable beyond day 65 without the need for co-culture with astrocytes or microglia. This is also reflected by the electrophysiological activity of interneurons at d65 and at later time points (reference 13). Additionally, our aim was to obtain a homogeneous cell population for further analysis; adding other cell types to the cultures would have interfered with downstream processes and required cell sorting. Using our protocol, we obtain viable GABAergic interneurons after up to 100 days in culture. To assess the viability of our cells at the point of sampling (in addition to morphological assessment), we used Trypan blue staining and an automated cell counter. Only samples with a viability >90%, a commonly used cut-off for cell viability, were processed for ATAC-Seq. We have now modified the Methods section in the revised version to describe the GABAergic differentiation and sampling (lines 519-529).

      Finally, changes in gene expression are only inferred, as no RNA levels were measured. If RNA-seq was not possible it would have been good to see at least some of the key genes/findings corroborated with RNA/protein levels vs chromatin accessibility alone, particularly given that these molecular readouts do not always correlate. 

      In our revised manuscript, we include our recently published RNA-Seq data obtained at d19 and d65, and we correlated the RNA-Seq and ATAC-Seq data obtained from the same samples. The Pearson correlations between gene expression and chromatin accessibility were in the range 0.49-0.57 (Revised Supplementary Figure 2G, Revised Supplementary Figure 4G), which is acceptable according to standard criteria. These results confirm that the quality of the ATAC-Seq data is sufficient for analyzing expression levels and chromatin openness of key genes. We also added gene expression levels from RNA-Seq (d19 and d65) to the revised manuscript (Revised Figure 1G, Revised Figure 2G). Finally, we performed qRT-PCR analysis of key genes in each cluster, and the results are now included in the revised version (Revised Supplementary Figure 3E, Revised Supplementary Figure 5E).
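      For readers less familiar with the statistic, the Pearson coefficient quoted above is the covariance of the paired values divided by the product of their standard deviations. The sketch below is a plain-Python illustration, not the authors' analysis. The response does not say how genes were paired with accessible regions, so pairing each gene with its promoter accessibility is an assumption, and the example values are hypothetical and not data from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

# Hypothetical per-gene values for one sample (e.g. log-scaled promoter
# accessibility and log-scaled expression); not data from the study.
accessibility = [2.1, 3.4, 0.8, 5.0, 4.2]
expression = [1.9, 2.8, 1.5, 4.1, 4.6]
r = pearson(accessibility, expression)
```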

      Additional Points:

      (1) Representative images for cell-identity markers for only D65 are shown, and not D0, D19, and D35 though it is stated in the text that this was performed. At a minimum, these representative images should be shown for all lines. 

      As suggested, we have now added images for cell identity markers of all iPSC lines in the revised version (Revised Supplementary Figure 1C).

      (2) What QC was performed on iPSC lines, i.e. karyotype/CNV analysis and confirmation of genotypes?

      All iPSC lines used in this study have been fully characterized according to standard, state-of-the-art procedures: expression of pluripotency and stemness genes has been shown by immunostaining, flow cytometry and scorecard analysis; genome integrity has been assessed by G-banding karyotyping; differentiation capacity was characterized using an embryoid body assay in combination with scorecard analysis; and genotypes were verified by Sanger sequencing. Please see the following publications for full datasets: Schuster et al., Neurobiol Dis 2019; Schuster et al., Stem Cell Res 2019; Sobol et al., Stem Cells and Development 2015. In our lab, the integrity of iPSC lines is routinely verified using flow cytometry (expression of TRA-1-60 and SSEA4), immunostaining (expression of NANOG, SOX2 and OCT4), Sanger sequencing (targeting variants in the SCN1A gene), cell morphology analysis, and mycoplasma testing by MycoAlert® (Lonza).

      (3) Were all experiments performed on a single differentiation? Or multiples? Were the differentiations performed with the same type? If not, was batch considered in the analysis? 

      Thank you for raising this question. The Materials and Methods text has been modified as follows to better describe the differentiation and sampling procedure:

      “GABAergic interneuron differentiation from iPSCs was performed as previously described (reference 13). The protocol utilizes DUAL SMAD inhibition to induce neurogenesis towards neural stem cells for 10 days, followed by patterning with high levels of sonic hedgehog for nine days towards cortically fated neuronal progenitor cells (NPC) and subsequent maturation for 46 days, i.e. a total of 65 days (Figure 1A). Neuronal cells at day 65 and onwards are healthy and viable as judged by morphological assessment by light microscopy. Differentiation was performed at least 3 times per cell line.  

      Cell cultures were sampled at days 0 (D0), D19, D35 and D65, respectively, by harvesting cells with TrypLE and centrifugation (300 x g, 3 min). Harvested cells were counted and assessed for viability using trypan blue staining and an automated EVE cell counter (Nano Entek). Samples with a viability of >90% were chosen for ATAC-Seq library preparation (see below).”

      I also assume that technical replicates were merged, and then all three biological replicates were kept for each analysis and outliers were not removed, e.g. Control_D19_8F seems like an example of an outlier. 

      This is a valuable point. We agree that there is variability across the three healthy donors and the three patients, respectively, but the quality of the ATAC-Seq data is good according to multiple QC assessments (Revised Supplementary Figure 2B-D). The color code in Supplementary Figure 1C may be misleading, as the Pearson correlation of all samples was displayed. Overall, the correlations among replicates across all ATAC-Seq samples are over 0.8. At the same time, we observed that samples at d0 cluster together, whereas samples at later time points do not. We interpret this as reflecting cell-line-specific plasticity of chromatin dynamics during differentiation. This observation agrees with our PCA results (Revised Supplementary Figure 2F).

      (4) In Figure 1C, it is intriguing that the ATACseq signal gets stronger in imN. One might expect it to be strongest in the iPSCs which are undifferentiated and have the highest levels of open chromatin. Is this a function of sequencing depth, or are all the Y-axes normalized across all time points? 

      This is another valuable point. Figure 1C presents the average chromatin openness of cluster-specific regions, not the chromatin openness of the entire genome, which is why the chromatin openness at D35 is higher than at other time points. The genome-wide chromatin openness is presented in revised Supplementary Figure 2D, and we have now updated the figure legend to avoid any potential misunderstanding.

      The sequencing depth was in a similar range for all samples. To give readers a complete picture, we also present the sequencing depth for each sample (Revised Supplementary Figure 2A and Revised Supplementary Figure 4A). The Y-axes of the genome browser tracks were normalized, and we have added the normalized values to the figures.

      (5) In Figure 1F, are these all enriched terms, or were they prioritized somehow? 

      Yes, the enriched terms were prioritized based on biological relevance, and we have now clarified this in the updated figure legend. In addition, all enriched terms are now included in revised Supplementary Table 2 and Supplementary Table 4.

      (6) In Figure 1G (also the same plots in Fig 2/3), are all these images normalized i.e. there is no scale bar for each track, and do they represent and aggregate BAM/bigwig?

      Yes, the genome browser tracks were normalized and we have now revised the figures by adding scale bars.

      It would be good to show in supplement the variability across cell lines/diffs - particularly given the variability in the heatmap/PCA - and demonstrate the rigor/reproducibility of these results. This comment applies to all these plots across the 3 figures, particularly as in some instances the samples appear to cluster by individual first and then time point (Sup Fig 3B). 

      Thanks. We have now revised the figure with plots showing individual samples. 

      How confident are the authors that these effects are driven by genotype and not a single cell line? In the Fig 3D representation of NANOG, it is very difficult to see any difference between patient and control. 

      In Figure 3D, we showed common chromatin dynamics in the control and patient groups. To avoid any misunderstanding, we have now updated our legend in the revised manuscript. 

      (7) For the changes in occupancy annotation (UTR/exon/intron etc), are these differences still significant after correcting for variability from cell line to cell line at each time point? I.e. rather than average across all three samples, what is the range?

      Revised accordingly.

      (8) The VPA timepoint is not well-justified. Given that VPA would be administered in patients with fully mature inhibitory neurons, it is difficult to determine the biological relevance. I appreciate that this is a limitation of the model, but this should at least be addressed in the manuscript. 

      We agree that our model system of GABAergic interneuron development has limitations and that the cells may not fully recapitulate development and physiology in vivo. Obvious factors to consider in our system are the directed protocol used to enrich for GABAergic interneurons and the differentiation timeline, which is restricted to 65 days. This is now discussed (lines 499-506).

      Recommendations for the authors:

      (1) The term 'mutation' has been replaced with the term 'pathogenic variant' or 'likely pathogenic variant' depending on the context, please see PMID: 25741868

      Thank you for pointing this out. We have replaced all instances of “mutation” with “pathogenic variant” throughout the manuscript.

      (2) It is unclear what the nomenclature for sample labelling is in Supplementary Figure 1, e.g. 7C, 8F, 1B.  

      We apologize for this confusion. These are cell line names. We labeled all data and images according to cell line name, i.e., control lines: Ctl1B, Ctl7C and Ctl8F; patient lines: DD1C, DD4A and DD5A. To avoid any potential confusion, we have added a note in the revised legend of Supplementary Figure 1B.

      (3) Can the authors confirm that the Deseq2 FDR values are Benjamini-Hochberg procedure corrected per default settings? If so, this should ideally be added to methods or legend for clarity 

      Yes, the DESeq2 FDR values were computed with default settings, i.e., with Benjamini-Hochberg correction; this is now stated in the Methods section of the revised manuscript.
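      For reference, DESeq2's `results()` function uses `pAdjustMethod = "BH"` by default. It also applies independent filtering by default, which can set some adjusted p-values to NA, so the plain-Python sketch below reproduces only the Benjamini-Hochberg step and not DESeq2's exact output. The function name is hypothetical, and this is not DESeq2 code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control).

    Returns adjusted values in the same order as the input.
    """
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # ≈ [0.02, 0.04, 0.04, 0.02]
```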

      (4) While it makes sense that the authors present the data in the order of Figure 1, and Figure 2, this actually makes it quite difficult to compare the two datasets, especially for the functional enrichment in the "F" figures. It may be helpful to consider re-organizing the figure order. For instance, for the long-term potentiation signal in the DS-iPSCs, what does this mean in terms of biological relevance? Or maybe Figure 2 needs to be supplementary given that Figure 3 is a more direct comparison.  

      Thank you for the suggestions. We attempted to reorganize during our revision. We still believe it is easier for the audience to grasp the main message if we organize it according to our current workflow—first presenting an individual differential landscape for controls and patients, and then comparing the common and unique aspects among them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, entitled "Merging Multi-OMICs with Proteome Integral Solubility Alteration Unveils Antibiotic Mode of Action", Dr. Maity and colleagues aim to elucidate the mechanisms of action of antibiotics through combined approaches of omics and the PISA tool to discover new targets of five drugs developed against Helicobacter pylori.

      Strengths:

      Using transcriptomics, proteomic analysis, protein stability (PISA), and integrative analysis, Dr. Maity and colleagues have identified pathways targeted by five compounds initially discovered as inhibitors against H. pylori flavodoxin. This study underscores the necessity of a global approach to comprehensively understanding the mechanisms of drug action. The experiments conducted in this paper are well-designed and the obtained results support the authors' conclusions.

      Weaknesses:

      This manuscript describes several interesting findings. A few points listed below require further clarification:

      (1) Compound IVk exhibits markedly different behavior compared to the other compounds. The authors are encouraged to discuss these findings in the context of existing literature or chemical principles.

      This is a good point. We have added the following paragraph (Page No-13).

      “In several of our studies, compound IVk, which has a higher MIC, exhibits markedly different behavior. This difference in behavior may stem from different sources, including intracellular availability, inactivation inside the cell, or loss of target specificity. Multiple studies have previously demonstrated that there is only a 30% chance for a structurally similar compound to have similar biological activity32.”

      (2) The incubation time for treating H. pylori with the drugs was set at 4 hours for transcriptomic and proteomic analyses, compared to 20 min for PISA analysis. The authors need to explain the reason for these differences in treatment duration.

      This is now explained on Pages 17 and 19, where the following paragraphs have been included:

      “The incubation time for transcriptomics and proteomics assays was determined based on the Time-Kill Curves assay (Fig. 6(A)). The 4-hour time point shows a significant amount of cell death compared to the control population.”

      “The target deconvolution method aims to evaluate the initial interaction with intracellular proteins. We selected a 20-minute time point based on intracellular ROS generation (not shown). It is a well-reported phenomenon that bactericidal drugs induce early production of ROS.”

      (3) The PISA method facilitates the identification of proteins stabilized by drug treatment. DnaJ and Trigger factor (Tig), well-known molecular chaperones, prevent protein aggregation under stress. Their enrichment in the soluble fraction is expected and does not necessarily indicate direct stabilization by the drugs. The possibility that their stabilization results from binding to other proteins destabilized by the drugs should be considered. To prevent any misunderstanding, the authors should clarify that their methodology does not solely identify direct targets. Instead, the combination of their findings sheds light on various pathways affected by the treatment.

      This is also a very valuable observation. We now state this clearly in new paragraphs on Pages 8 and 13:

      “Another target shared among several compounds is the chaperone protein trigger factor (Tig), which plays a crucial role in facilitating proper protein folding and is indispensable for the survival of bacterial cells. The solubility of this protein has been altered by all the compounds except IVk (Fig. 2(I-J)) in a concentration-dependent manner (Fig. S4(B, D, and E)). The possibility of Tig interacting with other proteins destabilized by the drug, along with the influence of the heat gradient during the PISA assay, may introduce potential noise in the data. Further investigation is required to confirm the interaction of the drug with Tig.”

      “The module “black” associated with this compound contains Tig, which is involved in facilitating proper protein folding, as a target, and it down-regulates multiple proteins associated closely with S12 ribosomal protein of the 30S subunit (Fig. S9(D)) indicating its involvement in stabilization of ribosomal protein.”

      (4) At the end of the manuscript, the authors conclude that four compounds "strongly interact with CagA". However, detailed molecule/protein interaction studies are necessary to definitively support this claim. The authors should exercise caution in their statement. As the authors mentioned, additional research (not mandated in the scope of this current paper) is necessary to determine the drug's binding affinity to the proposed targets.

      We have modified the sentence (page 15) to read:

      “This study identifies four out of our five compounds that induce significant change in the solubility of CagA, the major virulence factor of H. pylori.”

      (5) The authors should clarify how the PISA-Express approach differs from standard PISA. A detailed explanation of the differences between the two methods in the main text is important.

      This is already explained on page 5 (no changes have been made).

      Reviewer #2 (Public Review):

      Summary:

      This work has an important and ambitious goal: understanding the effects of drugs, in this case antimicrobial molecules, from a holistic perspective. This means that the effect of drugs on a group of genes and whole metabolic pathways is unveiled, rather than their immediate effect on a single protein target. To achieve this goal the authors successfully implement the PISA-Express method (Protein Integral Solubility Alteration), using combined transcriptomics, proteomics, and drug-induced changes in protein stability to retrieve a large number of genes and proteins affected by the compounds used. The compounds used in the study (compounds IVa, IVb, IVj, and IVk) were all derived from the precursor compound IV; they are effective against Helicobacter pylori, and their mode of action on clusters of genes and proteins has been compared to that of the known anti-H. pylori drug metronidazole (MNZ). From this comparison, and as confirmed by the diversity of responses induced by these very similar compounds, it can be seen that the approach used is reliable and very informative. Notably, although all compound IV derivatives were designed to target H. pylori flavodoxin (Fld), only one showed a statistically significant shift in Fld solubility (compound IVj, Fig. S11). For most other compounds, the involvement of other possible targets affecting diverse metabolic pathways was observed instead, notably a series of genes with other important functions: CagA (virulence factor), FtsY/FtsA (cell division), AtpD (ATP-synthase complex), the essential GTPase ObgE, Tig (protein export), as well as other proteins involved in ribosomal synthesis, chemotaxis/motility and DNA replication/repair. Finally, for all tested molecules, in vivo functional data have been collected that parallel the omics predictions, corroborating them and showing that compound IV derivatives differently affect cellular generation of reactive oxygen species (ROS), oxygen consumption rates (OCR), DNA damage, and ATP synthesis.

      Strengths:

      The approach used is very powerful in retrieving the effects of chemically active molecules (in this case antimicrobial ones) on whole cells, revealing protein and gene networks that are involved in cell sensitivity to the studied molecules. The choice of these compounds against H. pylori is ideal, showcasing how different the real biological response is compared to the hypothetical one. In fact, although all molecules were retrieved based on their activity on Fld, the authors unambiguously show that large, unexpected gene clusters may be, and in fact are, affected by these compounds, each in a different manner.

      Impact:

      The present work is the first report relying on PISA-Express performed on living bacterial cells. Because of its findings, this work will certainly have a high impact on the way we design research to develop effective drugs, allowing us to understand the fine effects of a drug on gene clusters, drive molecule design towards specific metabolic pathways, and eventually better plan the combination of multiple active molecules for drug formulation. Beyond this, we expect this article to impact other related and unrelated fields of research as well. The same holistic approaches might also provide deep, and sometimes unexpected, insight into the cellular targets involved in drug side effects, drug resistance, toxicity, and cellular adaptation, in fields beyond medicine, such as cellular biology and environmental studies on pollutants.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please address these few concerns:

      -  It is unclear from the introduction and discussion whether conventional transcriptomic and proteomic analyses have previously been conducted on the compounds examined in this study. If only targeted studies have been performed please clarify this further.

      To make this clearer, we have added the following paragraph on page 5:

      “Our investigation into understanding the mode of action of nitro-benzoxadiazole compounds commenced with a comparison of the conventional transcriptional and translational changes induced by these compounds, the vehicle control (DMSO), and the commercially used drug MNZ. RNA sequencing (RNA-seq) and expressional proteomics were employed to identify transcriptional and translational changes, respectively.”

      -  The decision to monitor the oxygen consumption rate (OCR) is based on the hypothesis that the drugs would impact flavodoxin function. Could the authors cite specific studies suggesting that a reduction in flavodoxin leads to a measurable decrease in OCR?

      The reviewer is correct that we performed this study based on our hypothesis that a reduction in flavodoxin may lead to decreased OCR. To our knowledge, no previous study has shown this, so we now state clearly (page 14) that it is our hypothesis:

      “On the other hand, given that these drugs indicated involvement of multiple factors from the electron transport chain, including flavodoxin, and we observed a significant drop in the ATP production rate (Fig. 6(D)) associated with compounds IV and IVj, we investigated changes in the oxygen consumption rate (OCR), as we hypothesize that a reduction in soluble flavodoxin could lead to decreased OCR.”

      -  Increase font size in some figures and supplemental materials for clarity.

      We acknowledge the reviewer's comment and have addressed it to the best possible extent in the figures.

      -  Correct figure references throughout the text (e.g., p. 4, Fig. S1D; p. 6, Fig. S1C).

      We have corrected the figure references.

      -  Check spelling errors, for example, Figure S1B: "library preparation".

      We have revised the figures and corrected spelling errors.

      -  Ensure H. pylori is in italics.

      Done!

      -  Figure S4: Replace (D) by (E).

      Done! Thank you.

      -  Page 7: Check the sentence: "...RpleE, InfC) and F Furthermore, we..." .

      Corrected!  

      “The 20 common essential targets are mostly associated with cell division (for example, FtsZ) and small subunit ribosomal proteins (RpsC, RpsE, RpsL, RplE, InfC). Furthermore, we identified a few unique changes for compound IV (DnaN, involved in DNA tethering and processivity of DNA polymerases, and C694_06445, which could be a functional equivalent of the delta subunit of DNA polymerase III).”

      -  Page 9: Please modify the name of one compound "Compounds IV, IVj (and not IVk) and MnZ downregulate...".

      Both reviewers raised this point, so we revisited the data. As shown in Fig. S8(B), compounds IV, IVk, and MNZ cluster together and downregulate the genes associated with this pathway. We have therefore not changed the text.

      -  Figure S9: please clarify symbols (triangles and others) in the Figure legend.

      Done!

      -  Page 9: Is it the Figure S9B you are referring to? Talking about proteomics?

      Sorry, we have not understood the above comment.

      Reviewer #2 (Recommendations For The Authors):

      All figures are printed one per page. In this format, almost all figures suffer from a severe problem with dimensions. Notably, graph axes and axis values, subtitles, and legends within the figures are too small, although the graphical part is almost always appropriate. Negative example (larger fonts are needed): Figure 1. Positive example (font size OK): Figure 2A or Figure 3 right panels.

      We have carefully revised our figures to address the issues you mentioned, ensuring that all elements are legible when printed one per page. In Fig. 1, we have increased the font sizes of the graph axes, axis values, subtitles, and legends to improve readability. Additionally, we have color-matched the Gene Ontology (GO) terms across panels. In Fig. 2, to enhance clarity, we have resized the figure by removing the top 10 protein list, which is now presented in a separate table. This ensures that the figure's main content remains prominent. These modifications have been made across all figures to maintain consistency and readability.

      For all figures, particularly for non-experts, not only a list of what is found in the picture should be provided, but also a minimal, simplified key of interpretation (of what is to be noticed). Particularly relevant for scatter plots.

      We have modified the legends to provide a simplified key to interpretation for the scatter plots.

      In general for most analyses I see the involvement of FtsA, whereas most discussions concern FtsY and FtsZ. Maybe this point should be clarified. For example: i) FtsZ is quoted in the Second "Results" paragraph (page 6), but we can't find this gene in Figure 2, nor in the corresponding table (Figure 2A); ii) FtsY downregulation is quoted in the Fifth "Results" paragraph (page 9), but we can't find this gene in Figure 5, 9S or 10S.

      We are not entirely sure if we have understood the reviewer's comment correctly, as we did not mention FtsY in our discussion section. In the discussion section, we have focused on the involvement of FtsZ and FtsA with some of our compounds. We decided to discuss them together because FtsZ is the primary component that is recruited to the membrane by the actin-related protein FtsA, while the role of FtsY remains highly debated.

      Figure 1: the same colour should be used for the same GO term in different panels.

      Done!

      Figure 4: please specify (being it essential throughout the whole paper) that the group colouring only refers to Figure 4A, lower bar.

      Done!

      Figures 5, S9, and S10: having the combination of analysed sets (brown / IV, magenta / IVb, etc.) as a panel subtitle is almost a necessity, to avoid constant page turning. I rewrote all of them by hand to be able to follow the story in the main text.

      Done!

      What are the triangles? (this is not written anywhere).

      We have now explained this in the legend of Fig. 5.

      Figures S9 and S10 are too crowded (please refer to Figure 5 for a good format/size).

      For supplementary figures S9 and S10 we prefer to keep the gene names, but in order to make them more legible we have now added subtitles to each panel.

      Second and third "Results" paragraphs: explicitly stating at the beginning that the second focuses only on the top 10 hits (and the third on essential genes) would greatly help non-specialists orient themselves among the different sections.

      On page 7, we have revised the text to indicate that the paragraph is only focused on the top 10 hits. Additionally, we have included a table of top 10 hits for better clarity and accessibility. 

      Page 6: the following sentence should be in the introduction, to stress the novelty of the work: "This is the first time the PISA assay, in the form of PISA-Express, has been successfully performed in living bacterial cells, with protocols adapted and modified from previous PISA studies in mammalian cells".

      Page 2

      We agree this is an important point. However, as we already state it in both the abstract and the PISA section of the Results, we prefer not to repeat it in the Introduction.

      (no changes made)

      I couldn't find any reference to Figure S3 in the text.

      Included! (p. 9)

      "Compounds IV, IVk, and MNZ downregulate the genes associated with this pathway (Fig. 4(B) & S8(B))": it seems to me that it is IVj rather than IVk to downregulate. Please check carefully.

      Both reviewers raised this point, so we revisited the data. As shown in Fig. S8(B), compounds IV, IVk, and MNZ cluster together and downregulate the genes associated with this pathway. We have therefore not changed the text.

      Page 12: of the pre-defined target like flavodoxin => of the pre-defined target flavodoxin.

      Thanks! We have removed “like” from the sentence.

      Metronidazol (=MNZ) only appears on page 13 (MNZ already on page 8).

      Corrected! The correspondence is now first indicated on p. 3.

      Please resolve the ambiguity metronidazol/metronidazole (main text and figures).

      We now consistently use “metronidazole”.

      The Sixth "Results" paragraph (pages 10-11) should be developed a bit more. All Figure 6 results are summarized in 8 lines at the end of the paragraph. This doesn't bring much, particularly to a non-specialist reader. Please, for each panel, clearly explain what is to be noticed and what main conclusion(s) can be extracted.

      We have improved the description of the section. The modified part now reads:

      …This indicates that the nitro-bearing groups have a higher propensity to generate ROS. We have also observed that the genes associated with the generation of ROS are significantly overexpressed for compounds IV, IVb, IVj, and MNZ (Fig. S12(A)). As described above and depicted in Fig. S12(B), multiple DNA damage repair proteins and genes are down-regulated in the presence of compounds IV, IVb, IVj, and MNZ. Additionally, DNA PolA was found to be a major target for compound IVj. Following these results, we investigated compound-induced DNA damage using the APO BrdU TUNEL assay. All the compounds, particularly IV and IVj, caused significant DNA damage (Fig. 6(C)).

      On the other hand, given that these drugs indicated involvement of multiple factors from the electron transport chain, including flavodoxin, and we observed a significant drop in the ATP production rate (Fig. 6(D)) associated with compounds IV and IVj, we investigated changes in the oxygen consumption rate (OCR), as we hypothesize that a reduction in soluble flavodoxin could lead to decreased OCR. Though the signal-to-noise ratio of these data is poor…

      We also added Figure S12 for clarity.

      In the same section I found: "Compound IV and its derivatives cause a marked increase in ROS generation when compared to the control (DMSO)". Does this refer to THIS work or to previous work? (In the latter case, please cite it.)

      These data are from the current study, as shown in Fig. 6(B).

      In the same paragraph, "the signal-to-noise ratio of these data is considerable": does this mean that you have good (high signal-to-noise) data, or that the noise is too high for precise quantification? I understood the latter, but this sentence definitely needs to be rewritten.

      Thank you for pointing out the mistake. Your interpretation is correct. We have corrected the sentence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) The conclusions in the text are very broad and general but often based on a limited number of examples. It would be important that the authors hit the appropriate tone when most of the analysis (in Figure 5) is derived from n=3 events.

      We have tried to strike the correct tone here by modifying the manuscript text. In particular, we have added a pie chart to Figure 4 (Figure 4C), which summarises data from all RBMX targets, not just the original n=3, and shows that most RBMX targets are rescued by RBMXL2.

      (2) The fractions of long/ultra-long exons actually bound by/regulated by RBMX are not clearly stated - which is in contrast to the general statement of the title (implying a global role for RBMX in proper splicing of ultra-long exons).

      (i) We have changed our title (now “An anciently diverged family of RNA binding proteins maintain correct splicing of a class of ultra-long exons through cryptic splice site repression”).

      (ii) We also now state much more clearly the fractions of long/ultra-long exons bound by RBMX, with the following text:

      “…..This led us to test whether RBMX protein is preferentially associated with long exons. For this we plotted the distribution of internal exons bound and regulated by RBMX together with all internal exons expressed from HEK293 mRNA genes (Liu et al., 2017) (Figure 2 – Source Data 1). We found that RBMX controls and binds two different classes of exons: the first are of comparable length to the average HEK293 exon, while the second are extremely long, exceeding 1000 bp in length (Figure 2F). We defined this second class as ‘ultra-long exons’, which represented 18.9% of internal exons regulated by RBMX and 17.6% of those that contained RBMX iCLIP tags. These proportions were significantly enriched compared to the general abundance of internal ultra-long exons expressed in HEK293 cells, which was only 0.4% (Figure 2G)……”

      “…….We next wondered whether ultra-long exons regulated by RBMX (which represented 11.6% of all ultra-long internal exons from genes expressed in HEK293) had any particular feature compared to ultra-long exons that were RBMX-independent……..”

      (3) The authors should state what fraction of ultra-long exons show cryptic splicing in the RBMX siRNA that are corrected by RBMXL2 overexpression (rather than just showing the 3 events). There's some confusion about the global nature of the conclusions relative to the data displayed.

      This is a good point. We have used the RNAseq information as suggested, and included a pie chart (Figure 4C) that includes this information.

      (4) It would be helpful if the authors could identify if there are some motifs more present in ultra-long exons than others.

      Good point, we have included k-mer analysis of the ultra-long exons bound by RBMX, and also more generally ultra-long exons in the human genome, in Figure 2H and 2I. We also add the following text:

      K-mer analyses also showed that while ultra-long exons within mRNAs are enriched in AT-rich sequences compared to shorter exons (Figure 2H), the ultra-long exons that are either regulated or bound by RBMX displayed enrichment of AG-rich sequences (Figure 2I), consistent with our identified RBMX-recognised sequences (Figure 2C).
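      For readers less familiar with k-mer analysis, the sketch below shows one common way such an enrichment can be computed: count all overlapping k-mers in a foreground set of exons (for example, RBMX-bound ultra-long exons) and in a background set (for example, shorter exons), normalise each count to a frequency, and report the log2 ratio. This is an illustrative Python sketch under stated assumptions, not the authors' pipeline: the function names, the choice of k, the pseudocount and the log2-ratio score are our own, and the toy sequences are invented rather than taken from the data.

```python
from collections import Counter
from itertools import product
import math


def kmer_counts(seqs, k):
    """Count overlapping k-mers across sequences, skipping any k-mer with a non-ACGT base."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kmer_enrichment(foreground, background, k=3, pseudocount=1.0):
    """Return {k-mer: log2(frequency in foreground / frequency in background)}.

    Assumed scoring scheme (illustrative only): frequencies are normalised to
    each set's total k-mer count, and a pseudocount is added to every possible
    k-mer so that k-mers absent from one set get a finite score.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    fg_total = sum(fg.values()) + pseudocount * len(all_kmers)
    bg_total = sum(bg.values()) + pseudocount * len(all_kmers)
    return {
        kmer: math.log2(((fg[kmer] + pseudocount) / fg_total)
                        / ((bg[kmer] + pseudocount) / bg_total))
        for kmer in all_kmers
    }


if __name__ == "__main__":
    # Invented toy sequences for illustration only (not data from the study).
    bound_exons = ["AGAGAGGAAGAG", "GGAGAAGAGGA"]        # hypothetical AG-rich foreground
    background_exons = ["ATTATACGTATT", "CGATTTATACAT"]  # hypothetical AT-rich background
    scores = kmer_enrichment(bound_exons, background_exons, k=2)
    for kmer in sorted(scores, key=scores.get, reverse=True)[:3]:
        print(kmer, round(scores[kmer], 2))
```

      With these toy inputs, AG-containing 2-mers score positive and AT-containing ones negative. A real analysis would also need a statistical test per k-mer (for example, Fisher's exact or a permutation test) and some control for sequence length; these are omitted to keep the sketch short.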

      (5) The authors should evaluate whether RBMX-repressed 3' splice sites have similar or lower splice site scores/strengths than natural 3' splice sites.

      We have added splice site score analyses in Figure 1F and Figure 1 Supplement 1B. These show that the cryptic splice sites repressed by RBMX are not significantly different from those that are normally used. We add the following text to accompany these figure panels:

      “Furthermore, analysis of splice site strength revealed that, unlike splice sites activated by RBMX (Figure 1 – Figure supplement 1B), alternative splice sites repressed by RBMX have comparable strength to more commonly used splice sites (Figure 1F). This means that RBMX operates as a splicing repressor in human somatic cells to prevent use of ‘decoy’ splice sites that could disrupt normal patterns of gene expression.”

      (6) The section "RBMX protein-RNA interactions may insulate important splicing signals from the spliceosome." is a very preliminary look at possible mechanisms. Can you integrate the RNA-seq and CLIP datasets to generate "splicing maps" that would provide more generalized insights? In fact, where possible, it would be great to integrate the iCLIP data from the same cell types with the KD RNA-seq data to generate RNA splicing maps.

      We have added “RNA map-type” plots to integrate iCLIP data with splicing patterns (Figure 2 Figure supplement 1D and 1E), and made corresponding changes to the text.

      Additional changes

      We also made some extra changes to respond to the further points raised by reviewers.

      (1) We have carried out gene ontology analysis of those genes that contain RBMX-regulated ultra-long exons versus all ultra-long exons (now Figure 3A, and also Figure 3- Figure supplement 1A and 1B).

      (2) We have corrected the cartoon summarising the branch point analysis (now Figure 3 – Figure Supplement 2F).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work revealed an important finding that the blood-brain barrier (BBB) functionality changes with age and is more pronounced in males. The authors applied a non-invasive, contrast-agent-free approach of MRI called diffusion-prepared arterial spin labeling (DP-pCASL) to a large cohort of healthy human volunteers. DP-pCASL works by tracking the movement of magnetically labeled water (spins) in blood as it perfuses brain tissue. It probes the molecular diffusion of water, which is sensitive to microstructural barriers, and characterizes the signal coming from fast-moving spins as blood and slow-moving spins as tissue, using different diffusion gradients (b-values). This differentiation is then used to assess the water exchange rates (kw) across the BBB, which acts as a marker for BBB functionality. The main finding of the authors is that kw decreases with age, and in some brain regions, kw decreases faster in males. The neuroprotective role of the female sex hormone, estrogen, on BBB function is discussed as one of the explanations for this finding, supported by literature. The study also shows that BBB function remains stable until the early 60s and remarkably decreases thereafter.

      Strengths:

      The two main strengths of the study are the MRI method used and the amount of data. The authors employed a contrast-agent-free MRI method called ASL, which offers the opportunity to repeat such experiments multiple times without any health risk - a significant advantage of ASL. Since ASL is an emerging field that requires further exploration and testing, a study evaluating blood-brain barrier functionality is of great importance. The authors utilized a large dataset of healthy humans, where volunteer data from various studies were combined to create a substantial pool. This strategy is effective for statistically evaluating differences in age and gender.

      Weaknesses:

      R1.0: Gender-related differences are only present in some brain regions, not in the whole brain or gray matter - which is usually the assumption unless stated otherwise. From the title, this was not clear. Including simulations could increase readers' understanding related to model fitting and the interdependence of parameters, if present. The discussion follows a clear line of argument supported by literature; however, focusing solely on AQP4 channels and missing a critical consideration of other known/proven changes in transport mechanisms through the BBB and their effects substantially weakens the discussion. 

      Thanks for your insightful feedback and suggestions. We have made the following changes to the manuscript:

      (1) The title has been modified to highlight the sex differences in specific brain regions: “Age-Related Decline in Blood-Brain Barrier Function is More Pronounced in Males than Females in Parietal and Temporal Regions.”

      (2) To study the potential impact of the prolonged ATT seen in males on estimated kw, we simulated the kw distribution for females by adjusting ATT by +60 ms to match males' ATT. This led to marginally higher kw values (Supplemental Figure S2), suggesting that the kw difference between males and females is not a direct result of prolonged ATT. Additionally, we have added a section titled “Data and Code Availability Statements” in the revised manuscript to indicate that we are willing to share the reconstruction toolbox with interested groups. The toolbox is a standalone MATLAB-based program (no license required) to generate kw, CBF, and ATT maps, which can run on Windows or Mac computers.

      (3) We agree with the reviewer that BBB water exchange can be facilitated by other transport mechanisms, as we mentioned in the introduction: “Water exchange across the BBB occurs at a relatively high level and is mediated by passive diffusion, active co-transport through the endothelial membrane, and facilitated diffusion through the dedicated water channel, aquaporin-4 (AQP4), at the end-feet of astrocytes.” We emphasized our findings related to AQP4 based on the technical properties of DP-pCASL, which is more sensitive to the exchange occurring across astrocyte end-feet. We also acknowledge that different techniques can be helpful to study other components of BBB water exchange, and we have added the following discussion to the updated manuscript: “Mahroo et al., utilized a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in the older participants (≥50 years) compared to the younger group (≤20 years). In animal studies, reduced BBB Tex was also reported in the older mice compared to the younger group using multi-echo ASL and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method. These findings contrast with the results presented in this study, likely due to the different components assessed by different techniques, and increased BBB permeability to water has been suggested to indicate a leakage of tight junctions in aging. In contrast, our recent study utilizing high resolution MCDW-pCASL scans with long averages reveals the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina). The DP module of the DP-pCASL is hypothesized to null the fast-flowing and pseudo-random oriented spins, which may include both vascular flow and less restricted water in paravascular space. The observed lower kw in older participants may be more related to the delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channel with older age. However, these hypotheses require further investigation to understand the exact mechanisms, especially under different physiological states. Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements.”

      Reviewer #1 (Recommendations For The Authors): 

      R1.1 The manuscript is well-organized and presents arguments in a logical order. The visual representation of results in the form of figures is sufficient (see style suggestions below). 

      Thanks for your suggestions on improving the figures, we have updated figures for better visualization (Please see our response to R1.5, R1.6, R1.7 and R1.8).

      R1.2 It would be beneficial if the model/toolbox could be made publicly available so that fellow researchers from the community could apply and test it in their research. 

      We have added a section “Data and code availability statements” to the revised manuscript to indicate that we are willing to share the toolbox with interested groups (L529 in the annotated manuscript). The toolbox is a standalone MATLAB-based program (no license required) to generate kw, CBF and ATT maps, which can run on Windows or Mac computers. Indeed, we have been sharing our reconstruction toolbox with over 50 collaborating sites. The following screenshots show examples of the three steps performed by the toolbox (shared by one collaborator):

      Author response image 1.

      Step 1: Loading raw data and calculating the T1 map

      Author response image 2.

      Step 2: Motion correction and skull stripping

      Author response image 3.

      Step 3: kw, CBF and ATT quantification (nii files will be saved)

      R1.3 Line 46 states that the technique is novel, but it has been introduced and used before (Shao, et al. MRM 2019). It sure is innovative but the term novel is too strong and may confuse the readers that it is something new introduced in this manuscript.

      Thanks for the suggestion, we agree the term ‘novel’ may cause confusion about the technique, we have removed it in the revised manuscript (L48, L50).

      R1.4 Line 395: kw was generated using PLD = 1.8 s with b = 0 and 50 s/mm². Is a single time point enough for estimating kw? It is not clear to me how robust the kw estimation is with only one PLD.

      According to the single-pass approximation (SPA) model (1), kw can be accurately estimated when the PLD is longer than the ATT. We recruited cognitively normal participants in this study and found the longest ATT to be 1526.7±117.4 and 1468.1±166.9 ms in aged (62-92 years) males and females, respectively. A PLD of 1.8 s was chosen to balance the SNR of the data and the accuracy of the model fitting, which should be sufficient for this study. However, for future studies involving diseased populations with prolonged ATT, a longer PLD should be used, or a multi-PLD protocol could be helpful to improve the robustness of quantification accuracy.

      We have added a limitation statement in the revised manuscript (L407): "A single PLD of 1800 ms was used in this study, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7±117.4 and 1468.1±166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as in stroke and cerebrovascular disorders. Additionally, a multi-PLD protocol can also be helpful to improve the robustness of quantification accuracy (2)."

      R1.5 Suggestion: Figure 3A, colormap for kw appears suboptimal. Regional differences are hard to see.

      Thanks for the suggestion, we have updated the range of color scale (from [0, 200], to [70, 160]) to highlight the regional differences in the updated Figure 3:

      We prefer to keep the same blue colormap that we and our collaborators have been using in publications, to maintain consistency. We also acknowledged the limited spatial resolution of the kw maps in the updated manuscript (L412): “To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently developed motion-compensated diffusion-weighted (MCDW)-pCASL can be utilized to improve the spatial resolution in future studies (e.g. 3.5 mm³ isotropic maps in 10 mins) (2).”

      R1.6 Suggestion: use same/similar colormaps for the same parameters (kw, ATT, CBF) to help the reader follow across Figures 3, 4, and 5.

      Thanks for your suggestion; we agree that using the same colormap would make it easier for readers to follow. However, Figures 4 and 5 were created to show the age- and sex-dependent changes, so we used warm and cold colors to indicate effects of decrease and increase, respectively. We clarified the choice of colormap in the figure captions (L260, L284): “The effects of decrease or increase were represented by warm colors (yellow to red) and cold (gray to blue) colors, respectively.”

      R1.7 Suggestion: please be consistent with the ordering of parameters in Figures 3, 4, and 5.

      Thanks for the suggestion, we have updated Figure 3 to consistently show kw, CBF and ATT results in order from left to right:

      R1.8 Suggestion: use the same scaling (e.g.[|1.9|, |11 |] for Fig. 4, [|1.9|, |4|] for Figure 5) to enhance comparability across parameters in the subfigures.

      Thanks for the suggestion, we agree that the same scaling would enhance the comparability across parameters. We have updated the color scales for Figure 5 using maximal |T| = 4:

      However, the maximal |T| varied widely across parameters in Figure 4 (5 for kw, 11 for CBF, and 7 for ATT), and a common color scale might either saturate the regional responses or reduce the visibility of regional differences. Therefore, we prefer to keep the original color scales for Figure 4.

      R1.9 In Figure 5, the interaction of age with sex in kw parameter seems to be more on one side of the brain. What could be the reasons for possible lateralization? 

      We agree with the reviewer that the predominantly one-sided age-by-sex interaction effect is an interesting finding. While we do not yet have a clear explanation, we suspect that it may relate to asymmetric vascular burden in aging. Giannakopoulos et al. reported that vascular scores, indicating higher vascular burden, were significantly higher in the left hemisphere across all Clinical Dementia Rating scores. Moreover, the predominance of Alzheimer’s disease and vascular pathology in the right hemisphere correlated with significantly higher Clinical Dementia Rating scores (3). We have added the following to the updated manuscript to discuss this potential mechanism (L370): “… We also observed an asymmetric effect on the left and right brain hemispheres, which might be associated with asymmetrically developed vascular burden in aging (3).”

      R1.10 A comparison between the present study and DCE MRI as well as other ASL methods evaluating BBB function with age is missing. ASL techniques probing transverse relaxation and DCE MRI have reported increased kw with age in humans as well as in animal models. What could be the reasons? 

      We agree with the reviewer that BBB water exchange measured by other methods should be discussed in more detail, especially with regard to age-related changes. We have added the following discussion to the updated manuscript (L415): “Mahroo et al. used a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in older participants (≥50 years) compared with the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in older mice compared with younger mice using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely because different techniques assess different components of water exchange; increased BBB permeability to water has been suggested to indicate leakage of tight junctions in aging (5, 6). In contrast, our recent study using high-resolution MCDW-pCASL scans with extensive signal averaging revealed the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of DP-pCASL is hypothesized to null fast-flowing and pseudo-randomly oriented spins, which may include both vascular flow and less restricted water in the paravascular space. The lower kw observed in older participants may therefore be more related to delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channels with age. However, these hypotheses require further investigation to establish the exact mechanisms, especially under different physiological states (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13).”

      R1.11 Line 163/164, a rapid decrease of CBF in males in the region of the hippocampus is reported. It would be beneficial to discuss this in discussion further (has this been reported before, possible reasons, etc). 

      Thank you for the suggestion. We agree that the accelerated CBF decline in the male hippocampus is an important finding and have added the following discussion to the revised manuscript (L300): “Furthermore, we found a more pronounced age-related decline in CBF in the hippocampus of males compared with females (Fig. 2, Supplemental Table S2). To the best of our knowledge, this accelerated hippocampal CBF decline in males has not been reported previously. This finding may be linked to the accelerated hippocampal volume loss in males reported in a study of 19,793 generally healthy UK Biobank participants (14). Lower hippocampal perfusion has been associated with poor memory performance (15, 16), suggesting that males might be more vulnerable to potential cognitive decline (17).”

      R1.12 Lines 198-202 describe a simulation done to test the dependence of kw on ATT. This is important and could be explained more in detail. Adding simulation results (numeric or figure) to supplementary materials would increase reproducibility and understanding for others. 

      We apologize for not referencing the simulation results in the main text. We simulated the kw distribution for females after increasing their ATT by 60 ms to match that of males, which yielded marginally higher kw values. These results are shown in Supplemental Figure S2C (yellow).

      We have now referenced the simulation results in the updated manuscript (L206).

      R1.13 No limitations of the presented work are mentioned. A critical perspective would increase the scientific impact on future research decisions and implementation of this method by others. 

      Thank you for the suggestion; we agree that the limitations need to be acknowledged. We have added a limitations paragraph to the revised manuscript (L406): “Limitations of the study and future directions: This study has several limitations. A single PLD of 1800 ms was used, which should be sufficient to allow all the labeled water to reach the tissue (i.e., the longest ATT was 1526.7 ± 117.4 ms and 1468.1 ± 166.9 ms in aged males and females, respectively) (1). However, a longer PLD should be used in participants with longer expected ATT, such as patients with stroke or cerebrovascular disorders. A multi-PLD protocol could also improve the robustness and accuracy of quantification (2). To compensate for the half signal loss of the non-CPMG DP module, relatively low spatial resolution and TGV-regularized SPA modeling were employed. Our recently developed motion-compensated diffusion-weighted (MCDW)-pCASL can be used to improve the spatial resolution in future studies (e.g., 3.5 mm³ isotropic maps in 10 min) (2). Mahroo et al. used a multi-echo ASL technique to measure BBB permeability to water and reported shorter intra-voxel transit time and lower BBB exchange time (Tex) in older participants (≥50 years) compared with the younger group (≤20 years) (4). In animal studies, reduced BBB Tex was also reported in older mice compared with younger mice using multi-echo ASL (5) and a multi-flip-angle, multi-echo dynamic contrast-enhanced (MFAME-DCE) MRI method (6). These findings contrast with the results presented in this study, likely because different techniques assess different components of water exchange; increased BBB permeability to water has been suggested to indicate leakage of tight junctions in aging (5, 6).
In contrast, our recent study using high-resolution MCDW-pCASL scans with extensive signal averaging revealed the potential existence of an intermediate stage of water exchange between vascular and tissue compartments (e.g., paravascular space or basal lamina) (2). The DP module of DP-pCASL is hypothesized to null fast-flowing and pseudo-randomly oriented spins, which may include both vascular flow and less restricted water in the paravascular space. The lower kw observed in older participants may therefore be more related to delayed exchange across the astrocyte end-feet into the tissue due to loss of AQP-4 water channels with age. However, these hypotheses require further investigation to establish the exact mechanisms, especially under different physiological states (7, 8). Future studies, particularly with animal models targeting specific BBB components under different physiological or diseased conditions, will be valuable for validating these measurements (9-13). Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed race on perfusion and BBB function requires further investigation in future studies. Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study because they were unavailable, or available only for some cohorts.
We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism by which each of these factors affects BBB function in aging would be valuable.”

      Reviewer #2 (Public Review):

      Summary: 

      This study used a novel diffusion-weighted pseudo-continuous arterial spin labelling (pCASL) technique to simultaneously explore age- and sex-related differences in brain tissue perfusion (i.e., cerebral blood flow (CBF) & arterial transit time (ATT) - a measure of CBF delivery to brain tissue) and blood-brain barrier (BBB) function, measured as the water exchange (kw) across the BBB. While age- and sex-related effects on CBF are well known, this study provides new insights to support the growing evidence of these important factors in cerebrovascular health, particularly in BBB function. Across the brain, the decline in CBF and BBB function (kw) and elevation in ATT were reported in older adults, after the age of 60, and more so in males compared to females. This was also evident in key cognitive regions including the insular, prefrontal, and medial temporal regions, stressing the consideration of age and sex in these brain physiological assessments. 

      Strengths: 

      Simultaneous assessment of CBF with BBB along with transit time and at the voxel-level helped elucidate the brain's vulnerability to age and sex-effects. It is apparent that the investigators carefully designed this study to assess regional associations of age and sex with attention to exploring potential non-linear effects. 

      Weaknesses: 

      R2.0 It appears that no brain region showed concurrent CBF and BBB dysfunction (kw), based on the results reported in the main manuscript and supplemental information. Was an association analysis between CBF and kw performed? There is a potential effect of the level of formal education on CBF (PMID: 12633147; 15534055), which could have been considered and accounted for as well, especially for a cohort with stated diversity (age, race, sex). 

      Thank you for your positive feedback and comments on the potential associations between BBB kw and other physiological parameters (e.g., CBF) and socioeconomic factors (e.g., education). We have made the following changes to the updated manuscript:

      (1) We conducted additional linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining). The results are summarized in Supplemental Table S6. We found that BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus, and medial temporal lobe in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be influenced by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.

      (2) One limitation of this study is the lack of information on participants’ geographical, cultural, and physical characteristics and on socioeconomic factors. While we included race as a covariate to account for potential variations observed in previous research, race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes. We have acknowledged this limitation by adding the following discussion to the updated manuscript: “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research. However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health. For example, education has been shown to be highly relevant to regional CBF changes in AD. Additionally, the potential influence of ancestry and mixed race on perfusion and BBB function requires further investigation in future studies.”

      Reviewer #2 (Recommendations For The Authors): 

      General comments: 

      I commend the authors on a very well-written and laid-out study. General remarks have been provided in the short assessment and public review sections. 

      We thank the reviewer for the insightful suggestions and overall positive feedback. We have substantially revised and improved our manuscript; point-by-point responses are provided in the following sections and in the annotated manuscript.

      Specific comments: 

      Results: 

      R2.1 Line 127: "since race may influence the changes in perfusion and kw with aging, it was included as a covariate". It is not clear how race - a simplistic term for ethnicity or to be more specific ancestry has been shown to influence changes in perfusion? Is it known for a fact that for example, older Black people have lower/higher CBF or kw compared to Asians or Asians to Caucasian Americans? Can this be extrapolated to Japanese Brazilians having different patterns of regional CBF to Caucasian or Black Brazilians or similar patterns of CBF to Japanese people in Japan since they share similar race? Do Dutch people in the Netherlands share CBF characteristics to their descendants in the US or in South Africa? Would the geographical, cultural, and other physical characteristics of one's ethnicity or lineage impact CBF? Race is often used as a poor substitute for the complex interactions of physical, socioeconomic, and geopolitical factors that produce disparities that may have measurable biological effects including CBF. But it is not clear why being one race vs the other will impact CBF, without carefully parcelling out the many factors beyond biology, if any. Is any of the participants in the study mixed race? How about recently settled individuals who may identify for example as Black but have spent all their life up to adult years outside of the US and marked here in the study as simply African American? Not that I am saying this is the case. However this simplification may require more careful analysis. 

      In our study, no participant identified as mixed race, and unfortunately we do not have additional information about participants’ specific ancestry or their geographical, cultural, and other physical characteristics. We acknowledge that race is an imprecise proxy for the complex interplay of genetic, environmental, socioeconomic, and cultural factors that influence physiological outcomes, including perfusion and BBB function. The use of race as a covariate in our study is intended to account for potential variations observed in previous research, rather than to imply a direct causal relationship.

      Research has shown differences in blood flow among racial groups (18, 19). However, these differences are not solely attributable to race, and they are also shaped by environmental exposures, lifestyle factors, healthcare access, and other social determinants of health (20). We have added the following discussion in the updated manuscript (L436): “Including race as a covariate in our study aims to account for potential variations in brain perfusion observed in previous research (18, 19). However, it is important to recognize that these differences may not be solely attributable to race. They can be influenced by a complex interplay of factors such as education, environmental exposures, lifestyle, healthcare access, and other social determinants of health (20). For example, education has been shown to be highly relevant to regional CBF changes in AD (21, 22). Additionally, the potential influence of ancestry and mixed-race on perfusion and BBB function requires further investigation in future studies.”

      R2.2 Figure 3: Could the standard deviation of the reported values be also stated so the variance can be appreciated? 

      Thank you for the suggestion. We have added the standard deviations of the kw, CBF, and ATT values to the updated Figure 3.

      R2.3 Discussions: Line 280: .."observed distinct trajectory of kw changes with aging as compared with CBF and ATT. I presume this as compared to the earlier statements (line 268) of pervasive increase in ATT and decrease in CBF across the brain. Were there any brain regions that showed increased ATT, decreased CBF and kw as a function of age or even sex?? Was there any association between CBF and kw in any brain regions, across the participants after controlling for sex differences? If there is a suspicion of early BBB dysfunction (line 286) preceding cognitive decline that has been also suspected with CBF, is this concomitant with CBF in most people? This could maybe make CBF an easier and more straightforward biomarker since its effects mirror that of BBB? I suspect it generally does not, even in healthy aging. It would have been great to shed more light on this with your results and in your discussion.

      Thank you for your comments. By “distinct trajectory of kw changes with aging,” we refer to the “turning point” in age at which kw starts declining. BBB kw remained relatively stable until the early 60s and declined thereafter, whereas CBF consistently decreased and ATT consistently increased with age, although their rates of change shifted at 22 years and 36 years, respectively. In the voxel-wise linear regression analysis (Figure 4), age-dependent decreases in CBF and increases in ATT were observed in most of the brain. However, significant age-related decreases in kw were more localized to specific brain regions and were mostly accompanied by simultaneous decreases in CBF and increases in ATT. We highlighted this finding in the updated manuscript (L250): “In the brain regions showing significant age-related kw decreases (Fig. 4A), these decreases are mostly accompanied by CBF decreases (Fig. 4B) and ATT increases (Fig. 4C).”

      Thank you for your suggestion regarding the relationship between kw and CBF. We further conducted linear regressions between regional kw and regional CBF or ATT, with sex as a covariate, separately for participants aged 8-61 years and 62-92 years (the age range in which BBB kw declines). The results are summarized in Supplemental Table S6.

      This new supplemental table shows several interesting results. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, parahippocampal gyrus, and medial temporal lobe in participants younger than 62 years, when kw was relatively consistent across ages. However, no significant correlations were found in any brain region in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional ROIs, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years.

      We have added the following discussion to the updated manuscript (L307): “We observed a distinct trajectory of kw changes with aging compared to CBF and ATT. To study the potential regional associations between kw and CBF and ATT, we conducted linear regressions between regional kw and regional CBF or ATT, incorporating sex as a covariate, for participants aged 8-61 years and 62-92 years (when BBB kw starts declining), respectively. The results are shown in Supplemental Table S6. BBB kw was significantly negatively associated with CBF in the putamen, amygdala, hippocampus, PHG, and MTL in participants aged 8-61 years (when kw was relatively consistent across ages), but no significant correlations were found in any brain regions in the 62-92 years group. In contrast to CBF, kw was significantly negatively associated with ATT in the GM, temporal lobe, and precuneus in participants aged 8-61 years, and these correlations became significant in additional brain regions, including WM, frontal lobe, ACC, caudate, putamen, amygdala, hippocampus, PHG, and MTL in participants aged 62-92 years. These results suggest that BBB function may be affected by different aspects of neurovascular function represented by CBF and ATT at different stages of aging.”
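      For readers who want to reproduce this type of analysis, the regional regression described above can be sketched as an ordinary least-squares model, kw = b0 + b1 × CBF (or ATT) + b2 × sex, fitted separately below and above age 62. This is a minimal illustration, not the authors’ analysis code: the function names (regress_kw_on, by_age_group), the 0/1 sex coding, and the use of one regional mean value per participant are assumptions. It reports the slope b1 and its t statistic; converting t to a p-value would additionally require a t distribution with n - 3 degrees of freedom (e.g., from scipy.stats).

```python
import math
from typing import Dict, List, Sequence, Tuple


def _invert(matrix: List[List[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regress_kw_on(predictor: Sequence[float], kw: Sequence[float],
                  is_male: Sequence[int]) -> Tuple[float, float]:
    """OLS fit of kw = b0 + b1 * predictor + b2 * sex (hypothetical helper).

    predictor is a regional CBF or ATT value per participant.
    Returns (b1, t statistic of b1).
    """
    rows = [[1.0, float(x), float(s)] for x, s in zip(predictor, is_male)]
    n, k = len(rows), 3
    if n <= k:
        raise ValueError("need more participants than model terms")
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, kw)) for i in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, kw)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se_b1 = math.sqrt(sigma2 * inv[1][1])         # standard error of the slope
    t_b1 = beta[1] / se_b1 if se_b1 > 0 else math.copysign(math.inf, beta[1])
    return beta[1], t_b1


def by_age_group(ages: Sequence[float], predictor: Sequence[float],
                 kw: Sequence[float], is_male: Sequence[int],
                 split_age: float = 62) -> Dict[str, Tuple[float, float]]:
    """Fit the model separately for participants below and at/above split_age."""
    out = {}
    for label, keep in (("younger", lambda a: a < split_age),
                        ("older", lambda a: a >= split_age)):
        idx = [i for i, a in enumerate(ages) if keep(a)]
        out[label] = regress_kw_on([predictor[i] for i in idx],
                                   [kw[i] for i in idx],
                                   [is_male[i] for i in idx])
    return out
```

      Solving the 3 × 3 normal equations directly keeps the sketch dependency-free; with many regions or covariates, a dedicated regression package would be the more practical choice.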

      Other notes: 

      R2.4 While reading the results section, two things that jump out at me when I saw the sex differences: 1) hematocrit and 2) menopausal status. I saw in the discussion that these were touched on. I may have missed this in the methods, was hematocrit collected and included in the parameters estimates?? Was the menopausal status including ERT (estrogen replacement therapies) recorded and factored in? If not these could be included as limitations that may confound the results, especially when the age groups were split to include a group comprising or potentially both pre-and post-menopausal females (36-61). 

      Information on hematocrit and menopausal status was not available, so neither was included in the data analysis. We agree that this is a limitation of the current study and have discussed it in the updated manuscript (L442): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”

      R2.5 The general vascular health of the cohort is not well described especially if some of the participants were from sickle cell study. While they are cognitively normal and free from major medical illnesses, or neurological disorders, did the sample also include individuals with considerable vascular risk factors and metabolic syndrome (known to affect CBF), especially in the older cohort?? 

      We agree with the reviewer that vascular health can significantly affect perfusion and BBB function. Because the data presented in this study were collected from multiple cohorts, vascular risk factors were not available for all cohorts and thus were not included as covariates in the data analysis. To account for potential vascular variations across participants, we included CBF and ATT as covariates in our analysis of age-related BBB kw changes. We have added discussion to the updated manuscript (L442, same as our response to the previous comment): “Other factors such as hematocrit (23), menopausal status (24, 25), and vascular risk factors (26) should also be considered. These variables were not included in this study due to data unavailability or limited availability in some cohorts. We attempted to minimize the impact of these factors on our observations by including a relatively large and diverse sample. However, future studies examining the specific mechanism of each of these factors on BBB function in aging would be valuable.”

      References:

      (1) K. S. St Lawrence, D. Owen, D. J. Wang, A two-stage approach for measuring vascular water exchange and arterial transit time by diffusion-weighted perfusion MRI. Magn Reson Med 67, 1275-1284 (2012).

      (2) X. Shao, C. Zhao, Q. Shou, K. S. St Lawrence, D. J. Wang, Quantification of blood–brain barrier water exchange and permeability with multidelay diffusion‐weighted pseudo‐continuous arterial spin labeling. Magnetic Resonance in Medicine (2023).

      (3) P. Giannakopoulos, E. Kövari, F. R. Herrmann, P. R. Hof, C. Bouras, Interhemispheric distribution of Alzheimer disease and vascular pathology in brain aging. Stroke (2009).

      (4) A. Mahroo, S. Konstandin, M. Günther, Blood–Brain Barrier Permeability to Water Measured Using Multiple Echo Time Arterial Spin Labeling MRI in the Aging Human Brain. Journal of Magnetic Resonance Imaging 59, 1269-1282 (2024).

      (5) Y. Ohene et al., Increased blood–brain barrier permeability to water in the aging brain detected using noninvasive multi‐TE ASL MRI. Magnetic resonance in medicine 85, 326-333 (2021).

      (6) B. R. Dickie, H. Boutin, G. J. Parker, L. M. Parkes, Alzheimer's disease pathology is associated with earlier alterations to blood–brain barrier water permeability compared with healthy ageing in TgF344‐AD rats. NMR in Biomedicine 34, e4510 (2021).

      (7) Y. Ying et al., Heterogeneous blood‐brain barrier dysfunction in cerebral small vessel diseases. Alzheimer's & Dementia (2024).

      (8) V. Zachariou et al., Regional differences in the link between water exchange rate across the blood–brain barrier and cognitive performance in normal aging. GeroScience, 1-18 (2023).

      (9) Y. Zhang et al., Increased cerebral vascularization and decreased water exchange across the blood-brain barrier in aquaporin-4 knockout mice. PLoS One 14, e0218415 (2019).

      (10) Y. Ohene et al., Non-invasive MRI of brain clearance pathways using multiple echo time arterial spin labelling: an aquaporin-4 study. NeuroImage 188, 515-523 (2019).

      (11) Y. V. Tiwari, J. Lu, Q. Shen, B. Cerqueira, T. Q. Duong, Magnetic resonance imaging of blood–brain barrier permeability in ischemic stroke using diffusion-weighted arterial spin labeling in rats. Journal of Cerebral Blood Flow & Metabolism 37, 2706-2715 (2017).

      (12) Z. Wei et al., Non-contrast assessment of blood-brain barrier permeability to water in mice: an arterial spin labeling study at cerebral veins. NeuroImage, 119870 (2023).

      (13) Y. Jia et al., Transmembrane water-efflux rate measured by magnetic resonance imaging as a biomarker of the expression of aquaporin-4 in gliomas. Nature Biomedical Engineering 7, 236-252 (2023).

      (14) L. Nobis et al., Hippocampal volume across age: Nomograms derived from over 19,700 people in UK Biobank. NeuroImage: Clinical 23, 101904 (2019).

      (15) S. Rane et al., Inverse correspondence between hippocampal perfusion and verbal memory performance in older adults. Hippocampus 23, 213-220 (2013).

      (16) S. Heo et al., Resting hippocampal blood flow, spatial memory and aging. Brain research 1315, 119-127 (2010).

      (17) O. Gannon, L. Robison, A. Custozzo, K. Zuloaga, Sex differences in risk factors for vascular contributions to cognitive impairment & dementia. Neurochemistry international 127, 38-55 (2019).

      (18) A. E. Leeuwis et al., Cerebral blood flow and cognitive functioning in a community-based, multi-ethnic cohort: the SABRE study. Frontiers in aging neuroscience 10, 279 (2018).

      (19) L. R. Clark et al., Association of cardiovascular and Alzheimer’s disease risk factors with intracranial arterial blood flow in Whites and African Americans. Journal of Alzheimer's Disease 72, 919-929 (2019).

      (20) D. R. Williams, S. A. Mohammed, Discrimination and racial disparities in health: evidence and needed research. Journal of behavioral medicine 32, 20-47 (2009).

      (21) N. Scarmeas et al., Association of life activities with cerebral blood flow in Alzheimer disease: implications for the cognitive reserve hypothesis. Archives of neurology 60, 359-365 (2003).

      (22) N.-T. Chiu, B.-F. Lee, S. Hsiao, M.-C. Pai, Educational level influences regional cerebral blood flow in patients with Alzheimer’s disease. Journal of Nuclear Medicine 45, 1860-1863 (2004).

      (23) R. C. Gur et al., Gender differences in age effect on brain atrophy measured by magnetic resonance imaging. Proceedings of the National Academy of Sciences 88, 2845-2849 (1991).

      (24) M. J. Cipolla, J. A. Godfrey, M. J. Wiegman, The effect of ovariectomy and estrogen on penetrating brain arterioles and blood-brain barrier permeability. Microcirculation 16, 685-693 (2009).

      (25) A. C. Wilson et al., Reproductive hormones regulate the selective permeability of the blood-brain barrier. Biochim Biophys Acta 1782, 401-407 (2008).

      (26) M. S. Stringer et al., Tracer kinetic assessment of blood–brain barrier leakage and blood volume in cerebral small vessel disease: Associations with disease burden and vascular risk factors. NeuroImage: Clinical 32, 102883 (2021).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, by using simulation, in vitro and in vivo electrophysiology, and behavioral tests, Peng et al. nicely showed a new approach for the treatment of neuropathic pain in mice. They found that terahertz (THz) waves increased Kv conductance and decreased the frequency of action potentials in pyramidal neurons in the ACC region. Behaviorally, terahertz (THz) waves alleviated neuropathic pain in the mouse model. Overall, this is an interesting study. The experimental design is clear, the data is presented well, and the paper is well-written. I have a few suggestions.

      (1) The authors provide strong theoretical and experimental evidence for the impact of voltage-gated potassium channels by terahertz wave frequency. However, the modulation of action potential also relies on non-voltage-dependent ion channels. For example, I noticed that the RMP was affected by THz application (Figure 3F) as well. As the RMP is largely regulated by the leak potassium channels (Tandem-pore potassium channels), I would suggest testing whether terahertz wave photons have also any impact on the Kleak channels as well.

      Thank you for your positive comment and valuable suggestion. After recording the leak K+ current with and without HFTS in the SNI model, we observed a notable increase in the leak K+ current with HFTS when the holding potential exceeded -40 mV (see the revised Figs. 2m and 2n). This finding prompted us to examine the shifts in the resting membrane potential (RMP) in more detail. The data and statistical analyses are detailed in Tables S1-3.

      (2) The activation curves of the Kv currents in Figure 2h seem to be not well-fitted. I would suggest testing a higher voltage (>100 mV) to collect more data to achieve a better fitting.

      Thank you for the advice. We repeated the experiment, extending the test voltages of the patched neurons to higher levels (>100 mV), to collect sufficient data for a better fit. The results are shown in the revised Figs. 2g-j. The data reveal a significant increase in K+ conductance in the HFTS group compared with the SNI group. We have incorporated these results into the revised manuscript, replacing the earlier ones.
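      As background to the reviewer’s point, Kv activation curves are commonly obtained by converting currents to chord conductance, G = I / (V - E_K), and fitting a Boltzmann function, G(V) = G_max / (1 + exp((V_1/2 - V) / k)). The sketch below is not the authors’ fitting procedure; it is a minimal, dependency-free illustration in which E_K = -90 mV and the parameter grids are hypothetical choices. It grid-searches V_1/2 and k and solves G_max in closed form by least squares for each grid point; in practice, a nonlinear least-squares routine (e.g., scipy.optimize.curve_fit) would normally be used instead.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def conductance(current_pa: Sequence[float], voltage_mv: Sequence[float],
                e_k_mv: float = -90.0) -> List[float]:
    """Chord conductance G = I / (V - E_K); pA / mV gives nS.

    e_k_mv = -90 mV is an assumed reversal potential, not a value from the study.
    """
    return [i / (v - e_k_mv) for i, v in zip(current_pa, voltage_mv)]


def boltzmann(v: float, g_max: float, v_half: float, slope: float) -> float:
    """Boltzmann activation curve."""
    return g_max / (1.0 + math.exp((v_half - v) / slope))


def fit_boltzmann(voltage_mv: Sequence[float], g_ns: Sequence[float],
                  v_half_grid: Iterable[float],
                  slope_grid: Iterable[float]) -> Tuple[float, float, float]:
    """Grid search over (V_1/2, k); G_max is the least-squares optimum for each pair.

    Returns (g_max, v_half, slope) with the smallest sum of squared errors.
    """
    slopes = list(slope_grid)
    best = None
    for vh in v_half_grid:
        for k in slopes:
            shape = [1.0 / (1.0 + math.exp((vh - v) / k)) for v in voltage_mv]
            g_max = sum(s * g for s, g in zip(shape, g_ns)) / sum(s * s for s in shape)
            sse = sum((g - g_max * s) ** 2 for s, g in zip(shape, g_ns))
            if best is None or sse < best[0]:
                best = (sse, g_max, vh, k)
    return best[1], best[2], best[3]
```

      If the tested voltages stop before the curve plateaus, many combinations of G_max and V_1/2 can fit the data almost equally well. This is one reason why extending the protocol to more depolarized potentials, as done in the revision, can stabilize the fit.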

      (3) In the behavioral tests, the pain threshold increased after THz application and remained elevated throughout the 60-min observation period. I suggest conducting longer tests to determine when the analgesic effect of terahertz waves ends.

      Thank you for your insightful comment. We share your interest in the duration of the HFTS effect. During the revision, we compared the duration of analgesia produced by 10-minute and 15-minute applications of HFTS (revised Fig. 5c). After 15 minutes of HFTS, the PWMT value decreased to a level comparable to that of the SNI group at 160 minutes, whereas after 10 minutes of HFTS, the analgesic effect persisted for 140 minutes. These results suggest that longer HFTS application produces longer-lasting analgesia.

      (4) Regarding in vivo electrophysiological recordings, the post-HFTS recordings were acquired from a time window of up to 20 min. It seems that the HFTS effect lasted for minutes, but this was not tested in vitro where they looked at potassium currents. This long-lasting effect of HFTS is interesting. Can the authors discuss it and its possible mechanisms, or test it in slice electrophysiological experiments?

      Thank you for your comment. The in vivo electrophysiological recordings showed that the effect of HFTS lasted at least 20 minutes, and the behavioral assessments showed an even longer effect. Following your advice, we tested this with slice electrophysiological recordings. After a 15-minute application of HFTS, we measured the K+ current 5 and 20 minutes after incubation. The K+ current was substantially increased and remained elevated for at least 20 minutes (Fig. 2l), confirming the long-lasting influence of HFTS. The relevant data and statistical analysis are documented in Table S1-2.

      (5) How did the authors arrange the fiber for HFTS delivery and the electrode for in vivo multi-channel recordings? Providing a schematic illustration in Figure 4 would be useful.

      Thank you for your comment. To help readers understand the HFTS delivery device used during multi-channel recording, we have included a schematic illustration in Fig. 4a of the revised manuscript. The top portion of Fig. 4a depicts a quantum cascade laser (QCL) with a center frequency of approximately 36 THz, which is connected to the recording electrode via a PIR fiber. The left section illustrates the detailed structure of the recording electrode.

      (6) Some grammatical errors should be corrected.

      Thank you for your thorough review. We have carefully checked the entire text and corrected the grammatical errors we found, so that readers can follow the article more easily.

      Reviewer #2 (Public Review):

      In this manuscript, Peng et al. reported that 36 THz high-frequency terahertz stimulation (HFTS) can suppress the activity of pyramidal neurons by enhancing the conductance of voltage-gated potassium channels. The authors also demonstrated the effectiveness of 36 THz HFTS for treating neuropathic pain.

      Strengths:

      The manuscript is well written and the conclusions are supported by robust results. This study highlighted the potential of using 36 THz HFTS for neuromodulation.

      Weaknesses:

      More characterization of HFTS is needed, so the readers can have a better assessment of the potential usage of HFTS in their own applications.

      Thank you for your suggestion. We have created schematic diagrams illustrating HFTS delivery (Figs. 4a and 5a in the revised manuscript). Fig. 4a presents the setup designed for in vivo multi-channel recording. Fig. 5a shows the setup used in the behavioral tests, in which the recording electrode is replaced by a hollow metal tube; the PIR fiber passes through this tube to target the ACC of the mice.

      (1) It would be very helpful to estimate the volume of tissue that can be influenced by HFTS. It is not clear how 15 mins HFTS was chosen for this functional study. Does a longer time have a stronger effect? A better characterization of the relationship between the stimulus duration of HFTS and its beneficial effects would be very useful.

      Thank you for your feedback. The volume of tissue influenced is directly related to the size of the spot emerging from the fiber outlet. In our experiment, we used a PIR fiber with a 630 nm inner core diameter to propagate high-frequency THz waves. The core has a refractive index of 2.15 and an effective numerical aperture (NA) of 0.35 ± 0.05.
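To make the link between NA and the influenced tissue volume concrete, the sketch below treats the beam leaving the fiber face as a cone: the half-angle in the surrounding medium is θ = arcsin(NA / n_medium), the spot radius at depth z is r(z) = a + z·tan θ (a = core radius), and the illuminated volume down to depth z is a truncated cone. This is a purely geometric estimate under stated assumptions: n_medium and the depth are user-supplied guesses, the core diameter is left as an input, and absorption and scattering are ignored. At ~36 THz (wavelength ≈ 8.3 µm) absorption by water in tissue is expected to limit penetration strongly, so the depth should come from a separate absorption estimate or measurement.

```python
import math


def divergence_half_angle(na, n_medium):
    """Half-angle (radians) of the output cone, from NA = n_medium * sin(theta)."""
    s = na / n_medium
    if not 0.0 <= s < 1.0:
        raise ValueError("NA / n_medium must lie in [0, 1)")
    return math.asin(s)


def spot_radius(core_diameter, na, n_medium, depth):
    """Geometric spot radius at a given depth below the fiber face (same units as inputs)."""
    return core_diameter / 2.0 + depth * math.tan(divergence_half_angle(na, n_medium))


def illuminated_volume(core_diameter, na, n_medium, depth):
    """Volume of the truncated cone from the fiber face down to `depth`.

    Uses V = pi * h / 3 * (r1^2 + r1 * r2 + r2^2); ignores absorption and scattering.
    """
    r1 = core_diameter / 2.0
    r2 = spot_radius(core_diameter, na, n_medium, depth)
    return math.pi * depth / 3.0 * (r1 * r1 + r1 * r2 + r2 * r2)
```

With NA = 0, the cone collapses to a cylinder of the core's cross-section, which is a convenient sanity check.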

      Our decision to apply HFTS for 15 minutes in the behavioral study was primarily based on observations from in vivo multi-channel recordings. Specifically, we noticed a considerable reduction in the average firing rate of PYR cells after 15 minutes of HFTS exposure. To further investigate the correlation between the duration of HFTS stimulation and its effects, we conducted a comparative study using a 10-minute HFTS session. The results, depicted in revised Fig. 5c, reveal that the PWMT value decreased to the level seen in the SNI group after approximately 160 minutes following 15 minutes of HFTS, and after about 140 minutes with 10 minutes of HFTS. This suggests a direct relationship between the length of HFTS application and its beneficial outcomes.

      (2) How long does the behavioral effect last after 15 minutes of HFTS? Figure 5b only presents the behavioral effect for one hour, but pain is still effectively reduced at this time point. The behavioral measurement should continue until pain sensitivity returns to the pre-stimulation level.

      Thank you for your feedback. A similar question was raised by Reviewer #1. As depicted in Fig. 5c, in the SNI model, targeting the ACC with HFTS for 10 or 15 minutes produced an analgesic effect lasting roughly 140 or 160 minutes, respectively. These findings provide insight into the potential clinical applications of HFTS and the duration of relief it may achieve.

      (3) Although the manuscript only tested the ACC, it would also be useful to demonstrate the neuromodulatory effect in other brain regions. Would 36 THz HFTS also robustly modulate activity in other brain regions? Or are different frequencies needed for different brain regions?

      Thank you for your comment. We hypothesize that light waves at a frequency of approximately 36 THz effectively modulate neuronal activity in various brain regions, primarily through their impact on K+ channels. We further speculate that THz waves at other frequencies may influence other channels, such as Na+ and Ca2+ channels, potentially facilitating or inhibiting neuronal activity. We believe this is a fascinating and significant area for future research.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Peng et al. presents intriguing data indicating that high-frequency terahertz stimulation (HFTS) of the anterior cingulate cortex (ACC) can alleviate neuropathic pain behaviors in mice. Specifically, the investigators report that terahertz (THz) frequency stimulation widens the selectivity filter of potassium channels thereby increasing potassium conductance and leading to a reduction in the excitability of cortical neurons. In voltage clamp recordings from layer 5 ACC pyramidal neurons in acute brain slice, Peng et al. show that HFTS enhances K current while showing minimal effects on Na current. Current clamp recording analyses show that the spared nerve injury model of neuropathic pain decreases the current threshold for action potential (AP) generation and increases evoked AP frequency in layer 5 ACC pyramidal neurons, which is consistent with previous studies. Data are presented showing that ex-vivo treatment with HFTS in slice reduces these SNI-induced changes to excitability in layer 5 ACC pyramidal neurons. The authors also confirm that HFTS reduces the excitability of layer 5 ACC pyramidal neurons via in vivo multi-channel recordings from SNI mice. Lastly, the authors show that HFTS is effective at reducing mechanical allodynia in SNI using both the von Frey and Catwalk analyses. Overall, there is considerable enthusiasm for the findings presented in this manuscript given the need for non-pharmacological treatments for pain in the clinical setting.

      Strengths:

      The authors use a multifaceted approach that includes modeling, ex-vivo and in-vivo electrophysiological recordings, and behavioral analyses. Interpretation of the findings is consistent with the data presented. This preclinical work in mice provides new insight into the potential use of directed high-frequency stimulation to the cortex as a primary or adjunctive treatment for chronic pain.

      Weaknesses:

      There are a few concerns noted that if addressed, would significantly increase enthusiasm for the study.

      (1) The left Na current trace for SNI + HFTS in Figure 2B looks to have a significant series resistance error. Time constants (tau) for the rate of activation and inactivation for Na currents would be informative.

      Thank you for your feedback. We have carefully considered your comments and made several adjustments in the revised Figs. 2b-f to improve clarity and accuracy. First, we compared the time constants (tau) between the SNI group and the SNI+HFTS group. These time constants represent the latency of Na+ current activation or inactivation relative to the half-activated/inactivated voltage. Our analysis revealed no statistically significant difference in tau between the two groups for either the activation or the inactivation curve. Second, we have updated the sample traces in Fig. 2b of the revised manuscript; these new traces illustrate that tau does not differ significantly between the SNI and SNI+HFTS groups. We believe these modifications make the data clearer and more accessible to readers.

      (2) It is unclear why an unpaired t-test was performed for paired data in Figure 2. Also, statistical methods and values for non-significant data should be presented.

      Thank you for your comment. We believe you are referring to the results in Fig. 3. We agree that these data should be analyzed with one-way ANOVA, since more than two groups are compared. We have therefore re-analyzed the data in Figs. 3g-k using one-way ANOVA and included detailed statistical methods and P values in the revised manuscript.
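For readers less familiar with the test, the sketch below computes the one-way ANOVA F statistic for independent groups: F = (SS_between / (k - 1)) / (SS_within / (N - k)) for k groups and N observations in total. It is an illustrative implementation, not the analysis code used in the study; the P value comes from the F distribution with (k - 1, N - k) degrees of freedom, which `scipy.stats.f_oneway` computes directly. When the same neurons are measured repeatedly, a repeated-measures design accounts for the pairing that this independent-groups version ignores.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For three groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], SS_between = 54 and SS_within = 6, giving F = 27 with (2, 6) degrees of freedom.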

      (3) It would seem logical to perform HFTS on ACC-Pyr neurons in acute slices from sham mice (i.e. Figure 3 scenario). These experiments would be informative given the data presented in Figure 4.

      Thank you for your valuable advice. During the revision process, we performed HFTS on ACC-PYR neurons in acute slices obtained from sham mice. The findings from this experiment have been integrated into the updated Fig. 3, where the sham group is represented by the green line and histogram (the revised Fig. 3 in the manuscript). It is noteworthy that a significant decrease in spike frequency was observed in the sham mice following HFTS.

      (4) As the data are presented in Figure 4g, it does not seem as if SNI significantly increased the mean firing rate for ACC-Pyr neurons, which is observed in the slice. The data were analyzed using a paired t-test within each group (sham and SNI), but there is no indication that statistical comparisons across groups were performed. If the argument is that HFTS can restore normal activity of ACC-Pyr neurons following SNI, this is a bit concerning if no significant increase in ACC-Pyr activity is observed in in-vivo recordings from SNI mice.

      Thank you for highlighting the inaccuracies in the analysis. After reviewing the data, we re-analyzed them using nonparametric tests. Because the data did not follow a normal distribution, in the revised version we used Wilcoxon matched-pairs signed-rank tests within the sham and SNI groups, and Mann-Whitney tests between the sham and SNI groups.

      Upon comparing the statistical outcomes across groups, we found that the mean firing rate of 130 ACC neurons in SNI mice was significantly higher than that of 108 ACC neurons in sham mice (P = 0.0447, Mann-Whitney test). Notably, the mean firing rate of ACC-PYR neurons showed a more pronounced increase in SNI pre-HFTS versus sham pre-HFTS (P = 0.0274), whereas the mean firing rate of ACC-INT neurons did not change significantly across groups. These findings align with our observations in slices, reinforcing the validity of our results.
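The two nonparametric tests named above work on ranks rather than raw firing rates. The sketch below shows the rank statistics only, as an illustration rather than the authors' software: the Mann-Whitney U for two independent groups (U1 = R1 - n1(n1 + 1)/2, where R1 is the rank sum of the first group) and the Wilcoxon signed-rank sums W+ and W- for paired pre/post measurements. Tied values receive average ranks, and zero differences are dropped (one common convention). P values require the null distributions; `scipy.stats.mannwhitneyu` and `scipy.stats.wilcoxon` provide them.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2) for independent samples x and y; U1 + U2 = len(x) * len(y)."""
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:len(x)])
    u1 = r1 - len(x) * (len(x) + 1) / 2.0
    u2 = len(x) * len(y) - u1
    return u1, u2


def wilcoxon_signed_rank(before, after):
    """Return (W+, W-) for paired samples, dropping zero differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

For example, if every neuron fires less after stimulation, all signed ranks are negative and W+ = 0.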

      (5) The authors indicate that the effects of HFTS are due to changes in Kv1.2. However, they do not directly test this. A blocking peptide or dendrotoxin could be used in voltage clamp recordings to eliminate Kv1.2 current and then test if this eliminates the effects of HFTS. If K current is completely blocked in VC recordings then the authors can claim that currents they are recording are Kv1.1 or 1.2.

      Thank you for your kind suggestion. In our research, we used the Kv1.2 structure as a model to determine the response frequency of terahertz waves. Through both in vitro and in vivo experiments, we demonstrated that a frequency of approximately 36 THz affects Kv channels and the corresponding spike frequency. On analyzing the action potential waveform, we observed a notable change in the resting membrane potential (RMP), which is predominantly controlled by leak potassium channels, specifically tandem-pore potassium channels. As recommended by Reviewer #1, we have examined this aspect (the leak K+ current) in the revised manuscript.

      We agree that blocking peptides or dendrotoxin should be used to eliminate the Kv1.2 current. However, we encountered problems purchasing and obtaining delivery of these drugs. We have therefore added an explanation to the Discussion emphasizing the value of this pharmacological experiment, which we plan to carry out in future work.

      (6) The ACC is implicated in modulating the aversive aspect of pain. It would be interesting to know whether HFTS could induce conditioned place preference in SNI mice via negative reinforcement (i.e. alleviation of spontaneous pain due to the injury). This would strengthen the clinical relevance of using HFTS in treating pain.

      Thank you for this valuable advice. We share your interest in this experiment and fully recognize the importance and potential of exploring this area further. At present, however, limitations of our equipment and platform prevent us from conducting the necessary tests. We remain committed to pursuing this line of research in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      (1) The study suggests that the effects of the tumor model on mouse behavior are largely non-specific to the tumor, as most behaviors are rescued by analgesic treatment. Thus, most of the changes were likely due to site-specific pain rather than a unique signal from the tumor.

      The tumor generates pain at the site where it is implanted, and this pain is likely amplified by the oral activities that tumor-bearing mice must engage in. As there is no pain in the absence of the tumor, the pain is, by definition, caused by the tumor, not by the site. Concerning the relationship between pain and behavior, the behavioral assays undertaken in our study (nesting, cookie test, wheel running) were very limited in scope. Two of these assays (nesting, cookie test) require the use of the oral cavity. Only nesting and wheel running were assessed in the context of treatment for pain. Nesting behavior was completely restored with carprofen and buprenorphine treatment, suggesting that in the absence of pain, mice were able to make perfect nests. Consistent with this, carprofen- and buprenorphine-treated animals also gained weight, indicating that eating (another activity dependent on the oral cavity) was also restored. Wheel running, an activity that does not rely on the oral cavity, was only partially restored with drug treatment. While additional behavioral tests are necessary to confirm this finding, the data suggest that pain-independent information relayed to the brain accounts for this decline in wheel running.

      Reviewer #2:

      (1) The main claim is that tumor-infiltrating nerves underlie cancer-induced behavioral alterations, but the experimental interventions are not specific enough to support this. For example, all TRPV1 neurons, including those innervating the skin and internal organs, are ablated to examine sensory innervation of the tumor. Within the context of cancer, behavioral changes may be due to systemic inflammation, which may alter TRPV1 afferents outside the local proximity of tumor cells. A direct test of the claims of this paper would be to selectively inhibit/ablate nerve fibers innervating the tumor or mouth region.

      We agree with the reviewer that a direct test of the hypothesis would require selectively inhibiting the nerve fibers innervating the tumor and assessing the impact on behavior. Studies using pharmacological interventions to do this are ongoing in the lab, but they are beyond the scope of the current manuscript.

      (2) Behavioral results from TRPV1 neuron ablation studies are in part confounded by differing tumor sizes in ablated versus control mice. Are the differences in behavior potentially explained by the ablated animals having significantly smaller tumors? The differences in tumor sizes are not negligible. One way to examine this possibility might be to correlate behavioral outcomes with tumor size.

      As suggested by the reviewer, we have graphed nesting scores and time-to-interact (cookie test) against tumor volume. In both cases, we fitted the data with simple linear regression and compared the slopes of the lines. For nesting, there was no significant difference between the slopes; this is now included as Supplemental Figure 4A. For the cookie test, there was a significant difference between the slopes; this is now included as Supplemental Figure 4B. Graphing the data in this way allows one to take any given tumor volume and infer the nesting score and time-to-interact for the two groups of mice. The linear regression model fits the time-to-interact data reasonably well; from this graph, we can see that at any given tumor volume, the time to interact with the cookie was generally shorter in TRPV1cre::DTAfl/wt animals than in C57BL/6 mice. Unfortunately, the linear regression does not fit the nesting data very well, which makes it more difficult to relate tumor volume to nesting score.
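One standard way to compare two regression slopes, sketched below, is to fit each group by ordinary least squares, pool the residual variance, and form t = (b1 - b2) / sqrt(s_p² (1/Sxx1 + 1/Sxx2)) with n1 + n2 - 4 degrees of freedom (an ANCOVA-style test). This is an illustrative implementation under that assumption, not necessarily the procedure used by the statistics software in the study; the function names are hypothetical, and the P value would come from the t distribution (for example `scipy.stats.t.sf`).

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, residual sum of squares, Sxx).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points per group")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse, sxx


def compare_slopes(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H0: slope1 == slope2 (pooled variance)."""
    b1, _, sse1, sxx1 = fit_line(x1, y1)
    b2, _, sse2, sxx2 = fit_line(x2, y2)
    df = len(x1) + len(x2) - 4
    pooled_var = (sse1 + sse2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / sxx1 + 1.0 / sxx2))
    return (b1 - b2) / se_diff, df
```

Here, x would be tumor volume and y the behavioral readout (nesting score or time-to-interact) for each genotype.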

      The following text has been added to the results section.

      Given the impact of nociceptor neuron ablation on tumor growth, we wondered whether differences in tumor volume contributed to the behavioral differences we noted. Thus, the behavior data were graphed as a function of tumor volume (Supplemental Fig 4A, B), and a simple linear regression model was used to fit the data. For nesting scores, the linear regression did not fit the data points very well, making it difficult to assess nesting scores at a given tumor volume (Supplemental Fig 4A). However, the linear regression model fit the time-to-interact data better. Here, the graph suggests that tumor volume did not influence behavior, as at any given tumor volume the time to interact with the cookie was generally shorter in TRPV1-Cre::Floxed-DTA animals than in C57BL/6 animals (Supplemental Fig 4B).

      Reviewer #3:

      (1) The authors mention in their Discussion the need for additional experiments. Could they also include, or comment on, the potential impact on the anti-tumor immune system in their model?

      The following text has been added to the discussion:

      Neuro-immune interactions have been studied in a variety of conditions, including but not limited to infection 109, inflammation 110,111, gut homeostasis 112-114, and neurological diseases 115,116. Neuro-immune communication in the context of cancer and behavior has also been studied (e.g., sickness behavior, depression) 117-119; however, these studies did not assess these interactions at the tumor bed. Investigations into neuro-immune interactions within primary malignancies that harbor nerves have shed light on this critical communication. In melanoma, which is innervated by sensory nerves, we found that release of the neuropeptide calcitonin gene-related peptide (CGRP) induces immune suppression. This effect is mediated by CGRP binding to its receptor, RAMP1, which is expressed on CD8+ T cells 49. A study using a different syngeneic model of oral cancer similarly found an immunosuppressive role for CGRP 120-122. These studies demonstrate that neuro-immune interactions occur at the tumor bed. Our current findings, indicating that tumor-infiltrating nerves connect to a circuit that includes regions within the brain, suggest that neuro-immune interactions within the peripheral malignancy may contribute to the behavioral alterations we studied.

      (2) The authors mention the importance of inflammation contributing to pain in cancer but do not clearly highlight how this may play a role in their model. Can this be clarified?

      The following text has been added to the discussion section of the manuscript.

      Moreover, given that carprofen and buprenorphine decrease inflammation 104, their ability to restore normal nesting and cookie test behaviors (which require the use of the oral cavity where the tumor is located) suggests that inflammation at the tumor site contributed to the decline in these behaviors in vehicle-treated animals. Since both drugs were given systemically and each only partially restored wheel running, it suggests that systemic inflammation alone cannot fully account for the decline in wheel running seen in vehicle-treated animals. We posit that the inflammation- and pain-independent component of this behavioral decline is mediated via the transcriptional and functional alterations in the cancer-brain circuit.

      (3) The tumor model apparently requires isoflurane anesthesia prior to tumor growth measurements. This is different from most other transplantable tumor models used in the literature. Was this treatment also given to control (i.e., non-tumor) mice at the same time points? If not, can the authors comment on the impact of isoflurane (if any) in their model?

      Mice in all groups (tumor and non-tumor) were treated with isoflurane. This important detail has been added to the methods section.

      (4) The authors emphasize in several places that this is a male mouse model. They mention this as a limitation in the Discussion. Was there an original reason why they only tested male mice?

      The following text has been added in the discussion section:

      Head and neck cancer is predominantly a cancer of males; it occurs three times more often in males than in females 123, and this disparity is greater in certain parts of the world. While cigarette smoking and alcohol consumption are risk factors for HPV-negative head and neck squamous cell carcinoma, even males who neither smoke nor drink have a higher susceptibility to this cancer than females 124,125. Thus, our studies used only male mice. However, we recognize that females also develop this cancer. In fact, female patients with head and neck cancer, particularly oral cancer, report more pain than their male counterparts 126,127. These findings suggest that differences in tumor innervation exist between males and females.

      Therefore, another project in the lab has been to compare disease characteristics (including innervation and behavior) in male and female mice. The findings from this second study are the topic of a separate manuscript.

      Recommendations For The Authors:

      Reviewing editor:

      (1) Tumors can communicate with the brain via blood-borne agents from the tumor itself or from immune cells activated by the tumor, in addition to neurons that invade the tumor. The xia and malaise that accompany some tumors can be mediated by direct innervation and/or humoral factors, because both can activate the same parabrachial pathway. This paper makes the case for direct innervation being important but ignores the possibility that both are involved. The interesting observation that innervation supports tumor growth (perhaps via substance P) is troublesome, because the slower appearance of behavioral consequences (Figures 4 & 5) could be attributed to the smaller tumor size. A good control for humoral effects would be to implant the tumor cells somewhere in the body where innervation does not occur (if possible) and then examine behavioral outcomes.

      In the course of several projects, we have implanted different tumor cell lines in different locations in mice (oral cavity, hind limb, flank, peritoneal cavity). In each location, tumor innervation occurs. This phenomenon is not limited to mice: in an immunohistological survey of human cancers from different sites, we found that all were innervated (PMID 34944001). These data are consistent with tumor-derived and locally released factors that recruit nerves to the tumor bed (PMID: 30327461)(PMID: 32051587)(PMID: 27989802). Thus, an implantation site that does not result in tumor innervation is currently unknown and likely does not exist.

      (2) The authors should address whether there is an inflammatory component in this tumor model.

      MOC2-7 tumors have been characterized as non-inflamed and poorly immunogenic 129-131.

      This information has been added to the methods section.

      (3) The RTX experiment in Figure 5 would be more compelling if the drug were injected directly into the tumor rather than into the flank, which ablates all TRPV1-expressing neurons, as the genetic approach does.

      While we agree with the reviewer that ablating TRPV1-expressing neurons directly at the tumor site would be ideal, RTX-induced ablation takes approximately one week and is accompanied by substantial inflammation. Therefore, we wait a total of 4 weeks for the inflammation to resolve. By this time, tumors have generally reached sacrifice criteria, so this approach would not allow the question to be answered. Moreover, we are not aware of any studies in which RTX has been injected into the oral cavity or face. While RTX is used clinically to treat pain, it is typically administered intrathecally, epidurally or intraganglionically (PMID: 37894723).

      (4) The authors address affective aspects of pain but do not adequately address the sensory aspects, e.g., sensitivity to touch, heat and/or cold. They attribute the decrease in food disappearance (consumption) and nest building to oral pain, but it could be due to anhedonia and anorexia that can accompany tumor progression.

      Assaying for touch and heat/cold sensitivity in the oral cavity is a critical aspect of studying head and neck cancer that needs to be addressed. However, in rodents these assays are not trivial given that any touch/heat/cold in the area of the tumor (oral cavity) impacts the sensitive whiskers in that region which directly influence these assays. Thus, we have been refining assays (e.g., OPAD, facial von Frey) to address these important questions. The findings from these studies are beyond the scope of this manuscript.

      The reviewer makes a good point about anhedonia and anorexia. The following text has been added to the results section:

      Pain-induced anhedonia is mediated by changes in the reward pathway. Specifically, in the context of pain, dopaminergic neurons in the ventral tegmental area (VTA) become less responsive to pain and release less dopamine. This decrease results in disinhibition of GABA release; the resulting increase in GABA promotes a stronger inhibitory drive, leading to anhedonia 82 and, when extreme, anorexia. Carprofen and buprenorphine treatments completely restored nesting behavior and significantly improved eating. Inflammation 83 and opioids 84 directly influence reward processing, and although our tracing studies did not indicate that the tumor-brain circuit includes the VTA, this brain region may be indirectly affected by tumor-induced pain in the oral cavity. Thus, an alternative interpretation of the data is that the effects of carprofen and buprenorphine treatments on nesting and food consumption may be due to inhibition of anhedonia (and anorexia) rather than, or in addition to, relief of oral pain.

      (5) Comment on why only males were used in this study.

      Please see response to public reviews.

      Reviewer #1:

      (1) Please provide a justification for the use of exclusively male mice and expand in the discussion if there is potential for these findings to be directly applicable to female mice as well.

      Please see response to public reviews.

      The following text has been added to the discussion:

      Head and neck cancer is predominantly a cancer of males; it occurs three times more often in males than in females 123, and this disparity is greater in certain parts of the world. While cigarette smoking and alcohol consumption are risk factors for HPV-negative head and neck squamous cell carcinoma, even males who neither smoke nor drink have a higher susceptibility to this cancer than females 124,125. Thus, our studies used only male mice. However, we recognize that females also develop this cancer. In fact, female patients with head and neck cancer, particularly oral cancer, report more pain than their male counterparts 126,127. These findings suggest that differences in tumor innervation exist between males and females.

      (2) When discussing the results shown in Figure 2, please include some mention of Fus, since it was the most highly expressed transcript.

      The following text has been added to the results section regarding Fus.

      The gene demonstrating the highest increase in expression, Fus, was of particular interest; it increases in expression within DRG neurons following nerve injury and contributes to injury-induced pain 51,52. Of note, we purposefully used whole trigeminal ganglia rather than FACS-sorted tracer-positive dissociated neurons to avoid artificially imposing injury and altering the transcript levels of these cells 53,54. Thus, significantly elevated expression of Fus by ipsilateral TGM neurons from tumor-bearing animals suggests the presence of neuronal injury induced by the malignancy. This is consistent with our previous findings 55 and those of others 56 showing that tumor-infiltrating nerves harbor higher expression of nerve-injury transcripts and neuronal sensitization.

      (3) In line 197 please clarify the mice used. Were all mice tumor-bearing and some had nociceptors ablated, or was there a control (no tumor) group as well?

      Line 197 refers to Figure 4D. In this figure, panels B-D show quantification of cFos and ΔFosB in the spinal nucleus of the TGM (SpVc), the parabrachial nucleus (PBN) and the central nucleus of the amygdala (CeA). These data are from C57BL/6 and TRPV1cre::DTAfl/wt animals, all of which had tumors. Supplementary Figure 3C also shows quantification of cFos and ΔFosB, but these data are from control, non-tumor-bearing animals. The fact that the controls were non-tumor-bearing has been added to the supplemental figure legend, and the text of the results section has been clarified as follows.

      While Fos expression was similar between non-tumor-bearing mice of the two genotypes (Supplemental Fig. 3C-E), the absence of nociceptor neurons in tumor-bearing animals decreases cFos and ΔFosB in the PBN, and ΔFosB in the SpVc (Fig. 4B, C).

      (4) Overall it would improve the readability of the figures if the colors for the IHC channels were on the image itself and not exclusively in the figure legend.

      The colors for all the staining have been added to each panel.

      (5) It is not a problem that complete cartography was not done, but please include a justification for why the brain regions that were focused on were chosen.

      To ensure that our neural tracing technique captured only nerves present within the tumor bed, we restricted the injection of tracer to 2 µl. We demonstrated that this small volume did not leak out of the tumor (Figure 1); thus, any tracer-labeled neurons we identified were deemed to be connected in a circuit to nerves in the tumor bed. While we acknowledge that this deliberate technical approach restricted our ability to label with tracer all neurons in the tumor bed (as well as those they share circuitry with), it ensured that there was no tracer leakage or inadvertent labeling of non-tumoral nerves. In non-tumor animals injected with 10 µl of tracer, labeled regions in the brain included the spinal nucleus of the trigeminal, the parabrachial nucleus, the central amygdala, the facial nucleus and the motor nucleus of the trigeminal. When tracer was injected into the tumor, the tracer-positive regions were limited to the spinal nucleus of the trigeminal, the parabrachial nucleus and the central amygdala. Thus, the brain regions we focused on were those that became tracer-positive following injection of tracer into the tumor.

      (6) Were the cells that were injected cultured in media with 10% fetal calf serum? If so was any inflammatory response seen? If not please state in the methods section the media that cells for injection were cultured in.

      The cells injected into animals were cultured in media containing 10% fetal calf serum. When cells are harvested for tumor injections, they are first washed twice with PBS and then trypsinized to detach them from the plate. Cells are collected, washed again with PBS and resuspended in serum-free DMEM; this suspension is what is injected into animals. We harvest cells in this way to avoid injecting any serum into the mice. This information has been added to the Methods section.

      (7) Would any of the differences in drug treatment (Carprofen vs Buprenorphine) be due to the differing routes of administration and metabolism of the drugs?

      Since carprofen and buprenorphine each resulted in similar behavioral impacts (nesting and wheel running), their different routes of administration seem to play a minor or no role in the behaviors assessed.

      (8) Please include in the methods section the specific approach and software that was used for processing calcium imaging data and calculating a relative change in fluorescence.

      The specific approach used for processing calcium imaging data and calculating relative change in fluorescence as well as the software used are all included in the methods section. Please see below:

      Ca2+ imaging. TGM neurons from non-tumor and tumor-bearing animals (n=4-6 mice/condition) were imaged on the same day. Neurons were incubated with the calcium indicator Fluo-4 AM at 37°C for 20 min. After dye loading, the cells were washed, and Live Cell Imaging Solution (Thermo-Fisher) with 20 mM glucose was added. Calcium imaging was conducted at room temperature. Changes in intracellular Ca2+ were measured using a Nikon scanning confocal microscope with a 10x objective. Fluo-4 AM was excited at 488 nm using an argon laser with intensity attenuated to 1%. Fluorescence images were acquired in confocal frame-scan mode (1024 × 1024 pixels). After 1 min of baseline measurement, capsaicin (300 nM final concentration) was added. Ca2+ images were recorded before, during and after capsaicin application. Image acquisition and analysis were performed using NIS-Elements imaging software. Fluo-4 AM responses were standardized and shown as percent change from the initial frame. Data are presented as the relative change in fluorescence (ΔF/F0), where F0 is the basal fluorescence and ΔF = F − F0, with F being the intensity measured during the experiment. Calcium responses were analyzed only for neurons responding to ionomycin (10 µM, positive control) to ensure neuronal health. Treatment with the cell-permeable Ca2+ chelator BAPTA (200 µM) served as a negative control.
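      The ΔF/F0 normalization and ionomycin screen described above can be illustrated with a short Python sketch. It assumes each neuron's trace is a list of per-frame mean intensities exported from the imaging software and takes F0 as the mean of the first `baseline_frames` frames (with the default of 1, this is the initial frame, matching the percent-change-from-initial-frame description). The function names and the ionomycin cutoff (`threshold=0.2`) are hypothetical; the Methods do not specify a numeric responder criterion, and nothing here corresponds to an NIS-Elements feature.

```python
def delta_f_over_f0(trace, baseline_frames=1):
    """Convert one neuron's raw intensity trace to ΔF/F0.

    F0 is the mean intensity of the first `baseline_frames` frames; with the
    default of 1 it is simply the initial frame.
    """
    if not 1 <= baseline_frames <= len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    if f0 <= 0:
        raise ValueError("basal fluorescence F0 must be positive")
    # ΔF = F - F0 for every frame, then normalize by F0
    return [(f - f0) / f0 for f in trace]


def responds_to_ionomycin(trace, iono_start, baseline_frames=1, threshold=0.2):
    """Hypothetical health filter: True if the peak ΔF/F0 from frame
    `iono_start` onward (after ionomycin is added) exceeds `threshold`.
    The 0.2 cutoff is an assumption, not a value from the Methods."""
    if not 0 <= iono_start < len(trace):
        raise ValueError("iono_start must index a frame in the trace")
    dff = delta_f_over_f0(trace, baseline_frames)
    return max(dff[iono_start:]) > threshold


# Example: a neuron whose intensity doubles after stimulation
trace = [100.0, 100.0, 150.0, 200.0, 120.0]
print(delta_f_over_f0(trace))
print(responds_to_ionomycin(trace, iono_start=3))
```

Multiplying ΔF/F0 by 100 gives the percent-change representation mentioned in the Methods.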

      (9) Suggestions for Figure 1:

      - In Figures 1C, D, E, include labels for the days of tumor harvest.

      - Please make the size of the labels the same for 1K and 1L and align them.

      - Microscopy image in Figure 1L for SpVc looks like it may be at a different magnification.

      - If possible, include (either in the figure or the supplement) IHC images staining for Dcx and tau, which would complement the western blot data.

      The requested changes to the figures have been made. Unfortunately, we do not have Dcx and tau IHC staining of the day 4, 10 and 20 tumors.

      (10) Suggestions for Figure 2:

      - Include directly onto the graph in Figure 2a the legend for tumor-bearing (red) and non-tumor bearing (blue).

      - Keep consistent between Figure 2G and 2H/I if the tumor/nontumor will be labeled as T/N or Tumor/Control.

      The requested changes to the figures have been made.

      (11) Suggestions for Figure 3:

      - An example trace of calcium signal would complement Figure 3G, H well.

      Example tracings of calcium signal are already provided in Supplementary Figure 3A and B.

      Reviewer #2:

      (1) While the use of male mice is acknowledged, there is not a rationale for why female mice were not included in the study.

      Please see the response to Reviewer #1 (first question).

      (2) Criteria for euthanasia should be described in the Methods. This is especially needed for interpreting the survival curve in Figure 4H.

      Criteria for euthanasia in our IACUC approved protocol include:

      - maximum tumor volume of 1000 mm³

      - edema

      - extended period of weight loss progressing to emaciation

      - impaired mobility or lesions interfering with eating, drinking or ambulation

      - rapid weight loss (>20% in 1 week)

      - weight loss at or more than 20% of baseline

      In addition to tumor size and weight loss, we use the body condition score to evaluate the state of animals and to determine euthanasia.  These details have been added to the Methods section.
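      The numeric endpoints in the list above can be expressed as a simple check. This is a hypothetical Python sketch, not part of the IACUC protocol: it assumes body weights in grams and a tumor volume already calculated in mm³, treats the 1000 mm³ limit and the 20% baseline loss as inclusive thresholds and the 1-week loss as strictly greater than 20%, and leaves out the criteria that require clinical judgment (edema, emaciation, impaired mobility, body condition score).

```python
def numeric_endpoints(baseline_g, week_ago_g, current_g, tumor_mm3):
    """Return the numeric euthanasia criteria met by one animal (hypothetical helper).

    Clinical criteria (edema, impaired mobility, body condition score)
    are not represented here and must be assessed by an observer.
    """
    if baseline_g <= 0 or week_ago_g <= 0:
        raise ValueError("reference weights must be positive")
    reasons = []
    # Maximum tumor volume of 1000 mm^3 (treated as inclusive)
    if tumor_mm3 >= 1000:
        reasons.append("tumor volume >= 1000 mm3")
    # Weight loss at or above 20% of baseline
    if (baseline_g - current_g) / baseline_g >= 0.20:
        reasons.append("weight loss >= 20% of baseline")
    # Rapid weight loss: more than 20% within 1 week
    if (week_ago_g - current_g) / week_ago_g > 0.20:
        reasons.append("weight loss > 20% in 1 week")
    return reasons


# Example: large tumor plus substantial weight loss
print(numeric_endpoints(baseline_g=25.0, week_ago_g=24.0, current_g=19.0, tumor_mm3=1200.0))
```

An empty list means none of the numeric criteria is met; it does not by itself mean the animal should remain on study.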

      (3) At what stage in cancer progression were the Fos studies conducted for Figure 4A-D?

      The brains used for Fos staining (Fig 4B-D) were harvested at week 5 post-tumor implantation.

      (4) For Fos counts, what are the bregma coordinates for the sections that were quantified?

      SpVc: -7.56 to -8.24 mm

      PBN: -4.96 to -5.52 mm

      CeA: -0.82 to -1.94 mm

      (5) Statistics are needed for the claim in Lines 171-173.

      The statistical analysis of Fos staining in brains from tumor-bearing and non-tumor-bearing animals is included in Figure 3D-F. The statistical analysis of ex vivo Ca2+ imaging of brains from tumor-bearing and non-tumor-bearing animals is included in Figure 3I and J.

      (6) How long was the baseline period for weight and food intake measurements? How long were the animals single-housed before taking the baseline measurements?  

      The baseline period for weight and food intake measurements was 2 weeks. Animals were singly housed for 2 weeks before baseline measurements began (4 weeks in total).

      Minor:

      (7) The authors might consider rewording the sentence on lines 59-62, given that it is abundantly clear from rodent studies that both the tumor and chemotherapy are associated with adverse behavioral outcomes.

      We have reworded the sentence as follows:  The association of cancer with impaired mental health is directly mediated by the disease, its treatment or both; these findings suggest that the development of a tumor alters brain functions.

      (8) Line 212 needs a space between the two sentences.

      This has been fixed.

      (9) Font size in Figure 2 is not consistent with the other figures.

      This has been fixed.

      (10) "DAPI" is the more conventional than "DaPi".

      This has been fixed.

      Editorial Comments and Suggestions:

      (1) The Abstract would be better if it were more concise, e.g. ~175 words.

      The abstract has been shortened as requested and now reads:

      Cancer patients often experience changes in mental health, prompting an exploration into whether nerves infiltrating tumors contribute to these alterations by impacting brain functions. Using a mouse model for head and neck cancer and neuronal tracing, we show that tumor-infiltrating nerves connect to distinct brain areas. The activation of this neuronal circuitry altered behaviors (decreased nest-building, increased latency to eat a cookie, and reduced wheel running). Tumor-infiltrating nociceptor neurons exhibited heightened calcium activity, and brain regions receiving these neural projections showed elevated cFos and delta FosB as well as increased calcium responses compared to non-tumor-bearing counterparts. The genetic elimination of nociceptor neurons decreased brain Fos expression and mitigated the behavioral alterations induced by the presence of the tumor. While analgesic treatment restored nesting and cookie test behaviors, it did not fully restore voluntary wheel running, indicating that pain is not the exclusive driver of such behavioral shifts. Unraveling the interaction between the tumor, infiltrating nerves, and the brain is pivotal to developing targeted interventions to alleviate the mental health burdens associated with cancer.

      (2) Lines 28, 104, 258, 486, 521, and many other places, "utilized" should be "used" because the former refers to an application for which it is not intended, e.g. a hammer was utilized as a doorstop.

      The requested changes have been made.

      (3) Lines 32 and 73, it is not clear whether the basal activity is heightened or whether excitability is increased. "manifest" might be better than "harbor" on line 73.

      We have changed the wording in the abstract to be clearer. Moreover, our finding that TGM neurons from tumor-bearing animals have increased expression of the σ1 receptor and phosphorylated TRPV1 (Fig 2G-I) indicates that these neurons have increased excitability.

      (4) Line 34 and elsewhere, it would be better to refer to Fos because there is no need to distinguish cellular, cFos, from viral, vFos, in this context.

      The requested changes have been made.

      (5) Line 38, It would be better to refer to what was actually measured rather than "oral movements".

      The requested changes have been made. The sentence now reads: “While analgesic treatment restored nesting and cookie test behaviors, it did not fully restore voluntary wheel running.”

      (6) Line 84, CXCR3-null mouse on a C57BL/6 background.

      The requested change has been made.

      (7) Lines 86,129 wild-type, male mice.

      The requested change has been made.

      (8) Lines 114-115, the brackets are not necessary.

      The requested change has been made.

      (9) Lines 118, 384, 409, 527, 589, 971, 974 always leave a space between numbers and units. Use Greek u for micro.

      The requested change has been made.

      (10) Lines 123-124, it is not clear that there is meaningful labeling within the CeA.

      We have replaced this image with a more representative one of the CeA from a tumor-bearing animal with clear tracer labeling.

      (11) Lines 125, 138, and 246 transcription was not measured, only transcript levels were measured.

      The requested changes have been made.

      (12) Line 133, I think >4 fold is meant.

      Thank you for catching that. I have fixed it to >4 fold.

      (13) Line 165, single-time-point assessment (add hyphens).

      The requested change has been made.

      (14) Line 181 and elsewhere including figure, the superscripts refer to alleles of the genes; hence approved gene names should be used in italics (as in Methods), TRPV1-Cre:: Floxed-DTA (without italics) would be acceptable.

      The requested changes have been made.

      (15) Line 182, nociceptor-neuron-ablated mice (add hyphens).

      The requested changes have been made.

      (16) Line 197, It is not clear that the "speed" of food disappearance was measured or that it is due to oral pain vs loss of appetite.

      The reviewer makes a good point. We have changed the sentence to read:

      To evaluate the effects of this disruption on cancer-induced behavioral changes, we assessed the animals’ general well-being through nesting behavior 32 and anhedonia using the cookie test 76,77, as well as body weight and food disappearance as surrogates for oral pain and/or loss of appetite.

      (17) Line 199, The reduced tumor growth after ablation could account for most of the changes in the other parameters that were measured.

      We have graphed the nesting scores and time-to-interact with the cookie as a function of tumor volume.  These data are now included as Supplemental Figure 4 and suggest that at the same tumor volume, nesting scores and times-to-interact with the cookie are different between the groups.

      (18) Line 204 TPVP1 spelling. Is the TGN smaller after ablation of half of the neurons?

      The requested change has been made.

      (19) Line 235, "now" is not necessary.

      The requested change has been made.

      (20) Lines 238-239 and elsewhere, a few references as to why the TGN-SpVc-PBN-CeA circuit is relevant would be helpful.

      The following references have been added regarding the relevance of this circuit to behavior:

      Molecular Brain 14: 94 (2021) (PMID 34167570)

      Neuropharmacology 198: 108757 (2021) (PMID 34461068)

      Frontiers in Cellular Neuroscience 16: 997360 (2022)  (PMID 36385947)

      Neuropsychopharmacology  49(3): 508-520 (2024) (PMID 37542159)

      (21) Lines 371, 434 and Figures, gm should be g or grams in scientific usage. Include JAX lab stock numbers for these mouse lines.

      The requested changes have been made.

      (22) Line 432, removing food for one hour is not a fast.

      The sentence has been reworded as follows: One hour prior to testing, mouse food is removed and the animals are acclimated to the brightly lit testing room.

      (23) Line 476, 5-µm sections (add hyphen).

      The hyphen has been added.

      (24) Lines 988, and 1023, DAPI are usually shown this way.

      The requested change has been made.

      (25) Figure 1K, add Bregma levels to figures.

      SpVc: -8.12 mm

      PBN: -5.34 mm

      CeA: -1.34 mm

      (26) Figure 3 line 1033, "area under the curve" What curve was examined?

      The curve examined was the change in fluorescence over time. This curve has been added as Supplemental Figure 3C.
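      The area under the curve mentioned above can be computed from a ΔF/F0 time series. The response does not state the integration method; the minimal Python sketch below assumes the trapezoidal rule on a uniformly sampled trace with a frame interval `dt_s` in seconds. The function name and the frame interval are assumptions for illustration, not values taken from the Methods.

```python
def auc_trapezoid(values, dt_s):
    """Area under a uniformly sampled curve using the trapezoidal rule.

    values: ΔF/F0 at each frame (dimensionless)
    dt_s:   assumed time between frames, in seconds
    Returns the area in units of (ΔF/F0 x seconds).
    """
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    if len(values) < 2:
        return 0.0
    # Sum of trapezoids: interior points count fully, endpoints count half
    return dt_s * (sum(values) - 0.5 * (values[0] + values[-1]))


# Example: four frames sampled 2 s apart
print(auc_trapezoid([0.0, 0.5, 1.0, 0.2], dt_s=2.0))
```

Because the frame interval scales the result linearly, AUC values are only comparable between conditions imaged with the same acquisition settings, as was the case here (all conditions imaged on the same day with the same scan mode).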

      (27) Figure 3B, the circled area is the lateral PBN. At first glance, I thought scp was meant as the label for the circled area.

      Scp is noted in the figure legend as a landmark.

    1. Author response:

      Data replicability

      There are no replicates contained in the manuscript. (Reviewer #1)

      We respectfully disagree with this statement. In this manuscript, we included both cell and animal replicates. For cell replicates, we analyzed over 50,000 cells using RNAscope and over 10,000 cells using RNAseq, employing two independent methods on different animals. We believe this extensive analysis is sufficient by any standard. Regarding animal replicates, we generated four different transgenic lines (two knock-in lines and two BAC transgenic lines), which is an uncommon and rigorous effort. We analyzed dozens of animals, consistently observing the expression pattern of Smim32 and its derived transgenes across multiple experiments, including crosses between transgenics and various reporter lines, which is likewise an uncommon and rigorous effort. These experiments were conducted on animals from different litters to ensure robustness. Additionally, our longitudinal study, which includes 13 animals harvested at two-day intervals from E16 to P20, further supports the consistency of our data.

      However, to underscore the consistency of endogenous Smim32 expression, when submitting a revised manuscript, we will present Smim32 expression levels across individuals in single-cell RNA-seq data. Furthermore, we will pool data from different transgenic animals to demonstrate interindividual variability in the claustrum of adult animals. 

      Additional examples of female mice should also be included and separately quantified. (Reviewer #1)

      We initially analyzed both males and females for one line (the Smim32-Cre knock-in line). Since we observed no differences between males and females (which we will note in the revised manuscript), we subsequently limited our analyses to males to minimize the use of animals. 

      Claustrum definition

      Weaknesses lie in poor anatomical definitions of the claustrum (and endopiriform nucleus). (Reviewer #2)

      No other orthogonal approaches were used to define the claustrum, such as retrograde neuroanatomical tracing from cortex. (Reviewer #3)

      We share the reviewers’ opinion that the claustrum (CLA) and endopiriform nucleus (EN) are poorly defined anatomically in rodent brains due to the limited development of white matter tracts. This ambiguity has led to many conflicting descriptions of CLA/EN boundaries in various papers and atlases, including those by Paxinos and the Allen Brain Institute. Notably, the Allen Institute frequently updates the shape and anatomical location of the CLA/EN in their reference atlas, resulting in different websites displaying various versions (as illustrated in rebuttal figure 1 at comparable levels of the anteroposterior brain axis). It remains uncertain which version would most effectively satisfy the entire scientific community, if any. Indeed, after many years of working on these structures and surveying the literature, we regret to note that there is currently no consensus on the anatomical definition of the CLA and EN, even among expert laboratories using tracing or staining methods. At one end of the spectrum, some authors define the CLA as a small nucleus that could be, for example, characterized by the PV-rich plexus. At the other end, other authors consider it part of a larger complex that includes the EN and extends dorsally to the S2 cortex. Additionally, differing definitions of the core and shell regions, as well as the precise anteroposterior extent of the nucleus, further complicate the issue.

      Author response image 1.

      Comparison of CLA and EN shapes in two recent versions of the Allen brain atlas

      Given this lack of consensus, we deliberately opted for a molecular definition of the claustrum and its projection neurons. We used a set of well-documented canonical markers for the claustrum and neighboring neurons to determine the expression pattern of Smim32. The claustrum-specific markers we selected (Nr4a2, Lxn, Gnb4, Car3, etc.) have been extensively studied and allow us to distinguish claustrum projection neurons from neighboring and intermingled populations. Although none of these individual markers are exclusively specific to CLA and EN neurons, the combined expression of these markers provides greater confidence in identifying the different neuronal populations in space.

      Smim32 expression is used to define claustrum anatomical boundaries, rather than first using several structural, molecular, and connectivity lines of evidence to define the claustrum anatomically and then to assess whether Smim32 expression fits within this anatomical definition. (Reviewer #2)

      Contrary to the reviewer's suggestion, we do not define the claustrum based on Smim32 expression. Instead, Figures 1 and 2 demonstrate that Smim32 expression is highly correlated with the expression of known claustrum markers (Nr4a2, Lxn, Gnb4, Car3, etc.), both regionally and at the cellular level. As suggested by Peng et al. (2021, Fig. 4 and Extended Data Fig. 11), this population of cells, which includes the claustrum, a specific subset of cells in cortical layer 6, and the dorsal endopiriform nucleus, forms a discrete group of neurons sharing the same transcriptomic identity. Given what is known about the connectivity of claustrum and endopiriform nucleus projection neurons, this population obviously includes neurons projecting to various areas, likely fulfilling distinct functions. Whether these cells should be subdivided based on projection area, developmental origin, or structural features is beyond the scope of this article.

      Specificity issues

      Cre/Flp expression driven by the Smim32 promoter is present in non-claustrum regions, including the neighboring cortex, striatum, and endopiriform nucleus as well as the more distant thalamic reticular nucleus. (Reviewer #2)

      The Smim32 gene is not specific to the claustrum. (Reviewer #3)

      We do not claim that endogenous Smim32 expression is exclusive to the claustrum or that the knock-in lines, by themselves, are sufficient to isolate claustrum neurons without combined approaches based on the transgenic lines presented here. However, there are significant differences in the expression pattern between endogenous Smim32 and the expression of Cre in the various derived transgenic lines, which might not have been clear in the current manuscript. Notably, there is no expression of Cre in the striatum and the thalamic reticular nucleus, and only sparse expression in the endopiriform nucleus in Tg61(Smim32-cre). Each transgenic line provides different levels of overlap with the endogenous Smim32 expression, with the Tg61(Smim32-cre) line allowing for the most specific genetic access to claustrum neurons. Again, for greater specificity, any of these lines could be used in combined approaches, such as viral targeting (as shown in Figure 6A and B) or using transgenic intersectional (dual recombinase) approaches based on Cre- and Flp-expressing mice with an overlap in the claustrum, leading to circuit-specific and/or claustrum-only labeling.

      This means that our claims are supported by the observed data. However, we acknowledge that we may not have clearly explained the specificity of the random transgenes, which could have led some reviewers to believe that “the data do not support the claims”.

      We will clarify these points in the revised manuscript and include additional examples and quantifications to highlight the differences between endogenous Smim32 expression and Cre expression in the transgenic Tg61(Smim32-cre) line.

      Regarding Cre-expressing cells in the neighboring cortex (layer 6 projection neurons), these cells are genetically distinct from other layer 6 cortical neurons and express the same canonical markers as claustrum projection neurons, likely sharing also the same transcriptomic identity. We will provide a more detailed characterization of these cells in the revised manuscript.

      Since Smim32 driven recombinase (in 61 or 62lrod) is not exclusively expressed in the claustrum, it is not clear how Smim32 is an advantage over possible Nr4a2 or, the more selective, GNB4 Cre driver lines. (Reviewer #2)

      Over the years, we have found a limited number of Cre lines used in the literature for targeting claustrum neurons. These include Gnb4-cre, Slc17a6-cre (also known as Vglut2-cre), Egr2-cre, Tg(Tbx21-cre), Ntng2-cre, Cux2-cre and Esr2-cre lines. We have not found any study describing and/or using an Nr4a2-cre line. Although a Nr4a2-Dre line exists (that we have studied in our laboratories), caution is warranted in its use, as it lacks the complete coding sequence of the Nr4a2 gene.

      One problem with Nr4a2 is its documented expression in the adjacent Layer 6b cortical neurons, which discards it as a suitable candidate to selectively target the claustrum. Furthermore, Nr4a2 is also expressed in a majority of the endopiriform nucleus neurons, whereas endogenous Smim32 is expressed in a smaller proportion of these cells, and is restricted mainly to the dorsal endopiriform nucleus. These reasons led us to select Smim32 over Nr4a2.

      Author response image 2.

      (A) In situ hybridization for various CLA/EN marker genes. (B) Developmental recombination observed outside the CLA/EN in various cre lines (all data from the Allen brain databases)

      What are the advantages of using the different Smim32-cre lines over the existing Cre lines mentioned above?

      Let’s first consider the Gnb4-cre line, which is considered one of the best available. Although the endogenous Gnb4 gene appears to have an expression pattern similar to those of Nr4a2, Slc17a6, and Smim32 in the striato-claustro-insular region of adult mice (Rebuttal Figure 2A), the results observed with the Gnb4-cre line either show otherwise or indicate that the Cre line does not fully recapitulate endogenous Gnb4 expression (Rebuttal Figure 3). Indeed, some neurons in the insular cortex, piriform cortex, and putamen express Cre recombinase (possibly due to low Gnb4 expression not detected in the in situ hybridization data of the Allen Brain Institute, or due to nonspecific transgene expression) and will recombine viral vectors injected in adult mice (Rebuttal Figure 3). Therefore, this Cre expression outside the CLA/EN neurons in the Gnb4-cre line complicates data interpretation, depending on the viral injection coordinates and the quantity of injected vectors.

      Author response image 3.

      Specificity of the Gnb4-Cre line tested with viral transduction in adult mice (all data from the Allen Brain Institute database). The top and middle rows display the same data but with different scaling of the lookup tables to highlight either the patterns of axonal projections (top) or the infected neurons themselves (middle). The bottom row shows a higher magnification of the infection site. Note that individual neurons cannot be resolved in experiment 485903475 due to signal saturation.  

      Cre expression in the CLA appears more specific in the various Smim32-cre transgenic lines than in many of the lines mentioned above. Although we have no doubt that the different existing transgenic lines can target CLA neurons, the selectivity of the targeting (for example, the fraction and types of CLA neurons versus potential non-CLA neurons) remains to be fully described for most of the lines. It is particularly true in the case of Tbx21 and Esr2 (used as drivers for the Tg(Tbx21-cre) and Esr2-cre transgenic lines). Tbx21 is not endogenously expressed in adult CLA neurons (evaluated by in situ and RNAseq data) and Egr2, if expressed in the claustrum, is not restricted to CLA neurons as it is an immediate early gene expressed in recently active neurons (Rebuttal Figure 2A). 

      Cre expression in the EN is observed in all Cre-expressing transgenic lines used to target the claustrum (with the exception of Slc17a6-cre). This can naturally be problematic for some approaches. Luckily, the random integrant Tg61(Smim32-cre) we describe in our manuscript shows a strong expression in the claustrum, and very limited expression outside the CLA (a very weak activity in the EN), representing a novel tool with improved claustrum selectivity. An advantage of the Tg61(Smim32-cre) over the Slc17a6-cre is that more CLA neurons can be targeted with the Tg61(Smim32-cre) line. 

      Another advantage of our four transgenic lines is their versatility; they can be used to recombine reporter lines as well as FRT-floxed and loxP-floxed knockouts in limited neuronal populations. They will be employed in the future for intersectional genetics to exclusively target CLA neurons. Existing transgenic lines cannot offer these possibilities because their marker genes are broadly expressed in the brain during embryogenesis, leading to recombination in a large number of non-CLA/EN neurons. This is evident in the Gnb4-cre and Slc17a6-cre lines crossed with the Ai14 reporter line expressing the fluorescent protein tdTomato (Rebuttal Figure 2B, right panels). Similar observations have been made for the Ntng2-Cre and Cux2-cre lines (see the Allen Brain Institute database for these data). Alternatively, inducible recombinase systems, such as the Gnb4-IRES2-CreERT2-D line, could be used. However, the Gnb4-IRES2-CreERT2-D line requires tamoxifen to induce Cre recombination, which can be problematic depending on the research context, and recombination also occurs in the absence of tamoxifen treatment (see experiments 560948627 and 560948194 in the Allen Brain database).

      It is unclear how Smim32 relates to claustrum in other mammalian species (e.g. primates) (Reviewer #3)

      As mentioned in the last paragraph of the introduction of the initial manuscript, Smim32 is specifically expressed in the claustrum of a primate species, Homo sapiens (reference 37 of the initially submitted manuscript).

      Availability of the transgenic mice

      These mice should be made available to the community through commercial vendors. (Reviewer #1 and #2 in private comments)

      We are pleased to see that two of the three reviewers would like to see these mice available. These mice will not be kept for ourselves, and we will distribute them at some point in time, but this will naturally occur after the publication of the revised manuscript.

      Critical comments on discussion and other topics

      A clear description of the search in the Allen Mouse Brain Atlas is missing. A search for Smim32 in the ISH mouse atlas did not provide any hits and so it would be useful to include in the methods or results section the exact query used for examination of Smim32 expression as well as other genes identified in this process. (Reviewer #2)

      Smim32 has been referred to by different names in various versions of the mouse genome. For the readers not versed in navigating genomes and annotations, before being officially named Smim32, this gene was originally called Gm6753 (as noted in the Allen Brain Institute database, see Rebuttal Figure 2A for an example of their in situ data) and later Gm45623.

      Several sentences highlighting the shortfalls of other approaches are overstated and should be toned down. (Reviewer #1)

      Very concerning is problematic language in the abstract and introduction sections that diminish the impact of several published studies (not cited) that have led to important findings regarding claustrum function. The authors create an argument that all the research performed thus far on the claustrum is unreliable because targeting the structure has been sub-optimal. (Reviewer #2)

      A more balanced discussion of the strengths and weaknesses of these mice should be included. (Reviewer #1)

      We regret if our choice of language inadvertently appeared to undermine the contributions of our colleagues; that was certainly not our intention. The paragraph in question was meant to address certain studies that we believe have led to inconsistent findings and unreliable data due to a lack of rigorous methodology in targeting claustrum projection neurons. To avoid singling out specific works, we chose not to cite them directly. We understand that some colleagues whose research does not fall under the “various cases” mentioned may feel unfairly targeted by this statement. We will revise this section to better clarify our intent and ensure it is respectful of all contributions. We will rephrase passages in the abstract, introduction, and discussion to provide a balanced view of the strengths and weaknesses of these mice.

      Our main goal is to provide tools to specifically target claustrum cells based on their transcriptomic identity, which we believe is the best means to assess the function of any neuronal population. Due to the intermingling of claustrum neurons with neighboring populations, employing stereotaxic injections in the claustrum without genetic segregation will always infect and label physically adjacent cells that do not belong to the claustrum, ontologically and functionally speaking. 

      Similarly, targeting claustrum neurons retrogradely by injecting into claustrum projection sites likely labels neurons from different populations. For instance, as illustrated by Erwin et al. (2021), which Reviewer #1 cites, infecting retrosplenial projections without genetic specificity labels many claustrum Synpr+ neurons (considered the claustrum core), a small proportion of claustrum Nnat+ neurons (considered the claustrum shell by some, and non-claustrum neurons by others), and some neighboring cortical L6b neurons. These three populations have very different transcriptomic identities, connectivity patterns, and likely distinct functions.

      Thus, we believe that genetic specificity provides an important added value for selectively targeting the claustrum or claustro-insular complex.

      A better characterization of all data should be undertaken. (Reviewer #1)

      Having generated hundreds of transgenic lines over the years, we have never analyzed a transgenic line more thoroughly, nor do we recall reading a publication that evaluates the expression pattern of transgenes in mice at such a precise level. We therefore do not see exactly what the reviewer means by this remark. It is possible that, not being native English speakers, we did not grasp some form of humor.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      For the colony analysis, it is unclear from the methods and main text whether the initial individual sorted colonies were split and subjected to different conditions to support the claim of bi-potency. The finding that 40% of colonies displayed tenogenic differentiation may instead suggest heterogeneity of the sorted progenitor population. The methods, as currently described, suggest that two different plates were subjected to different induction conditions. It is therefore difficult to assess the strength of the claim of bi-potency.

      Thanks for your valuable comment, and we apologize for the confusing description of the colony assay. We first obtained CD29+/CD56+ cells by FACS. These freshly isolated cells were then randomly seeded into 96-well plates at a density of 1 cell/well, and the single cell in each well was cultured in growth medium for ten days to form a colony. Myogenic induction was then performed in three 96-well plates and tenogenic induction in another three 96-well plates for subsequent analyses. We therefore agree that the sorted progenitor population could be heterogeneous. Almost all the cells highly expressed the myogenic progenitor genes PAX7/MYOD1/MYF5 (Figure 1g), and over 95% of colonies successfully differentiated into myotubes (Figure 2g). Thus, we believe the obtained CD29+/CD56+ cells were myogenic progenitor cells, while a subgroup of these cells possessed bi-potency.

      This group uses the well-established CD56+/CD29+ sorting strategy to isolate muscle progenitor cells; however, recent work has identified transcriptional heterogeneity within these human satellite cells (i.e., Barruet et al., eLife 2020). Given that they identify a tenocyte population in their human muscle biopsy in Figure 1a, it is critical to understand the heterogeneity contained within the population of human progenitors captured by the authors' FACS strategy and whether tenocytes contained within the muscle biopsy are also CD56+/CD29+.

      Thanks for your constructive suggestion. We will include more samples to perform scRNA-seq and reanalyze the data.

      The bulk RNA sequencing data presented in Figure 3 to contrast the expression of progenitor cells under different differentiation conditions are not sufficiently convincing. In particular, it is unclear whether more than one sample was used for the RNAseq analyses shown in Figure 3. The volcano plots have many genes aligned on distinct curves suggesting that there are few replicates or low expression. There is also a concern that the sorted cells may contain tenocytes as tendon genes SCX, MKX, and THBS4 were among the genes upregulated in the myogenic differentiation conditions (shown in Figure 3b).

      Thanks for your comment. Each group consisted of three samples for the RNA-seq analyses. We apologize for a minor analysis error in Figures 3b and 3c, which will be reanalyzed in the revised version. As for contamination by tenocytes, almost all of the obtained cells highly expressed the myogenic progenitor markers PAX7/MYOD1/MYF5 (Figure 1g-h), and only low expression levels of tendon markers were detected in these cells (Figure 2a-c). Furthermore, although tendon genes were slightly upregulated under myogenic differentiation conditions, these markers were dramatically upregulated under tenogenic differentiation conditions (Figure 2c). Thus, we believe the tenogenic differentiation ability of the sorted cells can be mainly ascribed to CD29+/CD56+ myogenic progenitor cells.

      Reviewer #2 (Public Review):

      The scRNA-seq assay using the total mononuclear cell population did not provide meaningful insight that enriched knowledge of the CD56+/CD29+ cell population. Information on CD56+/CD29+ cells may have been lost because these cells are a minority of the total skeletal muscle mononuclear population, especially given that the total number of cells used for scRNA-seq was very low and that no information was provided on participant number or the number of repeat samples used for this assay. Using these data to claim a stem cell lineage relationship between MuSCs and tenocytes may not be convincing, as the presence of both cell types in the total muscle mononuclear population does not establish a lineage connection between them.

      Thanks for your constructive suggestion. We will include more samples to perform scRNA-seq and reanalyze the data.

      The TGF-b pathway assay uses a small-molecule inhibitor of TGF-b to probe Smad2/3. The conclusion that the Smad2/3 pathway is responsible for tenocyte differentiation may be an overinterpretation without Smad2/3-specific inhibitors being applied in the experiments.

      Thanks for your comment. We agree and will revise this conclusion in the revised version.

      Reviewer #3 (Public Review):

      Comment: This dual differentiation capability was not observed in mouse muscle stem cells.

      Thanks for your comment. We explored the tenogenic differentiation potential of mouse MuSCs both in vivo and in vitro but found only low tenogenic differentiation ability (Figure 4), which might be due to species differences. One possibility is that maintaining the homeostasis of the locomotor system, and whole-organism locomotion, is more demanding in humans, given their much longer lifespan and larger body size. Thus, the current study also suggests that animal studies may not always be clinically relevant when investigating human diseases.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Mehrdad Kashefi et al. investigated whether future reaches can be planned while simultaneously controlling the execution of the current reach. Through a series of experiments employing a novel sequential arm-reaching paradigm they developed, the authors made several findings: 1) participants can plan future reaches in advance, thereby accelerating the execution of the reaching sequence; 2) the planning processes for future movements are not independent of one another, yet they are not planned as a single chunk either; 3) interaction among these planning processes optimizes the current movement for the movement that follows it.

      The question of this paper is very interesting, and the conclusions of this paper are well supported by data. However, certain aspects require further clarification and expansion.

      We thank Reviewer #1 for their evaluation of the work.

      (1) The question of this study is whether future reach plans are available during an ongoing reach. In the abstract, the authors summarized that "participants plan at least two future reaches simultaneously with an ongoing reach and that the planning processes of the two future reaches are not independent of one another" and presented the evidence in the next sentences. However, that evidence concerns the relationship between the ongoing reach and future plans, not between the future plans themselves (Line 52-55), whereas the last sentence (Line 55-58) mentions only interactions between future plans. There are some discrepancies between these sentences. Could you make the abstract clearer by separately mentioning interference 1) between the ongoing movement and future plans and 2) between future plans?

      We thank the reviewer for their comment. We have split the long sentence in the original abstract into two shorter ones. This should clarify that the two pieces of evidence pertain to the interaction of planning processes.

      (2) From the results of the first experiment (Figure 2), I understood that the ongoing reach and future reaches are not independent. In Horizon 1, only the target for the current reach is shown, whereas in Horizon 2, the current target and one future target are shown on the screen. The inter-reach interval was significantly reduced from H1 to H2 (Figure 2). The authors state that "these results suggest that participants can plan two targets (I guess +1 and +2) ahead of the current reach (I guess +0)". However, I think these results suggest that participants can plan one target (+1) ahead of the current reach (+0), because participants could see only the current (+0) and one future target (+1) in H2. Could the authors please clarify this point?

      We thank the reviewer for raising this point. Our conclusion that “participants can plan two targets ahead of the current reach” is supported by the reduction in Inter-Response Interval (IRI) observed when comparing H2 to H3 in the 75 ms Dwell time condition. Specifically, on average, participants were 16 ms faster when they could see two future targets on the screen (H3) than when they could see only one (H2). To clarify this in the paper, we have revised the wording in line 124 to explicitly state that the conclusion pertains to the 75 ms Dwell time condition. Additionally, we emphasize that the strongest evidence for planning two future targets comes from the experiment shown in Figure 3.

      (3) Movement correction for a jump of the +1 target takes longer in H3 than in H2 (Figure 4). Does this perturbation have any effect on the reach to the +2 target? If the +1 jump does not affect the reach to the +2 target, then, combined with the result that a jump of the +2 target did not affect the movement time to the +1 target (Figure 3C), a perturbation (target jump) only affects the movement that is directly perturbed. Is this interpretation correct? If so, do these results argue against the idea that future reaches are planned as a motor chunk? I would like to know the authors' thoughts on this.

      In the experiment presented in Figure 4, once we jumped the +1 target, the reach to that target was altered and participants replanned a corrective movement to the new location of the +1 target. This was usually followed by a longer-than-usual pause at the new location of the +1 target before participants resumed the sequence and finished the trial. Consequently, in these jump trials, it was impossible to compare the +2 reach to no-jump trials, as the normal sequence of movements was disrupted and the reach to the +2 target originated from a different starting location. Nevertheless, we addressed the possibility that the two future reaches were planned as a chunk with the analysis shown in Figure 5: there, we showed that a displacement of the +2 target did not influence the reach to the +1 target, indicating that the movement plans could be updated independently.

      (4) Any discussion about Saccade position (Figure 7)?

      We thank Reviewer #1 for this important comment. We have added the following paragraph on the gaze-position results to the Discussion.

      In our sequence task, participants switched their gaze location only once per reach, suggesting that information about the location of the next target is perceived parafoveally (Figure 7A). This observation aligns with previous studies (Clavagnier et al., 2007; González-Alvarez et al., 2007; Sivak and MacKenzie, 1990) that found participants keep their visual attention on the current sequence item and can perceive the location of spatial targets even when foveal vision is occluded. However, when comparing gaze locations across conditions with Horizon > 1, we observed that participants systematically biased their gaze location based on the sequence context. The gaze position shifted toward the next target, potentially allowing for more accurate location estimation (Figures 7C-D). Notably, changes in gaze location were observed even in Horizon 2, despite no changes in the curvature of hand movements in this horizon (Figure 6B). This suggests that information about the next target may first be available in the circuitry that controls eye movements and only later in the cortical areas that control voluntary upper-limb movements. Further control studies are required to investigate this hypothesis.

      Reviewer #2:

      Summary:

      In this work, Kashefi et al. investigate the planning of sequential reaching movements and how the additional information about future reaches affects planning and execution. This study, carried out with human subjects, extends a body of research in sequential movements to ask important questions: How many future reaches can you plan in advance? And how do those future plans interact with each other?

      The authors designed several experiments to address these questions, finding that information about future targets makes reaches more efficient in both timing and path curvature. Further, with some clever target jump manipulations, the authors show that plans for a distant future reach can influence plans for a near future reach, suggesting that the planning for multiple future reaches is not independent. Lastly, the authors show that information about future targets is acquired parafoveally--that is, subjects tend to fixate mainly on the target they are about to reach to, acquiring future target information by paying attention to targets outside the fixation point.

      The study opens up exciting questions about how this kind of multi-target planning is implemented in the brain. As the authors note in the manuscript, previous work in monkeys showed that preparatory neural activity for a future reaching movement can occur simultaneously with a current reaching movement, but that study was limited to the monkey only knowing about two future targets. It would be quite interesting to see how neural activity partitions preparatory activity for a third future target, given that this study shows that the third target's planning may interact with the second target's planning.

      Strengths:

      A major strength of this study is that the experiments and analyses are designed to answer complementary questions, which together form a relatively complete picture of how subjects act on future target information. This complete description of a complex behavior will be a boon to future work in understanding the neural control of sequential, compound movements.

      We thank the reviewer for their thorough reading of our work.

      Weaknesses:

      I found no real glaring weaknesses with the paper, though I do wish that there had been some more discussion of what happens to planning with longer dwell times in target. In the later parts of the manuscript, the authors mention that the co-articulation result (where reaches are curved to make future target acquisition more efficient) was less evident for longer dwell times, likely because for longer dwell times, the subject needs to fully stop in target before moving to the next one. This result made me wonder if the future plan interaction effect (tested with the target jumps) would have been affected by dwell time. As far as I can tell, the target jump portion only dealt with the shorter dwell times, but if the authors had longer dwell time data for these experiments, I would appreciate seeing the results and interpretations.

      We thank the reviewer for raising this point. In our time (Figure 2) and curvature analyses (Figure 6), we collected data with five levels of horizon and three levels of dwell time to explore the parameter space and to see whether there is any interaction between dwell time and the horizon of planning future targets. A priori, we expected that the full stop at each target imposed by the 400 ms dwell time would be long enough to remove any effect of future targets on how the current movement is executed. In line with this hypothesis, the systematic curvature of reaches based on the future target was smaller with longer dwell times (Figure 6E). Nevertheless, we observed significant curvature even with the 400 ms dwell time. Based on this observation, we expect that running the jump experiments (Figures 4 and 5) with longer dwell times would yield the same pattern of results but with a smaller effect size, since longer dwells break the interdependence of sequence elements (Kalidindi & Crevecoeur, 2023). For the jump experiments, we therefore limited our experimental conditions to the shortest dwell time (75 ms), since we were conceptually interested in situations where movements in the sequence are maximally dependent on each other.

      Beyond this, the authors also mentioned in the results and discussion the idea of "neural resources" being assigned to replan movements, but it's not clear to me what this might actually mean concretely. I wonder if the authors have a toy model in mind for what this kind of resource reassignment could mean. I realize it would likely be quite speculative, but I would greatly appreciate a description or some sort of intuition if possible.

      Our use of the term "neural resources" is inspired by the classic psychology literature on how cognitive resources such as attention and working memory are divided among multiple sequence components. Early studies on working memory suggested that human participants can retain and manipulate a fixed number of abstract items in working memory (Miller, 1956). However, more recent literature postulates that working memory is limited not by a specific number of items but by a finite attentional resource that is softly allocated to task items.

      Here we borrowed the same notion of a soft distribution of resources for the preparation of multiple sequence items. A large portion of our observations in this paper, and also of previous work on sequence production, can be explained by a simple model that assumes one central planning resource that is “softly” divided between sequence elements when participants see future items of the sequence (Author Response Image 1). The first sequence element receives the majority of the resources and is planned the most. The remaining sequence elements receive the rest of the planning resources in an exponentially decaying manner, allowing them to be prepared during the execution of the ongoing movement. Once the ongoing movement is over, the resource is transferred to the next sequence item, and this process is repeated until the sequence is over. Assignment of planning resources to future items explains why participants are faster when seeing future items (Figure 2). But this comes with a cost: if the ongoing movement is perturbed, the replanning process is delayed since some of the resources are occupied by future planning (Figure 4). This naturally leads to the question of how this resource allocation is implemented in neural tissue. To address this, we are conducting the same sequence task with the horizon manipulation in non-human primates (NHPs), and the investigation of these neural implementation questions will be the focus of future studies.

      Author response image 1.

      Basic diagram showing a soft distribution of a limited planning resource. The diagram shows a Horizon 3 condition in which two future reaches (+1 and +2) are planned while a movement (+0) is executed. The majority of the resource is assigned to the execution of the ongoing movement, while the rest is distributed for planning future movements. Once the movement is over, the chain of preparation and execution moves forward.
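      To make the intuition concrete, the soft allocation described above can be written as a toy model. This is a minimal sketch under assumptions that are not part of the manuscript: the resource is normalized to one unit, the shares decay geometrically with a hypothetical decay factor of 0.5, and the function names are illustrative only.

```python
from typing import List


def allocate_planning_resource(n_visible: int, decay: float = 0.5) -> List[float]:
    """Split one unit of a planning resource over the ongoing reach (+0) and
    the visible future reaches (+1, +2, ...), with geometrically decaying
    weights. `decay` is a hypothetical free parameter, not a fitted value."""
    if n_visible < 1:
        raise ValueError("at least the ongoing reach must be visible")
    raw = [decay ** k for k in range(n_visible)]
    total = sum(raw)
    return [w / total for w in raw]


def run_sequence(n_targets: int, horizon: int, decay: float = 0.5) -> List[List[float]]:
    """For each reach in a sequence, return the resource shares for the ongoing
    reach and the future targets visible under the given horizon. When a reach
    ends, the window moves forward by one target."""
    schedule = []
    for current in range(n_targets):
        # Near the end of the sequence fewer future targets remain visible.
        n_visible = min(horizon, n_targets - current)
        schedule.append(allocate_planning_resource(n_visible, decay))
    return schedule


if __name__ == "__main__":
    for h in (1, 2, 3):
        shares = allocate_planning_resource(h)
        print(f"H{h}:", [round(s, 3) for s in shares])
```

      Under these assumptions, the share left for the ongoing reach (+0) shrinks as more future targets become visible (1 in Horizon 1, 2/3 in Horizon 2, 4/7 in Horizon 3), which is one way to picture why replanning a perturbed movement could be slower at longer horizons.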

      Recommendations for the author:

      Reviewer #1

      We thank Reviewer #1 for these comments regarding the clarity and consistency of figures and terminology.

      (1) Figure 3. Are "+1 Move" in Fig. 3B and "+ 1 Movement" in Fig. 3C the same as "E + 1" in Fig. 3A? Also, does "Dwell" in Fig. 3B mean the same as "+1 Dwell" in Fig. 3C? Consistent terminology would help readers understand the figure.

      “+1 Move” in Figure 3B is the same as +1 movement in Figure 3C. “Dwell” in Figure 3B is the same as +1 Dwell in Figure 3C. We changed the figure for more consistency.

      (2) Figure 3. A typo in the second-to-last line of the legend: "pre-jump target for no-jump and jump and condition". The second "and" isn't necessary.

      The typo is corrected. Thank you.

      (3) Figure 4C. Is "Movement time" equivalent with "E + 1"?

      “Movement time” is equivalent to E+1 only in no-jump conditions. When the jump occurs, Movement time contains all the

      (4) Figure 6B. Is the gray circle in between the graph and target positions there by mistake?

      We fixed this error. Thank you.

      (5) Figure 6E. It's hard to distinguish H2-H5 from the color differences.

      We changed the H5 marker to solid white with a black outline to improve the contrast. Thank you.

      (6) Figure 7A. Blue dots are almost invisible.

      We added a black outline to the blue circles to improve visibility. Thank you.

      Reviewer #2

      I found this manuscript to be engaging and well written--many of the questions I had while reading were answered promptly in the next section. As such, my comments are mostly minor and primarily geared towards improving clarity in the manuscript.

      (1) One major recurring confusion I had while reading the manuscript was how to think about H1, H2, and H3. It was clearly explained in the text, and the explanations of the results were generally clear once I read through it all, but I found it strangely confusing at times when trying to interpret the figures for myself (e.g., in H2, 2 targets are on screen, but the second target can only be planned during the reach toward the first target). This confusion may just be me reading the manuscript over two days, but I wonder if it could be made clearer with some semantic iconography associated with each horizon added to the later figures alongside the H labels. As one option, perhaps the planning timeline part of Fig 1D could be simplified and shrunk down to make an icon for each horizon that clearly shows when planning overlaps for each horizon.

      (Please see the response to point #2 below)

      (2) Regarding Fig 1D: I like this figure, but it's unclear to me how the exact preparation and execution times are determined. Is this more of a general schematic of overlaps, or is there specific information about timing in here?

      We thank Reviewer #2 for this important feedback. The role of Figure 1D was to summarize the timing of the experiments for different horizons, that is, to clarify the relative timing of targets appearing on the screen (shown as small circles above the horizontal line) and targets being captured by participants (the ticks and their associated numbers on the line). Execution is shown as the interval during which the hand moves between targets, and planning as the potential planning time from the appearance of a target on the screen until the initiation of the reach to that target. We added the relevant parts of Figure 1D to the subplots of each subsequent experiment to summarize the timing of those experiments and their analyses. For the experiments with target jumps, a small vertical arrow shows the time of the jump relative to other events.

      However, this figure is less useful if the connection between the timing dots and ticks is not communicated. We agree that in the original manuscript, this important figure was only briefly explained in the caption of Figure 1. We have expanded the explanation in the caption of Figure 1 and referenced the dots and ticks in the main text.

      (3) Fig 6B - for some reason I got confused here: I thought the central target in this figure was the start target, and it took me embarrassingly long to figure out that the green target was the start target. This is likely because I'm used to seeing center-out behavioral figures. Incidentally, I wasn't confused by 7c (in fact, seeing 7c is what made me understand 6b), so maybe the solution is to clearly mark a directionality to the reach trajectories, or to point an arrow at the green target like in previous figures. Also, the bottom left gray target in the figure blends into the graph on the left--I didn't notice it until rereading. Because there's white space between that target and the green one, it might be good to introduce some white space to separate the graph from the targets more. The target arrangement makes more sense in panel C, but by the time I got there, I had already been a bit confused.

      Thanks for raising this point. As shown in Figure 6C, we used the reach to the +1 target for the curvature analysis. The confusion about Figure 6B probably arises because the reach trajectories continue past the +1 target, which also explains why Figure 7C seemed more straightforward. To resolve this, we modified Figure 6B so that the reaches are shown with full opacity up to the +1 target and with more transparency afterwards. We believe this change focuses the reader's attention on the reach initiated from the +0 target to the +1 target.

      As for the gray target in Figure 6B, we originally included it because it is a potential start location for the reach to the +0 target, and to keep the visuals similar across plots. The gray target has now been removed from Figure 6B.

      (4) Line 253 - I'm not sure I understand the advantage over simple averaging that the authors mention here--would be nice to get a bit more intuition.

      Thanks for raising this point. We used a two-factor model in our analysis, with the two factors representing the angle of the last and the next target, respectively. Both factors had five levels: -120, -60, 0, 60, and 120 degrees relative to the +1 reach. In a balanced two-factor design, where each combination of factor levels has an equal number of trials, a linear model and simple averaging yield equivalent results. However, when the number of trials is unbalanced across combinations of the two factors, simple averaging can produce misleading differences between the levels of one factor, because the mean for each level also absorbs the effect of whichever levels of the other factor happen to be overrepresented. Additionally, the linear model allows us to investigate potential interactions between the two factors, which is not possible with simple averaging.
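      The difference between the two approaches can be illustrated with a small simulation. This is a minimal sketch, not the authors' analysis code: the additive effect sizes (0.01 and 0.02 units per degree), the trial counts (10 trials when the last and next angles match, 1 otherwise), and the function names are all hypothetical, and the data are noise-free so that the contrast is easy to see.

```python
import numpy as np

LEVELS = [-120, -60, 0, 60, 120]  # angles (deg) relative to the +1 reach


def simulate_trials():
    """Hypothetical, deliberately unbalanced design with noise-free additive curvature."""
    rows = []
    for i, last in enumerate(LEVELS):
        for j, nxt in enumerate(LEVELS):
            n = 10 if i == j else 1              # unbalanced trial counts
            y = 0.01 * last + 0.02 * nxt         # hypothetical additive effects
            rows += [(last, nxt, y)] * n
    return np.array(rows, dtype=float)           # columns: last, next, curvature


def simple_average_effect(data):
    """Mean curvature per next-target level, relative to the first level."""
    means = np.array([data[data[:, 1] == lvl, 2].mean() for lvl in LEVELS])
    return means - means[0]


def linear_model_effect(data):
    """Fit curvature ~ intercept + last (dummy) + next (dummy) by least squares
    and return the next-target effects relative to the first level."""
    cols = [np.ones(len(data))]
    for lvl in LEVELS[1:]:
        cols.append((data[:, 0] == lvl).astype(float))   # last-target dummies
    for lvl in LEVELS[1:]:
        cols.append((data[:, 1] == lvl).astype(float))   # next-target dummies
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, data[:, 2], rcond=None)
    n_last = len(LEVELS) - 1
    return np.concatenate(([0.0], beta[1 + n_last:]))


if __name__ == "__main__":
    data = simulate_trials()
    print("simple averaging:", np.round(simple_average_effect(data), 3))
    print("linear model:    ", np.round(linear_model_effect(data), 3))
```

      In this deliberately unbalanced example, the linear model recovers the hypothetical next-target effects exactly, whereas the simple averages overstate them because the trials at each next-target angle are dominated by one last-target angle. Adding last-by-next interaction columns to the design matrix would be one way to test for the interactions mentioned above.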

      (5) Fig 7a - I would have liked to see the traces labeled in figure (i.e. hand trajectory vs. eye trajectory)

      Hand and eye trajectories are now labeled in the figure.

      (6) Fig 7c - very minor, but the hexagon of targets is rotated 30 degrees from all previous hexagons shown (also, this hex grid target arrangement can't lead to the trajectory shown in 7a, so it can't be that this was a different experimental grid). I'm guessing this was a simple oversight.

      We used the same grid in the eye-tracking experiment. The targets in Figure 7C have been adjusted to visually match the previous plots. Thank you for raising this point.

      Reference

      Clavagnier, S., Prado, J., Kennedy, H., & Perenin, M.-T. (2007). How humans reach: distinct cortical systems for central and peripheral vision. The Neuroscientist: A Review Journal Bringing Neurobiology, Neurology and Psychiatry, 13(1), 22–27.

      González-Alvarez, C., Subramanian, A., & Pardhan, S. (2007). Reaching and grasping with restricted peripheral vision. Ophthalmic & Physiological Optics: The Journal of the British College of Ophthalmic Opticians, 27(3), 265–274.

      Kalidindi, H. T., & Crevecoeur, F. (2023). Task dependent coarticulation of movement sequences (Preprint 2023.12.15.571847). https://doi.org/10.1101/2023.12.15.571847

      Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.

      Sivak, B., & MacKenzie, C. L. (1990). Integration of visual information and motor output in reaching and grasping: the contributions of peripheral and central vision. Neuropsychologia, 28(10), 1095–1116.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Magnesium modulates phospholipid metabolism to promote bacterial phenotypic resistance to antibiotics", Li et al. demonstrated the role of magnesium in promoting phenotypic resistance in V. alginolyticus. Using standard microbiological and metabolomic techniques, the authors have shown the significance of the fatty acid biosynthesis pathway in the resistance mechanism. This study is significant as it sheds light on the role of an exogenous factor in altering membrane composition, polarization, and fluidity, which ultimately leads to antimicrobial resistance.

      Strengths:

      (1) The experiments were carried out methodically and logically.

      (2) An adequate number of replicates were used for the experiments.

      Weaknesses:

      (1) The introduction section needs to be more informative and to the point.

      (2) The weakest point of this paper is the logical flow of the Results section. The way the authors presented the figures does not match how they interpreted them in the Results section (or the figure legends). The figures are difficult to interpret and are not at all self-explanatory.

      (3) There are too many mislabeled figure panels in the main text, which makes it difficult to determine which figures the authors are describing. There should be more explanation of why and how the experiments were performed and how the results were interpreted.

      (1) We will extensively revise the Introduction to make it more informative than the current version.

      (2) We will check the descriptions in the text and the labeling of the figures to ensure they are consistent and logical.

      (3) We will add explanations of the experiments to clarify why each assay was performed.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors aimed to identify if and how magnesium affects the ability of two particular bacterial species to resist the action of antibiotics. In my view, the authors succeeded in their goals and presented a compelling study that will have important implications for the antibiotic resistance research community. Since metals like magnesium are present in all lab media compositions and are present in the host, the data presented in this study certainly will inspire additional research by the community. These could include research into whether other types of metals also induce multi-drug resistance, whether this phenomenon can be observed in other bacterial species, especially pathogenic species that cause clinical disease, and whether the underlying molecular determinants (i.e. enzymes) of metal-induced phenotypic resistance could be new antimicrobial drug targets themselves.

      Strengths:

      This study's strengths include that the authors used a variety of methodologies, all of which point to a clear effect of exogenous Mg2+ on drug resistance in the targeted species. I also commend the authors for carrying out a comprehensive study, spanning evaluation of whole cell phenotypes, metabolic pathways, genetic manipulation, to enzyme activity level evaluation. The fact that the authors uncovered a molecular mechanism underlying Mg2+-induced phenotypic resistance is particularly important as the key proteins should be studied further.

      Weaknesses:

      I believe there are weaknesses in the manuscript, however. The authors take for granted that the reader is familiar with all the assays utilized and do not properly explain some experiments; I therefore strongly suggest that the authors add a brief statement in each situation describing the rationale for each selected methodology (more details are in the private review to the authors). The Results section is also quite long and bogs down at times, and I suggest that the authors reduce its length by 10 to 20%. In contrast, the Introduction is sparse and lacks key aspects; for example, there should be mention of the study's main purpose and approaches, plus an introduction to the authors' choice of species and their known drug resistance properties, as well as the drug of choice (balofloxacin). Another notable weakness is that the authors evaluated Mg2+-induced phenotypic resistance only against two closely related species, and thus the generalizability of this mechanism of drug resistance is not known. The paper would be strengthened if the authors could demonstrate this type of phenotypic resistance in at least one more Gram-negative species and at least one Gram-positive species (antimicrobial susceptibility evaluations would suffice), each of which should be pathogenic to humans. Demonstrating magnesium-induced phenotypic drug resistance in the WHO Priority Bacterial Pathogens would be particularly important.

      We will add explanations of the experiments to make clear why we performed each assay. We will also revise the introduction and shorten the manuscript. Expanding the range of bacterial species is a very good idea, and we will perform such experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, Odenwald and colleagues show that mutant biotin ligases used to perform proximity-dependent biotin identification (TurboID) can be used to amplify signal in fluorescence microscopy and to label phase-separated compartments that are refractory to many immunofluorescence approaches. Using the parasite Trypanosoma brucei, they show that fluorescent methods such as expansion microscopy and CLEM, which require bright signals for optimal detection, benefit from the elevated signal provided by TurboID fusion proteins when coupled with labeled streptavidin. Moreover, they show that phase-separated compartments, where many antibody epitopes are occluded due to limited diffusion and potential sequestration, are labeled reliably with biotin deposited by a TurboID fusion protein that localizes within the compartment. They show successful labeling of the nucleolus, likely phase-separated portions of the nuclear pore, and stress granules. Lastly, they use a panel of nuclear pore-TurboID fusion proteins to map the regions of the T. brucei nuclear pore that appear to be phase-separated by comparing antibody labeling of the protein, which is susceptible to blocking, to the degree of biotin deposition detected by streptavidin, which is not. 

      Strengths: 

      Overall, this study shows that TurboID labelling and fluorescent streptavidin can be used to boost signal compared to conventional immunofluorescence in a manner similar to tyramide amplification, but without having to use antibodies. TurboID could prove to be a viable general strategy for labeling phase-separated structures in cells, and perhaps as a means of identifying these structures, which could also be useful. 

      Weaknesses: 

      However, I think that this work would benefit from additional controls to address if the improved detection that is being observed is due to the increased affinity and smaller size of streptavidin/biotin compared to IgGs, or if it has to do with the increased amount of binding epitope (biotin) being deposited compared to the number of available antibody epitopes. I also think that using the biotinylation signal produced by the TurboID fusion to track the location of the fusion protein and/or binding partners in cells comes with significant caveats that are not well addressed here, mostly due to the inability to discern which proteins are contributing to the observed biotin signal. 

      To dissect the contributions of the TurboID fusion to elevating signal, anti-biotin antibodies could be used to determine if the abundance of the biotin being deposited by the TurboID is what is increasing detection, or if streptavidin is essential for this.

      We agree with the reviewer that it would be very interesting to distinguish whether the increase in signal comes from the multiple biotinylation sites, from streptavidin being a very good binder, or perhaps from both. However, this question is very hard to answer, as antibodies differ massively in their affinity to the antigen, which further depends on the respective IF conditions, and are therefore not directly comparable. Even if anti-biotin gives a better signal than anti-HA, this can be caused by the increase in antigen number (more biotin than HA tag), by the higher binding affinity, or by a combination of both, and is thus hard to distinguish. Nevertheless, we have tested a monoclonal mouse anti-biotin antibody targeting the (non-phase-separated) NUP158. We found the signal from the biotin antibody to be much weaker than from anti-HA, indicating that, at least this particular biotin antibody, is not a very good binder in IF.

      Alternatively, HaloTag or CLIP tagging could be used to see if diffusion of a small molecule tag other than biotin can overcome the labeling issue in phase-separated compartments. There are Halo-biotin substrates available that would allow the conjugation of one biotin per fusion protein, which would allow the authors to dissect the relative contributions of the high affinity of streptavidin and the increased amount of biotin that the TurboID introduces.

      This is a very good idea, as in this case both signals come from streptavidin and are directly comparable. We expressed NUP158 with a HaloTag and added PEG-biotin as a Halo ligand. However, PEG-biotin is poorly cell-permeable and is generally only used on lysates. In trypanosomes, cell permeability is particularly restricted, and even Halo ligands that are considered highly cell-penetrant give only a weak signal. Even after overnight incubation, we could not detect any signal with PEG-biotin. Our control, the TMR-ligand 647, gave a weak nuclear pore staining, confirming the correct expression and function of the HaloTag-NUP158.

      The idea of using the biotin signal from the TurboID fusion as a means to track the changing localization of the fusion protein or the location of interacting partners is an attractive one, but the lack of certainty about which proteins are carrying the biotin signal makes it very difficult to make clear statements. For example, in the case of TurboID-PABP2, the appearance of a biotin signal at the cell posterior is proposed to be ALPH1, part of the mRNA decapping complex. However, because we are tracking biotin localization and biotin is being deposited on a variety of proteins, it is not formally possible to say that the posterior signal is ALPH1 or any other part of the decapping complex. For example, the posterior labeling could represent a localization of PABP2 that is not seen without the additional signal intensity provided by the TurboID fusion. There are also many cytoskeletal components present at the cell posterior that could be biotinylated, not just the decapping complex. Similar arguments can be made for the localization data pertaining to MLP2 and NUP65/75. I would argue that the TurboID labeling allows you to enhance signal on structures, such as the NUPs, and effectively label compartments, but you lack the capacity to know precisely which proteins are being labeled.

      We fully agree with the reviewer that tracking proteins by streptavidin imaging alone is problematic, because it cannot distinguish which protein is biotinylated. We therefore used words like “likely” in the description of the data. However, we still think it is a valid method, as long as it is confirmed by an orthogonal method. We have added this paragraph to the end of this chapter:

      “Importantly, tracking of proteins by streptavidin imaging requires orthogonal controls, as the imaging alone does not provide information about the nature of the biotinylated proteins. These can be a proximity ligation assay, mass spectrometry, or visualisation of candidate proteins with specific fluorescent tags. Once these orthogonal controls are established for a specific tracking, streptavidin imaging is an easy, cheap and highly versatile method to monitor protein interactions in a specific setting.”

      Reviewer #2 (Public Review): 

      Summary: 

      The authors noticed that there was an enhanced ability to detect nuclear pore proteins in trypanosomes using a streptavidin-biotin-based detection approach in comparison to conventional antibody-based detection, and this seemed particularly acute for phase-separated proteins. They explored this in detail for both standard imaging but also expansion microscopy and CLEM, testing resolution, signal strength, and sensitivity. An additional innovative approach exploits the proximity element of biotin labelling to identify where interacting proteins have been as well as where they are. 

      Strengths: 

      The data is high quality and convincing and will have obvious application, not just in the trypanosome field but also more broadly where proteins are tricky to detect or inaccessible due to phase separation (or some other steric limitations). It will be of wide utility and value in many cell biological studies and is timely due to the focus of interest on phase separation, CLEM, and expansion microscopy. 

      Thank you! We are glad you liked it.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors aimed to investigate the effectiveness of streptavidin imaging as an alternative to traditional antibody labeling for visualizing proteins within cellular contexts. They sought to address challenges associated with antibody accessibility and inconsistent localization by comparing the performance of streptavidin imaging with a TurboID-HA tandem tag across various protein localization scenarios, including phase-separated regions. They aimed to assess the reliability, signal enhancement, and potential advantages of streptavidin imaging over antibody labeling techniques. 

      Overall, the study provides a convincing argument for the utility of streptavidin imaging in cellular protein visualization. By demonstrating the effectiveness of streptavidin imaging as an alternative to antibody labeling, the study offers a promising solution to issues of accessibility and localization variability. Furthermore, while streptavidin imaging shows significant advantages in signal enhancement and preservation of protein interactions, the authors must consider potential limitations and variations in its application. Factors such as the possible impact of tagging on protein function, background noise, non-specific binding, and the potential for off-target effects may affect the reliability and interpretation of results. Thus, careful validation and optimization of streptavidin imaging protocols are crucial to ensure reproducibility and accuracy across different experimental setups.

      Strengths: 

      - Streptavidin imaging utilizes multiple biotinylation sites on both the target protein and adjacent proteins, resulting in a substantial signal boost. This enhancement is particularly beneficial for several applications with diluted antigens, such as expansion microscopy or correlative light and electron microscopy. 

      - This biotinylation process enables the identification and characterization of interacting proteins, allowing for a comprehensive understanding of protein-protein interactions within cellular contexts. 

      Weaknesses: 

      - One of the key advantages of antibodies is that they label native, endogenous proteins, i.e. without introducing any genetic modifications or exogenously expressed proteins. This is a major difference from the approach in this manuscript, and it is surprising that this limitation is not really mentioned, let alone expanded upon, anywhere in the manuscript. Tagging proteins often impacts their function (if not their localization), and this is also not discussed.

      - Given that BioID proximity labeling encompasses not only the protein of interest but also its entire interacting partner history, ensuring accurate localization of the protein of interest poses a challenge. 

      - The title of the publication suggests that this imaging technique is widely applicable. However, the authors did not show the ability to track the localization of several distinct proteins on the same sample, which would further demonstrate the advantage of streptavidin imaging over antibody labeling. Similarly, the work focuses only on small 2D samples. It would have been interesting to compare this with 3D samples (e.g. cells encapsulated in an extracellular matrix) or tissues.

      Recommendations for the authors:

      To enhance the assessment from 'incomplete' to 'solid', the reviewers recommend that the following major issues be addressed: 

      Major issues: 

      (1) Anti-biotin antibodies in combination with TurboID labeling should be used to compare the signal/labelling penetrance to streptavidin results. That would show if elevated biotin deposition matters, or if it is really the smaller size, more fluors, and higher affinity of streptavidin that's making the difference. 

      We agree with the reviewer that it would be very interesting to distinguish whether the increase in signal comes from the multiple biotinylation sites, from streptavidin being a very good binder, or perhaps from both, and whether size matters (IgG versus streptavidin). However, this question is very hard to answer, as antibodies differ massively in their affinity to the antigen. Thus, even if anti-biotin gave a better signal than anti-HA, this could be caused by the increase in antigen number (more biotin than HA tag), by the better binding affinity, or by a combination of both, and it would not allow us to truly answer the question. We have now tested anti-biotin antibodies, also in response to reviewer 1, and got a much poorer signal in comparison to anti-HA or streptavidin.

      Please note that we made another attempt using nanobodies to target phase-separated proteins, to see whether size matters (Fig. 2I). The nanobody did not stain Mex67 at the nuclear pores, but gave a weak nucleolar signal for NOG1, which may suggest that the nanobody can penetrate slightly better than IgG, but it does not rule out that the nanobody simply binds with higher affinity. Reviewer 1 suggested using the HaloTag with PEG-biotin: this would indeed allow a direct comparison of the streptavidin signal caused by the TurboID with that from a single biotin added by the HaloTag. Unfortunately, PEG-biotin does not penetrate trypanosome cells. In conclusion, we are not aware of a method that would allow us to establish why streptavidin, but not IgGs, can penetrate phase-separated areas. We therefore prefer not to overinterpret our data, but to stick to what is supported by the data: “the inability to label phase-separated areas is not restricted to anti-HA but applies to other antibodies”.

      (3) Figure 4 A-B. The validity of claiming the correct localization demonstrated by streptavidin imaging comes into question, especially when endogenous fluorescence, via the fusion protein, remains undetectable (as indicated by the yellow arrow at apex). 

      In this figure, the streptavidin imaging does NOT show the correct localisation of the bait protein, but it does show proteins from historic interactions that have a localisation distinct from that of the bait. We had therefore introduced this chapter with the paragraph below, to make sure that the reader is aware of the limitations (which we also see as an opportunity, if properly controlled):

      “We found that in most cases, streptavidin labelling faithfully reflects the steady state localisation of a bait protein, i.e., the localisation resembles that observed with immunofluorescence or direct fluorescence imaging of GFP-fusion proteins. For certain bait proteins, this is not the case, for example, if the bait protein or its interactors have a dynamic localisation to distinct compartments, or if interactions are highly transient. It is thus essential to control streptavidin-based de novo localisation data by either antibody labelling (if possible) or by direct fluorescence of fusion-proteins for each new bait protein.”

      In particular, on lines 450-460, there's a fundamental issue with the argument put forward here. It is not possible to formally know that the posterior labeling is ALPH1 vs. another part of the decapping complex that was associated with PABP2-Turbo, or if the higher detection capacity of the Turbo-biotin label is uncovering a novel localization of the PABP2. While it is likely that it is ALPH1, it is not possible to rule out other possibilities with this approach. These issues should be discussed here and more generally the possibility of off-target labeling with this approach should be addressed in the discussion. 

      We fully agree with the reviewer that tracking proteins by streptavidin imaging alone is problematic, because it cannot distinguish which protein is biotinylated. We therefore used words like “likely” in the description of the data. However, we still think it is a valid method, as long as it is backed up by an orthogonal method. We have added this paragraph to the end of this chapter:

      “Importantly, tracking of proteins by streptavidin imaging requires orthogonal controls, as the imaging alone does not provide information about the nature of the biotinylated proteins. These can be a proximity ligation assay, mass spectrometry, or visualisation of candidate proteins with specific fluorescent tags. Once these orthogonal controls are established for a specific tracking, streptavidin imaging is an easy, cheap and highly versatile method to monitor protein interactions in a specific setting.”

      (4) More discussion and acknowledgment of the general limitations in using tagged proteins are needed to balance the manuscript, especially if the hope is to draw a comparison with antibody labeling, which works on endogenous proteins (not requiring a tag). For example: (a) tagging proteins requires genetic/molecular work ahead of time to engineer the constructs and/or cells if trying to tag endogenous proteins; (b) tagged proteins should technically be validated in rescue experiments to confirm the tag doesn't disrupt function in the cell/tissue/context of interest; and (c) exogenous tagged proteins compete with endogenous untagged proteins, which can complicate the interpretation of data.  

      We have added the following to the first paragraph of the Discussion:

      “Like many methods that are frequently used in cell and molecular biology, streptavidin imaging is based on the expression of a genetically engineered fusion protein: it is essential to validate both the function and the localisation of the TurboID-HA tagged protein by orthogonal methods. If the fusion protein is non-functional or mis-localised, tagging at the other end may help, but if not, this protein cannot be imaged by streptavidin imaging. Likewise, target organisms that are not amenable to genetic manipulation, or those with restricted genetic tools, are unsuitable or less suitable for this method.”

      We would also like to point out that for non-mainstream organisms like trypanosomes, antibodies are not commercially available, and genetic manipulation is often more time-efficient and cheaper than the production of antiserum against the target protein.

      Also, the introduction would ideally be more general in scope and introduce the pros and cons of antibody labeling vs biotin/streptavidin, which are mentioned briefly in the discussion. The fact that the biotin-streptavidin interaction is ~100-fold higher affinity than an IgG binding to its epitope is likely playing a key role in the results here. The difference in size between IgG and streptavidin, the likelihood that the tetrameric streptavidin carries more fluors than an IgG secondary, and the fact that biotin can likely diffuse into phase-separated environments should be clearly stated. The current introduction segues from a previous paper that a more general audience may not be familiar with.

      We have now included this paragraph to the introduction:

      “It remains unclear why streptavidin was able to stain biotinylated proteins within these antibody-inaccessible regions, but possible reasons are: (i) tetrameric streptavidin is smaller and more compact than IgGs (60 kDa versus a tandem of two IgGs, each with 150 kDa); (ii) the interaction between streptavidin and biotin is ~100-fold stronger than a typical interaction between antibody and antigen; and (iii) streptavidin contains four fluorophores, in contrast to only one per secondary IgG.”

      Minor issues: 

      The copy numbers of the HA and Ty1 epitope tags vary depending on the construct being used. For example, Ty1 is found as a single copy tag in the TurboID tag, but on the mNeonGreen tag there are 6 copies of the epitope. It makes it hard to know if differences in detection are due to variations in copies of the epitope tags. Line 372-374: can the authors explain why they chose to use nanobodies in this case? It would be great to show the innate mNeonGreen signal in 2K to compare to the Ty1 labeling. The presence of 6 copies of the Ty1 epitope could be essential to the labeling seen here.

      We agree with the reviewer that these data are a bit confusing. We have now removed Figure 3K, as it is the only construct with 6 Ty1 tags instead of one, and it does not add to the conclusions (the mNeonGreen signal is entirely in the nucleolus, as shown by TrypTag). We have also added an explanation of why we used nanobodies (“The absence of a nanobody signal rules out that it is simply the size of IgGs that prevents the staining of Mex67 at the nuclear pores, as nanobodies are smaller than (tetrameric) streptavidin”). However, as stated above, we prefer not to overinterpret the data, as signals from different antibody/nanobody–antigen combinations are not comparable. What was important to us was to stress that the absence of signal in phase-separated areas is NOT restricted to the anti-HA antibody, which is clearly supported by the data.

      What does the innate streptavidin background labeling from the natively biotinylated proteins look like in cells that are not carrying a TurboID fusion? That should be discussed.

      We have now included the controls without the TurboID fusions for trypanosomes and HeLa cells: “Wild type cells of both trypanosomes and humans showed only a very low streptavidin signal, indicating that the signal from naturally biotinylated proteins is negligible (Figure S8 in supplementary material).”

      Line 328-331: This is likely to be dependent on whether or not the protein moves to different localizations within the cell. 

      True, we agree, and we have added this paragraph:

      “The one exception is very motile proteins that produce a “biotinylation trail” distinct from the steady state localisation; these exceptions, and how they can be exploited to understand protein interactions, are discussed in chapter 4 below.”

      Line 304-305: Does biotin supplementation not matter at all? 

      No, we never saw any increase in biotinylation when we added extra biotin to trypanosomes. The 0.8 µM biotin concentration in the medium was sufficient.

      Line 326-327: Was the addition of biotin checked for enhancement in the case of the mammalian NUP98? I would argue that there is a significant number of puncta in Figure 1D that are either green or magenta, not both. The amount of extranuclear puncta in the HA channel is also difficult to explain. Biotin supplementation to 500 µM was used in mammalian TurboID experiments in the original Nature Biotech paper- perhaps nanomolar levels are too low. 

      We have now tested HeLa cells with 500 µM biotin and saw an increase in signal, but also in background; due to the increased background, we conclude that low biotin concentrations are more suitable. We have also repeated the experiment using 4HA tags instead of 1HA, and we found a minor improvement in the antibody signal for NUP88 (while the phase-separated NUP54 was still not detectable). We have replaced the images in Figure 1D (NUP88) and in Figure 2F (NUP54) with improved images using 4HA tags. However, we would like to note that single nuclear pore resolution is beyond what can be expected of light microscopy.

      Line 371: In 2I, I see a signal that looks like the nucleus, similar to the Ty1 labeling in 2G, so I don't think it's accurate to say that that Mex67 was "undetectable". Does the serum work for blotting? 

      Thank you; “undetectable” was not the correct phrase here. Mex67 localises to the nuclear pores, the nucleoplasm and the nucleolus (GFP tagging or streptavidin). Antibodies, either to the tag or to the endogenous protein, fail to detect Mex67 at the nuclear pores and do not show any particular enrichment in the nucleolus. They do, however, detect Mex67 in the (non-phase-separated) nucleoplasm. We have changed the text to make this clearer. The Mex67 antiserum works well on western blots (see, for example, Pozzi, B., Naguleswaran, A., Florini, F., Rezaei, Z. & Roditi, I. The RNA export factor TbMex67 connects transcription and RNA export in Trypanosoma brucei and sets boundaries for RNA polymerase I. Nucleic Acids Res. 51, 5177–5192 (2023)).

      Line 477: "lacked" should be "lagged".

      Thank you, corrected.

      Line 468-481: My previous argument holds here - how do you know that the difference in detection here is just a matter of much higher affinity/quantity of binding partner for the avidin?

      See answer to the second point of (3), above.

      483-491: Same issue - without certainty about what the biotin is on, this argument is difficult to make. 

      See answer to the second point of (3), above.

      Line 530: "bone-fine" should be "bonafide"

      Thank you, corrected.

      Line 602: biotin/streptavidin labeling has been used for expansion microscopy previously (Sun, Nature Biotech 2021; PMID: 33288959). 

      Thank you, we had overlooked this! We have now included this reference and describe the differences from our approach more clearly in the Discussion:

      “Fluorescent streptavidin has been previously used in expansion microscopy to detect biotin residues in target proteins produced by click chemistry (Sun et al., 2021). However, to the best of our knowledge, this is the first report that employs fluorescent streptavidin as a signal enhancer in expansion microscopy and CLEM, by combining it with multiple biotinylation sites added by a biotin ligase. Importantly, for both CLEM and expansion, streptavidin imaging is the only alternative approach to immunofluorescence, as denaturing conditions associated with these methods rule out direct imaging of fluorescent tags.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      This study presents a valuable framework and findings that advance our understanding of the brain as a fractal object by observing the stability of its shape property across 11 primate species and by highlighting an application to the effects of aging on the human brain. The evidence provided is solid but the link between brain shape and the underlying anatomy remains unclear. This study will be of interest to neuroscientists interested in brain morphology, whether from an evolutionary, fundamental or pathological point of view, and to physicists and mathematicians interested in modeling the shapes of complex objects.

      We have now clarified the outstanding questions regarding whether our model outputs can be related to actual primate brain anatomy; we believe these questions arose mainly from comments on the validity of our outputs, which appeared to show thicker cortices than nature can produce.

      We address this point in more detail in the point-by-point response below, but want to address this misunderstanding directly here: Our algorithm does not produce thicker cortices with increasing coarse-graining scales; in fact, the cortical thickness never exceeds the actual cortical thickness in our outputs, but rather thins with each coarse-graining scale. In other words, we believe that our outputs are fully in line with neuroanatomy across species.

      Reviewer #2 (Public Review): 

      In this manuscript, the authors analyze the shapes of cerebral cortices from several primate species, including subgroups of young and old humans, to characterize commonalities in patterns of gyrification, cortical thickness, and cortical surface area. The authors state that the observed scaling law shares properties with fractals, where shape properties are similar across several spatial scales. One way the authors assess this is to perform a "cortical melting" operation that they have devised on surface models obtained from several primate species. The authors also explore differences in shape properties between brains of young (~20 year old) and old (~80) humans. A challenge the authors acknowledge struggling with in reviewing the manuscript is merging "complex mathematical concepts and a perplexing biological phenomenon." This reviewer remains a bit skeptical about whether the complexity of the mathematical concepts being drawn from are justified by the advances made in our ability to infer new things about the shape of the cerebral cortex. 

      To allow scientists from all backgrounds to adopt these complex ideas, we have made our code to “melt” the brains and for further downstream analysis publicly available. We have now also provided a graphical user interface, to allow users without substantial coding experience to run the analysis. We also believe that the algorithmic concepts are easy to understand due to the similarity to the coarse-graining procedures found in long-standing and well-accepted box-counting algorithms.

      Beyond the theoretical insight of the fractal nature of cortices and providing an explicit and crucial link between vastly different brains that are gyrified and those that are not, we believe that the advance gained by our methods for future applications is clearly demonstrated in our proof-of-principle with a four-fold increase in effect size. For reference, an effect size of 8 would translate to an almost perfect separation of groups, i.e. an ideal biomarker with near 100% sensitivity and specificity.
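      To make the “near 100% sensitivity and specificity” statement concrete, here is a minimal sketch. It assumes (the response does not say this) that “effect size” means Cohen's d for two normally distributed groups with equal variance, and that the decision threshold sits halfway between the two group means. Under those assumptions, sensitivity and specificity both equal Φ(d/2), where Φ is the standard normal CDF, so for d = 8 they are Φ(4) ≈ 0.99997. The function names are illustrative only.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sens_spec_at_midpoint(d: float) -> float:
    """Sensitivity (= specificity) for two unit-variance normal groups whose means
    differ by d standard deviations, thresholded halfway between the means.

    Assumption: 'effect size' is Cohen's d with equal group variances.
    """
    return normal_cdf(d / 2.0)


if __name__ == "__main__":
    # Illustrative effect sizes only; they are not taken from the study.
    for d in (1.0, 2.0, 4.0, 8.0):
        print(f"d = {d}: sensitivity = specificity = {sens_spec_at_midpoint(d):.5f}")
```

      Placing the threshold at the midpoint is the choice that balances the two error rates when the variances are equal; other thresholds trade sensitivity against specificity.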

      (1) The series of operations to coarse-grain the cortex illustrated in Figure 1 produces image segmentations that do not resemble real brains.

      As reiterated in our Methods and Discussion: “Note, of course, that the coarse-grained brain surfaces are an output of our algorithm alone and are not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains are purely on the level of morphometrics across the whole cortex.”

      Fig. 1 therefore serves to explain the algorithmic outputs to the reader, but the individual melted brains are not meant to be directly or visually compared with actual brains. As with algorithms that measure the fractal dimension or the exposed surface area of a given brain, the intermediate outputs of these algorithms are not intended to represent any biologically observed brain structures; rather, they serve as an abstraction for obtaining meaningful morphometrics.

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the brains are rescaled in size for the actual analysis (see Methods and Fig. 3); for ease of visualisation of the method, we display all brains at an equal size here.”

      Finally, we also edited the terminology throughout the paper to clearly distinguish between (1) the cortex as a 3D object, (2) coarse-grained and voxelised versions thereof, and (3) summary morphological measures derived from these. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects/voxelisations themselves.

      The process to assign voxels in downsampled images to cortex and white matter is biased towards the former, as only 4 corners of a given voxel are needed to intersect the original pial surface, but all 8 corners are needed to be assigned a white matter voxel. The reason for introducing this bias (and to the extent that it is present in the authors' implementation) is not provided.

      This detail was already described in the Supplementary Information, and we have now added further clarification on this specific point there:

      “In detail, we assign all voxels in the grid with at least four corners inside the original pial surface to the pial voxelization. This process allows the exposed surface to remain approximately constant with increasing voxel sizes. A constant exposed surface is desirable, as we only want to gradually ‘melt’ and fuse the gyri, not grow the bounding/exposed surface as well. We want the extrinsic area to remain approximately constant as we decrease the intrinsic area via coarse-graining; it is like generating iterates of a Koch curve in reverse, from more to less detailed, by increasing the length of the smallest line segment.

      We then assign voxels with all eight corners inside the original white matter surface to the white matter voxelization. This is to ensure the integrity of the white matter, as otherwise white matter voxels in gyri may become detached from the core white matter and thus artificially increase the white matter surface area. Indeed, the main results of the paper are not very sensitive to the choice of requiring all eight corners rather than, e.g., only four, as we do not directly use white matter surface area for the scaling law measurements. However, we have kept this choice in case future work makes use of the white matter voxelisations or derived measures.”
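      The Koch-curve analogy in the quoted passage can be sketched numerically. This is a hypothetical illustration, not the authors' code. Each Koch iterate has 4^n segments of length s = 3^-n. As s grows, the intrinsic length shrinks, while the extrinsic extent (the distance between the endpoints) stays fixed at 1. The slope of log L against log s is 1 - D, with D = log 4 / log 3.

```python
import cmath
import math


def koch_iterate(n):
    """Vertices of the n-th Koch iterate on the segment from 0 to 1 (complex plane)."""
    pts = [0j, 1 + 0j]
    turn = cmath.exp(1j * math.pi / 3)  # 60-degree rotation for the spike
    for _ in range(n):
        refined = []
        for a, b in zip(pts, pts[1:]):
            third = (b - a) / 3
            p1 = a + third
            peak = p1 + third * turn
            p3 = a + 2 * third
            refined.extend([a, p1, peak, p3])
        refined.append(pts[-1])
        pts = refined
    return pts


def curve_length(pts):
    """Intrinsic length: sum of segment lengths."""
    return sum(abs(b - a) for a, b in zip(pts, pts[1:]))


# Coarse-grain "in reverse": from fine (n = 6) to coarse (n = 0) iterates.
log_s, log_len = [], []
for n in range(6, -1, -1):
    pts = koch_iterate(n)
    s = 3.0 ** -n                    # smallest segment length
    length = curve_length(pts)       # intrinsic length
    extent = abs(pts[-1] - pts[0])   # extrinsic extent, always 1
    log_s.append(math.log(s))
    log_len.append(math.log(length))
    print(f"s = {s:.5f}  intrinsic = {length:.4f}  extrinsic = {extent:.1f}")

mean_x = sum(log_s) / len(log_s)
mean_y = sum(log_len) / len(log_len)
fit_slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_s, log_len)) / sum(
    (x - mean_x) ** 2 for x in log_s
)
print(f"D = 1 - slope = {1 - fit_slope:.4f}")
```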

      Regarding white matter integrity, note also that if both the grey and white matter voxelisations required all 8 corners to be inside the respective mesh, some voxels at the grey/white matter interface would be assigned to neither, causing potential downstream issues.
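      The corner-counting rule can be sketched as follows. This is a minimal illustration, not the authors' implementation. The closed pial and white matter meshes are replaced by two hypothetical concentric spheres, `inside` stands in for a point-in-mesh test, and the function names are invented for this example. A voxel joins a voxelisation when at least `min_corners` of its eight corners lie inside the corresponding surface.

```python
import itertools
import math

CORNER_OFFSETS = list(itertools.product((0, 1), repeat=3))


def sphere(radius):
    """Hypothetical stand-in for a closed surface mesh: an inside test."""
    def inside(x, y, z):
        return x * x + y * y + z * z < radius * radius
    return inside


def voxelise(inside, voxel_size, half_extent, min_corners):
    """Indices (i, j, k) of grid voxels with at least `min_corners` of their
    8 corners inside the surface; voxel (i, j, k) spans [i*h, (i+1)*h) per axis."""
    n = math.ceil(half_extent / voxel_size)
    selected = set()
    for i, j, k in itertools.product(range(-n, n), repeat=3):
        corners_inside = sum(
            inside((i + dx) * voxel_size, (j + dy) * voxel_size, (k + dz) * voxel_size)
            for dx, dy, dz in CORNER_OFFSETS
        )
        if corners_inside >= min_corners:
            selected.add((i, j, k))
    return selected


pial, white = sphere(10.0), sphere(7.0)
true_volume = 4 / 3 * math.pi * 10.0 ** 3
for h in (1.0, 2.0, 4.0):
    pial_4 = voxelise(pial, h, 12.0, min_corners=4)  # rule quoted for the pial surface
    pial_8 = voxelise(pial, h, 12.0, min_corners=8)  # stricter alternative, for comparison
    wm_8 = voxelise(white, h, 12.0, min_corners=8)   # rule quoted for white matter
    print(f"h = {h}: pial volume ratio (4 corners) = {len(pial_4) * h**3 / true_volume:.2f}, "
          f"(8 corners) = {len(pial_8) * h**3 / true_volume:.2f}, white voxels = {len(wm_8)}")
```

A voxel with all eight corners inside also has at least four, so the four-corner set always contains the eight-corner set. The four-corner rule can therefore only retain more boundary voxels than the stricter rule as the voxel size h grows.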

      We further acknowledge:

      “Of course, our proposed procedure is not the only conceivable way to erase shape details below a given scale; and we are actively working on related algorithms that are also computationally cheaper. Nevertheless, the current version requires no fine-tuning, is computationally feasible and conceptually simple, thus making it a natural choice for introducing the methodology and approach.”

      The authors provide an intuitive explanation of why thickness relates to folding characteristics, but ultimately an issue for this reviewer is that, e.g., for the right-most panel in Figure 2b, the cortex consists of several voxels with 4.9 mm sides and is thus >2 cm thick. A structure with these morphological properties is not consistent with the anatomical organization of typical mammalian neocortex. 

      We assume the reviewer refers to the panel at scale = 4.9 mm in Fig. 1B. We would like to point out that Fig. 1 serves to explain the voxelisation method. For the actual analysis and Results, we use rescaled brains (see Fig. 2, with its ever-decreasing brain sizes). We have now expanded the description of the rescaling procedure as follows:

      “Morphological properties, such as cortical thicknesses measured in our ‘melted’ brains, are to be understood as relative to the size of the brain. Therefore, to analyse the scaling behaviour of the different coarse-grained realisations of the same brain, we apply an isometric rescaling process that leaves all dimensionless shape properties unaffected (more details in Suppl. S3.1). Conceptually, this process fixes the voxel size and instead resizes the surfaces relative to the voxel size. This ensures that we can compare the coarse-grained realisations to the original cortices and test whether the former, like the latter, also scale according to Eqn. (1). Resizing, or more precisely shrinking, the cortical surface is mathematically equivalent to increasing the box size in our coarse-graining method: both achieve an erasure of folding details below a certain threshold. After rescaling, for example, the cortical thickness also shrinks with increasing levels of coarse-graining and never exceeds the thickness measured at native scale.”
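      A short calculation can check the statement that isometric rescaling leaves dimensionless shape properties unaffected. This sketch assumes that Eqn. (1) takes the form A_t * sqrt(T) = k * A_e^(5/4), the scaling law reported by Mota and Herculano-Houzel, which is not restated in this response. The input values are arbitrary illustrations, not measurements. Scaling every coordinate by a factor lam multiplies areas by lam^2 and thickness by lam. Both sides of the law therefore gain lam^(5/2), so k is unchanged, while T itself shrinks for lam < 1.

```python
import math


def offset_k(total_area, exposed_area, thickness):
    """k in the assumed scaling law A_t * sqrt(T) = k * A_e**1.25."""
    return total_area * math.sqrt(thickness) / exposed_area ** 1.25


def rescale(total_area, exposed_area, thickness, factor):
    """Isometric rescaling of all coordinates by `factor`:
    areas scale with factor**2 and thickness with factor."""
    return total_area * factor ** 2, exposed_area * factor ** 2, thickness * factor


# Arbitrary illustrative values in consistent units (not data from the paper).
native = (1800.0, 700.0, 2.5)
shrunk = rescale(*native, factor=0.4)

print(f"k native = {offset_k(*native):.6f}, k shrunk = {offset_k(*shrunk):.6f}")
print(f"thickness native = {native[2]}, thickness shrunk = {shrunk[2]}")
```

In this picture, shrinking the surfaces by lam at a fixed voxel size v erases the same detail as coarse-graining the native-size surfaces with boxes of size v / lam.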

      We additionally added a note to the caption of Fig. 1 to clarify this point:

      “Note that the brains are rescaled in size for the actual analysis (see Methods and Fig. 3); for ease of visualisation of the method, we display all brains at an equal size here.”

      Finally, we also edited the terminology throughout the paper to clearly distinguish between (1) the cortex as a 3D object, (2) coarse-grained versions thereof, and (3) summary morphological measures derived from these. When we invite comparisons in our paper between real brains and coarse-grained brains, this is always at the level of summary morphological measures, not at the level of the 3D objects themselves and their detailed anatomical features.

      (2) For the comparison between 20-year-old and 80-year-old brains, a well-documented difference is that the older age group has more cerebrospinal fluid due to tissue atrophy, and the distances between the walls of gyri become greater. This difference is borne out in the left column of Figure 4b. It seems this additional spacing between gyri in 80-year-olds requires more extensive down-sampling (larger scale values in Figure 4a) to achieve a similar shape parameter K as for the 20-year-olds. The authors assert that K provides a more sensitive measure (associated with a large effect size) than currently used ones for distinguishing the brains of young vs. old people. A more explicit, or elaborate, interpretation of the numbers produced in this manuscript, in terms of brain shape, might make this analysis more appealing to researchers in the aging field.

      To avoid confusion, we had already removed the main results relating to K and ageing in our previous revision. These are now only in the supplementary analysis. Our claim that K is a more sensitive measure of age and ageing (which still holds) will be presented in more detail in a series of upcoming papers.

      (3) In the Discussion, it is stated that self-similarity, operating on all length scales, should be used as a test for existing and future models of gyrification mechanisms. Given the lack of association between the abstract mathematical parameters described in this study and explicit properties of brain tissue and its constituents, it is difficult to envision how the coarse-graining operation can be used to guide development of "models of cortical gyrification."

      We have clarified in more detail, in the Discussion, what we originally meant:

      “Finally, this dual universality is also a more stringent test for existing and future models of cortical gyrification mechanisms at relevant scales, and one that is moreover applicable to individual cortices. For example, any model that explicitly simulates a cortical surface as an output could be directly coarse-grained with our method, and its morphological trajectories compared with those of actual human and primate cortices. The simulated cortices would only be ‘valid’ in terms of the dual universality if they also produce the same morphological trajectories.”
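      Comparing morphological trajectories could, for instance, amount to fitting the scaling exponent across coarse-graining levels. The sketch below assumes the law takes the form log(A_t * sqrt(T)) = (5/4) log(A_e) + log(k). The function names, the tolerance, and the synthetic trajectory are all hypothetical. The trajectory is generated to obey the law exactly, purely to exercise the code.

```python
import math


def fit_exponent(trajectory):
    """Least-squares slope and intercept of log(A_t * sqrt(T)) against log(A_e)
    over a trajectory of (A_t, A_e, T) tuples, one per coarse-graining level."""
    xs = [math.log(a_e) for _, a_e, _ in trajectory]
    ys = [math.log(a_t * math.sqrt(t)) for a_t, _, t in trajectory]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def consistent_with_law(trajectory, expected_slope=1.25, tolerance=0.05):
    """Hypothetical acceptance check: is the fitted exponent close to 5/4?"""
    slope, _ = fit_exponent(trajectory)
    return abs(slope - expected_slope) <= tolerance


# Synthetic trajectory built to satisfy A_t * sqrt(T) = k * A_e**1.25 exactly.
k = 0.9
synthetic = []
for a_e in (50.0, 100.0, 200.0, 400.0):
    t = 0.01 * a_e ** 0.5  # any thickness works; A_t is solved from the law
    a_t = k * a_e ** 1.25 / math.sqrt(t)
    synthetic.append((a_t, a_e, t))

slope, intercept = fit_exponent(synthetic)
print(f"slope = {slope:.4f}, k = {math.exp(intercept):.4f}, "
      f"consistent = {consistent_with_law(synthetic)}")
```

A simulated cortex would be run through the same coarse-graining, and its trajectory would be passed to the same fit. This is a simplification: the response's criterion concerns the whole morphological trajectory, not a single fitted slope.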

      However, we agree with the reviewer that our paper could be misread as demanding direct comparisons of each coarse-grained brain with an actual brain, and we have now added the following text to clarify that this is not our intention for the proposed method or outputs.

      “Note that we do not suggest directly comparing coarse-grained brain surfaces with actual biological brain surfaces. As we noted earlier, the coarse-grained brain surfaces are an output of our algorithm alone and are not to be directly/naively likened to actual brain surfaces, e.g. in terms of the location or shape of the folds. Our comparisons here between coarse-grained brains and actual brains are purely on the level of morphometrics across the whole cortex.”

      Indeed, the dual universality imposes restrictive constraints on the possible shapes of real cortices but does not fully specify them. Presumably, the location of individual folds in different individuals and species will depend on their respective evolutionary histories. There is therefore no reason to expect a match in fold location between the ‘melted’ cortices of more gyrified species, on the one hand, and the cortex of a less gyrified one, on the other, even if their global morphological parameters and global mechanism of folding coincide.

      (4) There are several who advocate for analyzing cortical mid-thickness surfaces, as the pial surface over-represents gyral tips compared to the bottoms of sulci in the surface area. The authors indicate that analyses of mid-thickness representations will be taken on in future work, but this seems to be a relevant control for accepting the conclusions of this manuscript.

      In the context of some applications and methods, we agree that the mid-surface is a meaningful surface to analyse. In our work, however, it is not. The fractal estimation rests on the assumption that the exposed area hugs the object of interest (hence the convex hull of the pial surface), because the relationship between the extrinsic and intrinsic areas across scales determines the fractal relationship (Eq. 2). If we used the mid-surface instead of the pial surface for all estimation, it would not represent the actual object of interest, and it would be separated from the convex hull. Estimating a new convex hull based on the mid-surface would be equivalent to asking for the fractal dimension of the mid-surface, not of the cortical ribbon. In other words, it would be a different question, bound to yield a different answer.
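      A two-dimensional analogue illustrates why the exposed (extrinsic) measure is taken from the convex hull of the object itself. In this hypothetical sketch, a gear-shaped polygon stands in for a folded pial outline. The intrinsic measure is the polygon's perimeter, and the extrinsic measure is the perimeter of its convex hull, computed with Andrew's monotone-chain algorithm. Any other outline, such as a mid-thickness curve, would have to be paired with its own hull.

```python
import math


def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def perimeter(polygon):
    """Length of the closed polygon through the given vertices."""
    return sum(math.dist(polygon[i], polygon[i - 1]) for i in range(len(polygon)))


def gear(teeth, outer, inner):
    """Hypothetical folded outline: vertices alternate between two radii."""
    pts = []
    for i in range(2 * teeth):
        r = outer if i % 2 == 0 else inner
        angle = math.pi * i / teeth
        pts.append((r * math.cos(angle), r * math.sin(angle)))
    return pts


outline = gear(teeth=12, outer=1.0, inner=0.6)
intrinsic = perimeter(outline)
extrinsic = perimeter(convex_hull(outline))
print(f"intrinsic = {intrinsic:.4f}, extrinsic = {extrinsic:.4f}, "
      f"ratio = {intrinsic / extrinsic:.3f}")
```

Only the outer vertices lie on the hull, so the extrinsic measure hugs the outline while the intrinsic measure also counts the folds.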

      Hence, we indicated in our original response that we only have a provisional answer: this is a separate question, and answering it requires more work beyond the scope of this paper. The mid-surface, as a morphological structure in its own right, will have its own scaling properties. Our provisional understanding is that these also yield a scaling law parallel to that of the cortical ribbon, with the same or a similar fractal dimension. However, more systematic work is required to investigate this question at native scale and across scales.

      Reviewer #3 (Public Review):

      Summary: Through a rigorous methodology, the authors demonstrated that within 11 different primates, the shape of the brain followed a universal scaling law with fractal properties. They enhanced the universality of this result by showing the concordance of their results with a previous study investigating 70 mammalian brains, and the discordance of their results with other folded objects that are not brains. They incidentally illustrated potential applications of this fractal property of the brain by observing a scale-dependent effect of aging on the human brain. 

      Strengths: 

      - New hierarchical way of expressing cortical shapes at different scales, derived from a previous report through the implementation of a coarse-graining procedure 

      - Investigation of 11 primate brains and contextualisation with other mammals based on prior literature 

      - Proposal of a tool to analyse cortical morphology that requires no fine-tuning and is computationally feasible 

      - Positioning of results in comparison to previous works reinforcing the validity of the observation. 

      - Illustration of the scale-dependence of the effects of brain aging in humans. 

      Weaknesses: 

      - The notion of cortical shape, while being central to the article, is not really defined, leaving some interpretation to the reader 

      - The organization of the manuscript is unconventional, leading to mixed contents in different sections (sections mixing introduction and method, methods and results, results and discussion...). As a result, the reader discovers the content of the article along the way, it is not obvious at what stages the methods are introduced, and the results are sometimes presented and argued in the same section, hindering objectivity. 

      To improve the document, I would suggest a modification and restructuring of the article such that: 1) by the end of the introduction the reader understands clearly what question is addressed and the value it holds for the community, 2) by the end of the methods the reader understands clearly all the tools that will be used to answer that question (not just the new method), 3) by the end of the results the reader holds the objective results obtained by applying these tools on the available data (without subjective interpretations and justifications), and 4) by the end of the discussion the reader understands the interpretation and contextualisation of the study, and clearly grasps the potential of the method depicted for the better understanding of brain folding mechanisms and properties. 

      We thank this reviewer again for their attention to detail and constructive comments. We have followed the detailed suggestions provided in the Recommendations For The Authors and summarise the main changes here:

      - We have restructured all sections to be more clearly following Introduction, Methods, Results, and Discussion; by using subsections, we believe the structure is now more accessible to readers.

      - We have now clarified the concept of “cortical shape”, as used in our paper, in several places by clearly distinguishing between the object of study and the morphological properties measured from it.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): None 

      Reviewer #3 (Recommendations For The Authors): 

      I once again compliment the authors for their elegant work. I am happy with the way they covered my first feedback. My second review takes into account some comments made by other reviewers with which I agree. 

      We thank this reviewer again for their attention to detail and constructive comments.

      Recommendations for clarifications: 

      General comments: The purpose of the article could be made clearer in the introduction. When I differentiate results from discussion, I think of results as objective measures or observations, while discussion will relate to the interpretation of these results (including comparison with previous literature, in most cases). 

      We have restructured all sections to more clearly follow the Introduction, Methods, Results, and Discussion format; by using subsections, we believe the structure is now more accessible to readers.

      - l.39: define or discuss "cortical shape" 

      We have gone through the entire paper and corrected for any ambiguities. We specifically distinguish between the cortex as a structure overall, shape measures derived from this structure, and coarse-grained versions of the structure.

      - l.48-74: this would match either an introduction or a discussion rather than a methods section. 

      Done

      - l.98-106: this would match a discussion rather than a methods section. 

      Done

      - l.111: here could be a good spot to discuss the 4 vs 8 corners for inclusion of pial vs white matter voxelization 

      We have discussed this in the more detailed Supplementary section now, as after restructuring, this appears to be the more suitable place.

      - l.140-180: it feels that this section mixes methods, results and discussion of the results 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      - l.183-217: mix of results and discussion 

      We agree and we have resolved this by removing sentences and re-arranging sections.

      Small cosmetic suggestions: 

      - l.44: conservation of 'some' quantities: vague 

      Changed to conservation of morphological relationships across evolution

      - l.66: order of citations ([24, 22,23]) 

      Will be fixed at proof stage depending on format of references.

      - l.77: delete space between citation and period 

      Done

      - l.77: I would delete 'say' 

      Done

      - l.86: 'but to also analyse' -> 'to analyse' 

      Done

      - l.105: remove 'we are encouraged that' 

      Done

      - l.111: 'also see' -> 'see also' 

      Done

      - l.164: 'remarkable': subjective 

      Done

      - l.189: define approx. abbreviation 

      Done

      - l.190: 'approx' -> 'approx.' 

      Revised

      - l.195: 'dramatic': subjective 

      Removed

      -l. 246: 'much' -> vague 

      Explained

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report compound heterozygous deleterious variants in the kinase domains of the non-receptor tyrosine kinases (NRTK) TNK2/ACK1 in familial SLE. They suggest that ACK1 and BRK deficiencies are associated with human SLE and impair efferocytosis.

      Strengths: 

      The identification of similar mutations in non-receptor tyrosine kinases (NRTKs) in two different families with familial SLE is a significant finding in human disease. Furthermore, the paper provides a detailed analysis of the molecular mechanisms behind the impairment of efferocytosis caused by mutations in ACK1 and BRK.

      Weaknesses: 

      A critical point in this paper is whether the loss of function of ACK1 or BRK contributes to the onset of familial SLE. The authors emphasize that inhibitors of ACK1/BRK worsened IgG deposition in the kidneys in a pristane-induced SLE model, which contributes not to the onset but to the exacerbation of SLE, thus only partially supporting their claim.

      The evidence supporting that the loss of function of ACK1 or BRK contributes to the onset of SLE in the patients from the 2 families mostly relies on the genetic analysis. As the reviewer states, the observation that inhibitors of ACK1/BRK worsened IgG deposition in the kidneys in a pristane-induced SLE model supports the genetic evidence.

      To further address the possible role of ACK1 or BRK variants in the onset of autoimmunity in vivo, we treated wild-type (WT) BALB/cByJ female mice with inhibitors in the absence of pristane.

      The results indicated that mice that had received a weekly injection of ACK1 or BRK inhibitors developed a large array of serum anti-nuclear IgG antibodies. These included, but were not limited to, autoantibodies associated with SLE, such as anti-histone, anti-chromatin, anti-U1-snRNP, anti-SSA, and anti-Ku antibodies, in comparison with DMSO-treated control mice (Revised Fig 3A). However, these mice did not develop glomerular IgG deposits after 12 weeks of treatment, in contrast to mice that had received pristane (Revised Fig. 3B,C, Figure 3-figure supplement 1).

      These additional data suggest that inhibition of ACK1 and BRK stimulates the production of serum autoantibodies. This strengthens the claim that ACK1 and BRK kinase deficiency contributes to autoimmunity in BALB/cByJ mice.

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors revealed that genetic deficiencies of ACK1 and BRK are associated with human SLE. First, the authors found compound heterozygous deleterious variants in the kinase domains of the non-receptor tyrosine kinases (NRTK) TNK2/ACK1 in one multiplex family and PTK6/BRK in another family. Then, through experimental blockade of ACK1 or BRK in a mouse SLE model, they found an increase in glomerular IgG deposits and circulating autoantibodies. Furthermore, they reported that ACK1 and BRK variants from the SLE patients impaired the MERTK-mediated anti-inflammatory response to apoptotic cells in human induced pluripotent stem cell (hiPSC)-derived macrophages. This work identified new SLE-associated ACK1 and BRK variants and a role for the NRTKs TNK2/ACK1 and PTK6/BRK in efferocytosis, providing a new molecular and cellular mechanism of SLE pathogenesis.

      Strengths: 

      This work identified new SLE-associated ACK and BRK variants and a role for the NRTK TNK2/ACK1 and PTK6/BRK in efferocytosis, providing a new molecular and cellular mechanism of SLE pathogenesis.

      Weaknesses: 

      Although the manuscript is well-organized and clearly stated, there are some points below that should be considered:

      In this study, the authors used forward genetic analyses to identify novel gene mutations that may cause SLE, combined with GWAS studies of SLE. To further explore the importance of these variants, haplotype analysis of two candidate genes could be performed, to observe the evolution and selection relationship of candidate genes in the population (UK 1000 biobank, for example). 

      To investigate whether ACK1/TNK2 or BRK/PTK6 were subject to selection, we gathered data using different metrics quantifying negative selection in the human genome. We collected the f parameter from SnIPRE 1, LoFtool 2, and EvoTol 3, as well as intraspecies metrics from RVIS 4, LOEUF 5, and pLI 6 (including pRec). We also used our in-house CoNeS metric 7. None of these indicators suggests that the genes are under strong negative selection (Revised Figure 2-figure supplement 2), which is consistent with the deficiency being recessive. We also tested the variants with a MAF greater than 0.005 and found them to be neutral. We therefore did not test whether they were associated with any phenotype in the UK Biobank.

      Although the authors focused on SLE and macrophage efferocytosis in their studies, direct evidence of how macrophage efferocytosis significantly affects SLE is lacking. This point should at least be explicitly introduced and discussed by citing appropriate literature.

      We provide a more detailed description of the role of macrophage efferocytosis in autoimmunity and SLE in the revised manuscript. Specifically, we state (in the Results section, paragraph “ACK1 and BRK kinase domain variants may lose the ability to link MERTK to RAC1, AKT and STAT3 activation for efferocytosis”): “NRTKs such as ACK1 8 and PTK2/FAK 9 are also downstream targets of the TAM family receptor MERTK, which is expressed on macrophages and controls the anti-inflammatory engulfment of apoptotic cells, a process known as efferocytosis 10-12. Efferocytosis allows for the clearance of apoptotic cells before they undergo necrosis and release intracellular inflammatory molecules. It simultaneously leads to increased production of anti-inflammatory molecules (TGFb, IL-10, and PGE2) and decreased secretion of proinflammatory cytokines (TNF-alpha, IL-1b, IL-6) 10-14. In line with these findings, mice deficient in molecular components used by macrophages to efficiently perform efferocytosis, such as MFG-E8, MERTK, TIM4, and C1q, develop phenotypes associated with autoimmunity 10,11,14-27. Furthermore, defects in efferocytosis are also observed in patients with SLE and glomerulonephritis 14,28-31.”

      It is still not clear how the target molecules identified in this paper may influence macrophage efferocytosis. More direct evidence should be established. 

      Our studies show that wild-type ACK1 and BRK, but not the variants, are activated by MERTK, a key receptor that mediates the recognition of apoptotic cells. Our studies also show that wild-type ACK1 and BRK, but not the variants, activate RAC1, which is necessary for engulfment, and phosphorylate AKT and STAT3, which are involved in the anti-inflammatory response to PtdSer recognition.

      The TAM family receptor MERTK mediates recognition of PtdSer on apoptotic cells via GAS6 and Protein S 10,15,32 leading to their engulfment, which involves activation of RAC1 for actin reorganization and the formation of a phagocytic cup 9,33. Using IP kinase assays we show that MERTK and GAS6 can activate the kinase activity of wild-type ACK1 8 or BRK but not of the patient’s ACK1 or BRK variant alleles (Figure 4D). To further support the role of ACK1 and BRK downstream from PtdSer recognition and uptake of apoptotic cells, we show that reference ACK1 and BRK alleles, in contrast to the patient variant alleles, can activate RAC1 to generate RAC-GTP which is necessary for engulfment 9,33 (Figure 4C).

      PtdSer recognition also typically stimulates an anti-inflammatory process mediated in part via AKT 34 and STAT3 and their target genes, such as SOCS3 35-41. This process results in the inhibition of LPS-mediated production of inflammatory mediators such as TNF and IL-1b, and in the production of cytokines such as IL-10 and TGFb 11,25-27,42. Consistent with this literature and the findings of the paper, we show that reference ACK1 and BRK, unlike the patient’s variant alleles, can phosphorylate AKT and STAT3 (Figure 4A, B). The role of ACK1 and BRK in these signaling pathways is further supported by our RNA-seq transcriptomics data comparing the responses of control, patient, and inhibitor-treated iPSC-derived macrophages to apoptotic thymocytes. Specifically, we show that several transcriptional repressors were among the top ten genes whose expression is induced by apoptotic cells in WT macrophages (Figure 4F). These include the AKT targets ATF3, TGIF1, NFIL3, and KLF4, the STAT3 targets SOCS3 and DUSP5, as well as CEBPD and the inhibitor of E-box DNA binding ID3. This regulation was lost in mutant and inhibitor-treated macrophages (Figure 4F).

      For some transcriptional repressors mentioned in their studies, the authors should check whether there is clear experimental evidence. If not, it is recommended to supplement the experimental verifications for clarity.

      Transcriptional repressors, including the AKT targets ATF3, TGIF1, NFIL3, and KLF4, the STAT3 targets SOCS3 and DUSP5, as well as CEBPD and the inhibitor of E-box DNA binding ID3, were among the top ten genes whose expression is induced by apoptotic cells in WT macrophages (Figure 4F), but this regulation was lost in mutant and inhibitor-treated macrophages (Figure 4F).

      In the manuscript we cited published evidence, to the best of our knowledge, for the role of these genes in the regulation of inflammatory responses. Specifically we state: “ATF3, TGIF1, NFIL3, and KLF4 are involved in the negative regulation of inflammation in macrophages 35-38, SOCS3 is an inhibitor of the macrophage inflammatory response and DUSP5 is a negative regulator of ERK activation 39,40,43. These data suggest that the kinase domain of ACK1 and BRK contribute to the macrophage anti-inflammatory gene expression program driven by apoptotic cells.”

      In Figures 4C and 4D, the inhibitors appear to cause cytoskeletal changes; however, this reviewer would not have expected such a large change. Did the authors check whether the cells die after heavy treatment with the inhibitors?

      We carefully examined the viability of isogenic WT, BRK mutant, and ACK1 mutant macrophages (left panel) and of WT macrophages treated with ACK1 or BRK inhibitors, and did not observe changes in viability (Figure 4-figure supplement 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A crucial step in the development of SLE is the production of autoantibodies. It is shown in Figure 2F that inhibitors of ACK1/BRK enhanced the production of autoantibodies against histones and SSA in a pristane-induced SLE model, which is a significant result that could support the authors' claim. Strangely, this autoantigen panel does not include double-stranded DNA, RNP, or Sm, which should be presented regarding antibody production.

      We thank the reviewer for this comment. In the revised manuscript (Revised Figure 3 – Supplement 1), we added the remainder of the autoantibody panel, which includes double-stranded DNA, RNP, and Sm autoantibody levels. We also added the results for serum IgG autoantibody levels in BALB/cByJ mice that were treated for three months with DMSO, ACK1 inhibitor, or BRK inhibitor but did not receive a pristane injection (Revised Figure 3A). These data show that mice that received ACK1 or BRK inhibitors had increased serum IgG autoantibodies in comparison with DMSO-treated controls.

      Additionally, if there is information that inhibitors of ACK1/BRK promote the differentiation of follicular helper T cells, memory B cells, and plasma cells in a pristane-induced SLE model, it could be considered indirect evidence supporting the authors' claims.

      To the best of our knowledge, such data are not currently available.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      * In the manuscript, unpaired t-tests and ordinary one-way ANOVA (Tukey's multiple comparisons test) were used for statistical analysis, and these require normally distributed data. This requirement should be addressed in the text, and data that do not meet it should be analyzed with the non-parametric tests available in GraphPad Prism.

      We would like to thank the reviewer for pointing out this oversight. In the revised manuscript, we used a Shapiro-Wilk normality test to check whether each applicable dataset was normally distributed. For normally distributed datasets, statistical significance was determined by a Student t test or an ordinary one-way ANOVA with Tukey’s multiple comparisons test, depending on the number of conditions being compared and the experimental setup. For datasets that were not normally distributed, statistical significance was determined using a Mann-Whitney test, a Kruskal-Wallis multiple comparisons test, or a Wilcoxon matched-pairs signed rank test, depending on the experimental setup. P values below 0.05 were considered significant for all statistical tests.
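      The decision procedure described above can be sketched in Python as a rough analogue of the GraphPad Prism workflow, not the authors' actual analysis. `choose_test` is pure logic. `compare` calls SciPy (imported lazily, and `tukey_hsd` needs SciPy 1.8 or later). The paired normal branch (a paired t-test) and the use of differences for the paired normality check are assumptions, since the response lists only the Student t test for normally distributed data.

```python
def choose_test(n_groups, paired, all_normal):
    """Map the experimental design to the test named in the response.
    The paired + normal branch (paired t-test) is an assumption."""
    if paired:
        if n_groups != 2:
            raise ValueError("this sketch only handles paired two-condition designs")
        return "paired t-test" if all_normal else "Wilcoxon matched-pairs signed rank"
    if n_groups == 2:
        return "Student t-test" if all_normal else "Mann-Whitney"
    return "one-way ANOVA + Tukey" if all_normal else "Kruskal-Wallis"


def compare(groups, paired=False, alpha=0.05):
    """Check normality with Shapiro-Wilk (each sample needs >= 3 values),
    then run the chosen test with SciPy."""
    from scipy import stats  # imported lazily so choose_test works without SciPy

    if paired and len(groups) == 2:
        # For paired designs, normality of the within-pair differences is what matters.
        diffs = [a - b for a, b in zip(*groups)]
        all_normal = stats.shapiro(diffs).pvalue >= alpha
    else:
        all_normal = all(stats.shapiro(g).pvalue >= alpha for g in groups)

    name = choose_test(len(groups), paired, all_normal)
    if name == "Student t-test":
        result = stats.ttest_ind(*groups)
    elif name == "paired t-test":
        result = stats.ttest_rel(*groups)
    elif name == "Mann-Whitney":
        result = stats.mannwhitneyu(*groups, alternative="two-sided")
    elif name == "Wilcoxon matched-pairs signed rank":
        result = stats.wilcoxon(*groups)
    elif name == "one-way ANOVA + Tukey":
        result = (stats.f_oneway(*groups), stats.tukey_hsd(*groups))
    else:
        result = stats.kruskal(*groups)
    return name, result
```

A Shapiro-Wilk p value of 0.05 or above only means that no departure from normality was detected. With small samples, the test has little power.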

      The authors used different methods to represent significance levels. It is therefore suggested that significance levels be expressed consistently using letters. 

      As suggested by the reviewer, in the revised manuscript we have designated the significance level throughout all figures using letters (p, or q values).

      For RNA-seq, more information should be provided in the paper, for example, the correlation between biological replicates, the total number of differentially expressed genes, and qRT-PCR verification of randomly selected genes.

      We would like to thank the reviewer for pointing out this oversight. In the revised manuscript we provided more information regarding the RNA-seq dataset, including a Principal Component Analysis (PCA) showing correlation between sample replicates (Revised Figure 4-figure supplement 1A), as well as a table indicating the number of upregulated and downregulated genes between relevant datasets (Revised Figure 4-figure supplement 1B).
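      A table of upregulated and downregulated gene counts is typically derived from a differential expression results table by applying an adjusted P value cutoff and a fold-change cutoff. The response does not state the thresholds used, so the values below (adjusted P < 0.05, |log2 fold change| >= 1) and the GeneResult record are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float
    padj: Optional[float]  # adjusted P value; None if the gene was filtered out


def count_degs(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Count up- and downregulated genes passing both cutoffs (hypothetical thresholds)."""
    up = down = 0
    for r in results:
        if r.padj is None or r.padj >= padj_cutoff:
            continue  # not significant after multiple-testing correction
        if r.log2_fold_change >= lfc_cutoff:
            up += 1
        elif r.log2_fold_change <= -lfc_cutoff:
            down += 1
    return {"up": up, "down": down}
```

      Because both counts change with the chosen cutoffs, reporting the thresholds alongside the table makes the numbers reproducible.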

      The results of the RNA-seq analysis indicated that ACK1 and BRK contribute to the macrophage anti-inflammatory gene expression program driven by apoptotic cells. The MERTK-dependent anti-inflammatory program that apoptotic cells elicit in macrophages is best evidenced by the reduction in LPS-induced production of inflammatory mediators such as TNF and IL1b (25-27, 34, 44). Therefore, to validate the RNA-seq results functionally, we tested whether apoptotic cells decrease LPS-induced production of TNF and IL1b in isogenic WT, ACK1-deficient, and BRK-deficient macrophages. Consistent with the RNA-seq data, these functional assays indicated that ACK1 and BRK kinase activities are required for apoptotic cells to decrease LPS-induced TNF and IL1b production (Revised Figure 4H,I).
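      One simple way to express this functional readout is the percentage of the LPS-induced cytokine signal that is removed when apoptotic cells are added, computed per genotype. The metric and the function name below are hypothetical and are not taken from the response; a genotype that needs ACK1 or BRK for the anti-inflammatory effect would show a smaller suppression than WT.

```python
def percent_suppression(lps_only, lps_plus_ac, unstimulated=0.0):
    """Percent of the LPS-induced cytokine signal removed by apoptotic-cell (AC) co-treatment.

    The LPS-induced signal is taken as (lps_only - unstimulated); suppression is the
    share of that signal lost when apoptotic cells are added. Hypothetical metric.
    """
    induced = lps_only - unstimulated
    if induced <= 0:
        raise ValueError("LPS must raise the cytokine above the unstimulated level")
    return 100.0 * (lps_only - lps_plus_ac) / induced
```

      Subtracting the unstimulated baseline keeps background cytokine levels from inflating or deflating the apparent suppression when genotypes differ at baseline.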

      The raw data files for the RNA-seq analysis have been deposited in the NCBI Gene Expression Omnibus under accession number GEO: GSE118730.

      The authors did not have the formats for some of the citations correct. This should be fixed. 

      References were reformatted.

      (1) Eilertson, K. E., Booth, J. G. & Bustamante, C. D. SnIPRE: selection inference using a Poisson random effects model. PLoS Comput Biol 8, e1002806 (2012). https://doi.org/10.1371/journal.pcbi.1002806

      (2) Fadista, J., Oskolkov, N., Hansson, O. & Groop, L. LoFtool: a gene intolerance score based on loss-of-function variants in 60 706 individuals. Bioinformatics 33, 471-474 (2017). https://doi.org/10.1093/bioinformatics/btv602

      (3) Rackham, O. J., Shihab, H. A., Johnson, M. R. & Petretto, E. EvoTol: a protein-sequence based evolutionary intolerance framework for disease-gene prioritization. Nucleic Acids Res 43, e33 (2015). https://doi.org/10.1093/nar/gku1322

      (4) Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet 9, e1003709 (2013). https://doi.org/10.1371/journal.pgen.1003709

      (5) Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443 (2020). https://doi.org/10.1038/s41586-020-2308-7

      (6) Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291 (2016). https://doi.org/10.1038/nature19057

      (7) Rapaport, F. et al. Negative selection on human genes underlying inborn errors depends on disease outcome and both the mode and mechanism of inheritance. Proc Natl Acad Sci U S A 118 (2021). https://doi.org/10.1073/pnas.2001248118

      (8) Mahajan, N. P., Whang, Y. E., Mohler, J. L. & Earp, H. S. Activated tyrosine kinase Ack1 promotes prostate tumorigenesis: role of Ack1 in polyubiquitination of tumor suppressor Wwox. Cancer Res 65, 10514-10523 (2005). https://doi.org/10.1158/0008-5472.CAN-05-1127

      (9) Wu, Y., Singh, S., Georgescu, M. M. & Birge, R. B. A role for Mer tyrosine kinase in alphavbeta5 integrin-mediated phagocytosis of apoptotic cells. J Cell Sci 118, 539-553 (2005). https://doi.org/10.1242/jcs.01632

      (10) Scott, R. S. et al. Phagocytosis and clearance of apoptotic cells is mediated by MER. Nature 411, 207-211 (2001). https://doi.org/10.1038/35075603

      (11) Henson, P. M. & Bratton, D. L. Antiinflammatory effects of apoptotic cells. J Clin Invest 123, 2773-2774 (2013). https://doi.org/10.1172/JCI69344

      (12) Henson, P. M. Cell Removal: Efferocytosis. Annu Rev Cell Dev Biol 33, 127-144 (2017). https://doi.org/10.1146/annurev-cellbio-111315-125315

      (13) deCathelineau, A. M. & Henson, P. M. The final step in programmed cell death: phagocytes carry apoptotic cells to the grave. Essays Biochem 39, 105-117 (2003). https://doi.org/10.1042/bse0390105

      (14) Nagata, S. Apoptosis and Clearance of Apoptotic Cells. Annu Rev Immunol 36, 489-517 (2018). https://doi.org/10.1146/annurev-immunol-042617-053010

      (15) Cohen, P. L. et al. Delayed apoptotic cell clearance and lupus-like autoimmunity in mice lacking the c-mer membrane tyrosine kinase. J Exp Med 196, 135-140 (2002). https://doi.org/10.1084/jem.20012094

      (16) Hanayama, R. et al. Autoimmune disease and impaired uptake of apoptotic cells in MFG-E8-deficient mice. Science 304, 1147-1150 (2004). https://doi.org/10.1126/science.1094359

      (17) Miyanishi, M., Segawa, K. & Nagata, S. Synergistic effect of Tim4 and MFG-E8 null mutations on the development of autoimmunity. Int Immunol 24, 551-559 (2012). https://doi.org/10.1093/intimm/dxs064

      (18) Colonna, L., Parry, G. C., Panicker, S. & Elkon, K. B. Uncoupling complement C1s activation from C1q binding in apoptotic cell phagocytosis and immunosuppressive capacity. Clin Immunol 163, 84-90 (2016). https://doi.org/10.1016/j.clim.2015.12.017

      (19) Nagata, S., Hanayama, R. & Kawane, K. Autoimmunity and the clearance of dead cells. Cell 140, 619-630 (2010). https://doi.org/10.1016/j.cell.2010.02.014

      (20) Kimani, S. G. et al. Contribution of Defective PS Recognition and Efferocytosis to Chronic Inflammation and Autoimmunity. Front Immunol 5, 566 (2014). https://doi.org/10.3389/fimmu.2014.00566

      (21) Hanayama, R., Tanaka, M., Miwa, K., Shinohara, A., Iwamatsu, A. & Nagata, S. Identification of a factor that links apoptotic cells to phagocytes. Nature 417, 182-187 (2002). https://doi.org/10.1038/417182a

      (22) Kawano, M. & Nagata, S. Lupus-like autoimmune disease caused by a lack of Xkr8, a caspase-dependent phospholipid scramblase. Proc Natl Acad Sci U S A 115, 2132-2137 (2018). https://doi.org/10.1073/pnas.1720732115

      (23) Watanabe-Fukunaga, R., Brannan, C. I., Copeland, N. G., Jenkins, N. A. & Nagata, S. Lymphoproliferation disorder in mice explained by defects in Fas antigen that mediates apoptosis. Nature 356, 314-317 (1992). https://doi.org/10.1038/356314a0

      (24) Singer, G. G., Carrera, A. C., Marshak-Rothstein, A., Martinez, C. & Abbas, A. K. Apoptosis, Fas and systemic autoimmunity: the MRL-lpr/lpr model. Curr Opin Immunol 6, 913-920 (1994).

      (25) Cvetanovic, M. & Ucker, D. S. Innate immune discrimination of apoptotic cells: repression of proinflammatory macrophage transcription is coupled directly to specific recognition. J Immunol 172, 880-889 (2004). https://doi.org/10.4049/jimmunol.172.2.880

      (26) Fadok, V. A., Bratton, D. L., Konowal, A., Freed, P. W., Westcott, J. Y. & Henson, P. M. Macrophages that have ingested apoptotic cells in vitro inhibit proinflammatory cytokine production through autocrine/paracrine mechanisms involving TGF-beta, PGE2, and PAF. J Clin Invest 101, 890-898 (1998). https://doi.org/10.1172/JCI1112

      (27) Voll, R. E., Herrmann, M., Roth, E. A., Stach, C., Kalden, J. R. & Girkontaite, I. Immunosuppressive effects of apoptotic cells. Nature 390, 350-351 (1997). https://doi.org/10.1038/37022

      (28) Herrmann, M., Voll, R. E., Zoller, O. M., Hagenhofer, M., Ponner, B. B. & Kalden, J. R. Impaired phagocytosis of apoptotic cell material by monocyte-derived macrophages from patients with systemic lupus erythematosus. Arthritis Rheum 41, 1241-1250 (1998). https://doi.org/10.1002/1529-0131(199807)41:7<1241::AID-ART15>3.0.CO;2-H

      (29) Baumann, I. et al. Impaired uptake of apoptotic cells into tingible body macrophages in germinal centers of patients with systemic lupus erythematosus. Arthritis Rheum 46, 191-201 (2002). https://doi.org/10.1002/1529-0131(200201)46:1<191::AID-ART10027>3.0.CO;2-K

      (30) Schrijvers, D. M., De Meyer, G. R. Y., Kockx, M. M., Herman, A. G. & Martinet, W. Phagocytosis of apoptotic cells by macrophages is impaired in atherosclerosis. Arterioscler Thromb Vasc Biol 25, 1256-1261 (2005). https://doi.org/10.1161/01.ATV.0000166517.18801.a7

      (31) Morioka, S., Maueroder, C. & Ravichandran, K. S. Living on the Edge: Efferocytosis at the Interface of Homeostasis and Pathology. Immunity 50, 1149-1162 (2019). https://doi.org/10.1016/j.immuni.2019.04.018

      (32) Seitz, H. M., Camenisch, T. D., Lemke, G., Earp, H. S. & Matsushima, G. K. Macrophages and dendritic cells use different Axl/Mertk/Tyro3 receptors in clearance of apoptotic cells. J Immunol 178, 5635-5642 (2007). https://doi.org/10.4049/jimmunol.178.9.5635

      (33) Mao, Y. & Finnemann, S. C. Regulation of phagocytosis by Rho GTPases. Small GTPases 6, 89-99 (2015). https://doi.org/10.4161/21541248.2014.989785

      (34) Sen, P. et al. Apoptotic cells induce Mer tyrosine kinase-dependent blockade of NF-kappaB activation in dendritic cells. Blood 109, 653-660 (2007). https://doi.org/10.1182/blood-2006-04-017368

      (35) Vergadi, E., Ieronymaki, E., Lyroni, K., Vaporidi, K. & Tsatsanis, C. Akt Signaling Pathway in Macrophage Activation and M1/M2 Polarization. J Immunol 198, 1006-1014 (2017). https://doi.org/10.4049/jimmunol.1601515

      (36) Byles, V. et al. The TSC-mTOR pathway regulates macrophage polarization. Nat Commun 4, 2834 (2013). https://doi.org/10.1038/ncomms3834

      (37) Liao, X. et al. Kruppel-like factor 4 regulates macrophage polarization. J Clin Invest 121, 2736-2749 (2011). https://doi.org/10.1172/JCI45444

      (38) Roberts, A. W., Lee, B. L., Deguine, J., John, S., Shlomchik, M. J. & Barton, G. M. Tissue-Resident Macrophages Are Locally Programmed for Silent Clearance of Apoptotic Cells. Immunity 47, 913-927.e6 (2017). https://doi.org/10.1016/j.immuni.2017.10.006

      (39) Matsukawa, A. et al. Stat3 in resident macrophages as a repressor protein of inflammatory response. J Immunol 175, 3354-3359 (2005).

      (40) Sica, A. & Mantovani, A. Macrophage plasticity and polarization: in vivo veritas. J Clin Invest 122, 787-795 (2012). https://doi.org/10.1172/JCI59643

      (41) Yi, Z., Li, L., Matsushima, G. K., Earp, H. S., Wang, B. & Tisch, R. A novel role for c-Src and STAT3 in apoptotic cell-mediated MerTK-dependent immunoregulation of dendritic cells. Blood 114, 3191-3198 (2009). https://doi.org/10.1182/blood-2009-03-207522

      (42) Rothlin, C. V., Carrera-Silva, E. A., Bosurgi, L. & Ghosh, S. TAM receptor signaling in immune homeostasis. Annu Rev Immunol 33, 355-391 (2015). https://doi.org/10.1146/annurev-immunol-032414-112103

      (43) Seo, H. et al. Dual-specificity phosphatase 5 acts as an anti-inflammatory regulator by inhibiting the ERK and NF-kappaB signaling pathways. Sci Rep 7, 17348 (2017). https://doi.org/10.1038/s41598-017-17591-9

      (44) Camenisch, T. D., Koller, B. H., Earp, H. S. & Matsushima, G. K. A novel receptor tyrosine kinase, Mer, inhibits TNF-alpha production and lipopolysaccharide-induced endotoxic shock. J Immunol 162, 3498-3503 (1999).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Response to reviewer’s comments

      Reviewer #2 (Public Review):

      Summary: 

      The manuscript focuses on comparison of two PLP-dependent enzyme classes that perform amino acid decarboxylations. The goal of the work is to understand the substrate specificity and factors that influence catalytic rate in an enzyme linked to theanine production in tea plants.

      Strengths: 

      The work includes x-ray crystal structures of modest resolution of the enzymes of interest. These structures provide the basis for design of mutagenesis experiments to test hypotheses about substrate specificity and the factors that control catalytic rate. These ideas are tested via mutagenesis and activity assays, in some cases both in vitro and in plants. 

      Weaknesses:

      Although improved in a revision, the manuscript could be more clear in explaining the contents of the x-ray structures and how the complexes studied relate to the reactant and product complexes. The manuscript could also be more concise, with a discussion section that is largely redundant with the results and lacking in providing scholarly context from the literature to help the reader understand how the current findings fit in with work to characterize other PLP-dependent enzymes or protein engineering efforts. Some of the figures lack sufficient clarity and description. Some of the claims about the health benefits of tea are not well supported by literature citations.

      Thank you for your insightful comments on our manuscript and your recognition of the strengths of our study. We understand your concerns about the weaknesses mentioned, and we have addressed them in the revised manuscript. We acknowledge that the discussion section needed to be more concise and better placed in context, and we have revised it by removing redundant content. We also acknowledge your comments concerning the clarity and description of some figures; we have revised these figures to ensure that they are clear and adequately described. Lastly, we understand your concern about the lack of citations supporting the claims about the health benefits of tea. We have ensured that such claims are supported by appropriate literature or, where necessary, removed them.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 21: Alanine Decarboxylase should not be capitalized.

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (2) Line 31: Grammatical error. Also not clear what "evolution analysis" means here. Revise to "Structural comparisons led us to..."

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (3) Line 34: Revise to "Combining a double mutant of CsAlaDC"

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (4) Line 35: Change word order to "increased theanine production 672%"

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (5) Line 37: meaning unclear. Revise to "provides a route to more efficient biosynthesis of theanine."

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (6) Line 44: I'm not sure that the "health effects" of tea have been proven in placebo controlled studies. And the references provided (2-4 and 5) do not describe original research articles supporting these claims. I would suggest removing these statements from the introduction and at later points in the manuscript.

      Thank you for your thoughtful feedback and suggestions. Based on your suggestion, we have removed these statements: "The popularity of tea is determined by its favorable flavor and numerous health benefits (2-4). The flavor and health-beneficial effects of tea are conferred by the abundant secondary metabolites, including catechins, caffeine, theanine, volatiles, etc (5). " We have retained the subsequent statement, " It has also many health-promoting functions, including neuroprotective effects, enhancement of immune functions, and potential anti-obesity capabilities, among others. ", because the cited literature supports it.

      (7) Line 58: insert "the" between provided and basis

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (8) Line 100: Not clear what this phrase means, "As expected, CsSerDC was closer to AtSerDC" Please clarify - closer to what?

      We apologize for any confusion caused by the unclear phrasing. By "CsSerDC was closer to AtSerDC," we intended to convey that CsSerDC shares a higher degree of sequence homology with AtSerDC than with the other enzymes evaluated in our investigation. However, because the 1.29-percentage-point difference in amino acid similarity (86.21% vs. 84.92%; Figure 1B and Supplementary table 1 in the original manuscript) is too small to support this conclusion, we have deleted the relevant descriptions in the revised manuscript.

      (9) Line 112: "were constructed into" makes no sense. It would be better to say the genes for the proteins of interest were inserted into the overexpression plasmid.

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (10) Line 115: missing the word "the" between generated and recombinant

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (11) Line 121: catalyze not catalyzed

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (12) Lines 129 and 130: The reported Km values are really large - in the mM range. Do these values make sense in terms of the available concentrations of the substrates inside the cell?

      The content of alanine in tea plant roots ranges from 0.28 to 4.18 mg/g DW (Yu et al., 2021; Cheng et al., 2017), corresponding to a physiological alanine concentration of 3.14 mM to 46.92 mM in tea plant roots. The content of serine in plants ranges from 0.014 to 17.6 mg/g DW (Kumar et al., 2017), corresponding to a physiological serine concentration of 0.13 mM to 167.48 mM. Therefore, the Km values in this study are within the range of substrate concentrations available inside the cell.

      Yu, Y. et al. (2021) Glutamine synthetases play a vital role in high accumulation of theanine in tender shoots of albino tea germplasm "Huabai 1". J. Agric. Food Chem. 69(46), 13904-13915.

      Cheng, S. et al. (2017) Studies on the biochemical formation pathway of the amino acid L-theanine in tea (Camellia sinensis) and other plants. J. Agric. Food Chem. 65(33), 7210-7216.

      Kumar, V. et al. (2017) Differential distribution of amino acids in plants. Amino Acids. 49(5), 821-869.
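      The response does not state how the mg/g DW contents were converted to millimolar concentrations. The reported ranges are reproduced to two decimals if the molar masses of alanine (89.09 g/mol) and serine (105.09 g/mol) are used and 1 g of dry weight is assumed to correspond to 1 mL of tissue water. The sketch below makes that inferred assumption an explicit parameter; because real tissue water per gram of dry weight varies, the results are best read as order-of-magnitude estimates.

```python
# Molar masses (g/mol) of the free amino acids.
MOLAR_MASS = {"alanine": 89.09, "serine": 105.09}


def content_to_mM(mg_per_g_dw, molar_mass, ml_water_per_g_dw=1.0):
    """Convert a tissue content (mg per g dry weight) to a concentration in mM.

    ml_water_per_g_dw is an ASSUMPTION: the volume of tissue water associated
    with 1 g of dry weight. The default 1.0 reproduces the numbers in the response.
    """
    mmol_per_g = mg_per_g_dw / molar_mass        # mg / (mg per mmol) = mmol
    mmol_per_ml = mmol_per_g / ml_water_per_g_dw
    return mmol_per_ml * 1000.0                  # 1 mmol/mL = 1 M = 1000 mM
```

      For example, content_to_mM(0.28, MOLAR_MASS["alanine"]) gives about 3.14 mM; a higher water content per gram of dry weight would lower every estimate proportionally.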

      (13) Line 211: it is unclear what the phrase "as opposed to wild-type" means. Please clarify.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We intended to communicate that wild-type CsAlaDC and AtSerDC show decarboxylase activity, whereas the mutated proteins have lost decarboxylation activity. We have clarified this in the revised manuscript.

      (14) Line 222: residues not residue

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (15) Line 227 and Figure 4B: It is not clear what the different sequence logos mean in this part of the figure. The caption is too brief and not helpful. And the sentences describing this figure panel are also not sufficiently clear.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have provided a more detailed explanation of this section in the revised manuscript and added additional annotations in the figure caption to provide further clarity.

      (16) Lines 233 and 234: "in the substrate specificity" is awkwardly worded. I would revise to "in selective binding of the appropriate substrate."

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have meticulously revised the description of this section.

      (17) Line 243: a word is missing in this sentence - but I can't figure out the intended meaning or what the missing word is. Rephrase to improve clarity.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised this sentence to: " These findings indicate the essential role of Phe106 in the selective binding of alanine for CsAlaDC. "

      (18) Line 255: The "expression system...was carried out" is not correct. I would say the expression system was used - but you probably also want to rearrange the sentences to more directly say what it was used for. Later, the word "the" is also missing.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised this sentence to: "To further verify that Phe106 of CsAlaDC and Tyr111 of AtSerDC were key amino acid residues determining its substrate recognition in planta, we employed the Nicotiana benthamiana transient expression system. "

      (19) Line 273: use "understand" instead of "elucidate" and instead of "we proposed a prediction test:" say "we designed a test of the prediction that..."

      Thank you very much for your careful reading of the manuscript. We have revised this sentence to: “In light of this observation, we postulated a hypothesis:”

      (20) Line 301: I don't think "effectuate" is a word. Replace with something else.

      Thank you very much for your careful reading of the manuscript. We have revised the sentence as: " The biosynthetic pathway of theanine in tea plants comprises two consecutive enzymatic steps: alanine decarboxylase facilitates the decarboxylation of alanine to generate EA, while theanine synthetase catalyzes the condensation reaction between EA and Glu to synthesize theanine. "

      (21) Line 307: replace "activity" with "ability"

      Thank you very much for your careful reading of the manuscript. We have corrected it in the revised manuscript.

      (22) Line 322: I didn't find the discussion very useful. Much of it is simply a recap of the results - which is not necessary. The structural comparisons are overly descriptive without providing appropriate rationale or topic sentence structure so that the reader understands why certain details are emphasized. I think the manuscript would be much stronger if this section were not included or integrated more concisely into the results section where appropriate.

      Thank you for your constructive comments. We understand your concerns about the discussion section of our manuscript and acknowledge that it was redundant with the results. In response, we have revised this section to eliminate unnecessary repetition of the results.

      (23) Line 369: "an amino acid devoid of the hydroxyl moiety present in Lys" - what does this mean? Lys does not have a hydroxyl functional group. Please correct so that the sentence makes sense.

      Thank you very much for your careful reading of the manuscript. This sentence was meant to state that the residue at the corresponding position in CsAlaDC is Phe, which lacks the hydroxyl group carried by Tyr at the equivalent position in AtSerDC (Tyr111). We have modified the sentence as follows: "In contrast, the equivalent position in CsAlaDC is occupied by Phe, an amino acid lacking the hydroxyl group. This substitution enhances the hydrophobic nature of the substrate-binding pocket. "

      (24) Line 370: "This structural nuance portends a predisposition for CsAlaDC to select the comparatively hydrophobic amino acid alanine as its suitable substrate." This sentence also makes no sense - please revise to use simpler language so the meaning is more clear.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised the sentence as follows: " Consequently, CsAlaDC demonstrates a unique predilection, selectively binding Ala (an amino acid with comparatively hydrophobic properties) as its preferred substrate."

      (25) Lines 376-384: This section makes several references to "catalytic rings." I have no idea what this term means? If the authors mean a loop structure in the enzyme - please use the term "loop"

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have corrected it in the revised manuscript.

      (26) Line 396-397: The authors reference data that is not shown in the manuscript. Either show the data in the results section or do not mention.

      Thank you for your insightful comment regarding the unshown data referenced in the manuscript. We have included Supplementary figure 9 in the revised manuscript to display this data.

      (27) Line 445-446: what is "mutation technology" - if the authors mean site-directed mutagenesis - please use the simpler and more recognizable terminology.

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have revised the sentence as follows: "Based on the findings of this study, site-directed mutagenesis can be employed to modify enzymes involved in theanine synthesis. This modification enhances the capacity of bacteria, yeast, model plants, and other organisms to synthesize theanine, thereby facilitating its application in industrial theanine production."

      Reviewer #3 (Public Review):

      In the manuscript titled "Structure and Evolution of Alanine/Serine Decarboxylases and the Engineering of Theanine Production," Wang et al. solved and compared the crystal structures of Alanine Decarboxylase (AlaDC) from Camellia sinensis and Serine Decarboxylase (SerDC) from Arabidopsis thaliana. Based on this structural information, the authors conducted both in vitro and in vivo functional studies to compare enzyme activities using site-directed mutagenesis and subsequent evolutionary analyses. This research has the potential to enhance our understanding of amino acid decarboxylase evolution and the biosynthetic pathway of the plant specialized metabolite theanine, as well as to further its potential applications in the tea industry.

      Thank you very much for taking the time to review this manuscript. We appreciate all your insightful comments.

      Reviewer #3 (Recommendations For The Authors):

      The additional material added by the authors addresses some of the previously raised questions and enhances the manuscript's quality. However, certain critical issues we pointed out earlier remain unaddressed. Some of the new data also raises new questions. To provide readers with more comprehensive data, the authors should include additional quantitative data and convert the data presented in the reviewer's comments into supplemental figure format.

      Thank you for acknowledging the improvements in the revised manuscript and providing further valuable feedback. We understand your concern about the critical issues that have not been fully addressed and the new questions raised by some of the newly added data. We have strived to address these issues with additional analysis and clarification in our subsequent revision. Regarding your suggestion for more quantitative data and converting the data mentioned in the reviewer's comments into a supplemental figure format, we agree that this would provide a more comprehensive view of the results. We have reformatted the relevant data into supplemental figures to enhance the clarity and accessibility of information. We are grateful for the time and effort you have dedicated to improving our manuscript.

      * Page 5 & Figure 1B

      "As expected, CsSerDC was most closed to AtSerDC, which implies that they shared similar functions. However, CsAlaDC is relatively distant from CsSerDC."

      : In Figure 1B, CsSerDC and AtSerDC are in different clades, and this figure does not show that the two enzymes are closest. To provide another quantitative comparison, please provide a matrix table showing amino acid sequence similarities as a supplemental table. 

      Comment: I don't believe that a 1.29% difference between 86.21% and 84.92% in amino acid similarity is statistically significant. Although the authors have rephrased the original sentence, it's improbable that this small 1.29% difference can explain the observed distinction.

      Many thanks. We have carefully considered your comments. Indeed, the 1.29% difference in amino acid similarity cannot reflect the functional difference between the AlaDC and SerDC proteins. We have deleted the relevant descriptions in the revised manuscript.
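      A pairwise matrix like the one the reviewer requested can be computed from a multiple sequence alignment. The sketch below computes percent identity over gap-free aligned columns; this is a simpler stand-in for the percent similarity reported in the manuscript, which additionally counts conservative substitutions. The function names are hypothetical, and identity values also depend on the alignment and on how gaps are counted, which is one reason a difference of about one percentage point is hard to interpret.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences over columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def identity_matrix(aligned):
    """Symmetric matrix {name: {name: % identity}} from {name: aligned sequence}."""
    names = list(aligned)
    matrix = {n: {n: 100.0} for n in names}
    for p, q in combinations(names, 2):
        value = percent_identity(aligned[p], aligned[q])
        matrix[p][q] = matrix[q][p] = value
    return matrix
```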

      * Page 6, Figure 2, Page 23 (Methods)

      "The supernatants were purified with a Ni-Agarose resin column followed by size-exclusion chromatography."

      : What kind of SEC column did the authors use? Can the authors provide the SEC elution profile comparison results and size standard curve?

      Comment: The authors should include the SEC elution profiles as a supplemental figure or incorporate them as a panel in Figure 2. Furthermore, they should provide a description of the oligomeric state of each protein in this experiment. Additionally, there is a significant difference between CsSerDC (65.38 mL) and CsAlaDC (74.37 mL) elution volumes. Can this difference be explained structurally? In comparison to the standard curve of molecular weight provided by the authors, it appears that these proteins are at least homo-tetramers, which contradicts the description in the text. This should be re-evaluated and clarified.  

      Thank you very much for your careful reading of the manuscript and valuable suggestions. We have included the SEC elution profiles in Supplemental figure 1A and added descriptions of the oligomeric states of the proteins in the revised manuscript. CsSerDC eluted at 65.38 mL, corresponding to a molecular weight of 292 kDa, about five times that of the monomer (54.7 kDa). However, because no crystal structure of CsSerDC is available, it remains uncertain whether this protein forms a pentamer. AtSerDC eluted at 72.25 mL, corresponding to 155 kDa, 3.3 times the monomer (47.3 kDa). CsAlaDC eluted at 74.37 mL, corresponding to 127 kDa, 2.7 times the monomer (47.3 kDa). These elution profiles suggest that AtSerDC and CsAlaDC may exist as homotrimers, which contradicts our subsequent findings that these proteins adopt a dimeric structure. A plausible explanation is that the proteins are not ideally spherical: a non-globular protein has a larger hydrodynamic radius than a globular protein of the same mass, so size-exclusion chromatography can overestimate its molecular weight (Burgess, 2018; Erdner et al., 2006).

      References:

      Burgess, R. R. (2018) A brief practical review of size exclusion chromatography: Rules of thumb, limitations, and troubleshooting. Protein Expression and Purification. 150, 81-85.

      Erdner, J. M. et al. (2006) Size-Exclusion Chromatography Using Deuterated Mobile Phases. Journal of Chromatography A. 1129(1), 41-46.
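      The apparent molecular weights above are read from a column calibration. A common form is a least-squares fit of log10(MW) against elution volume for globular standards, valid only within the column's fractionation range. The sketch below assumes that form; the function names are hypothetical and the standards in the test are synthetic, since the actual calibration data are not reproduced here. The oligomer ratio is the apparent mass divided by the monomer mass, which is how the 5-, 3.3- and 2.7-fold values were obtained.

```python
import math


def fit_sec_calibration(standards):
    """Least-squares fit of log10(MW) = a + b * Ve.

    standards: iterable of (elution_volume_mL, molecular_weight_kDa) pairs for
    globular calibration proteins (hypothetical here). Returns (a, b).
    """
    pairs = list(standards)
    if len(pairs) < 2:
        raise ValueError("need at least two standards")
    xs = [ve for ve, _ in pairs]
    ys = [math.log10(mw) for _, mw in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def apparent_mw(elution_volume_mL, intercept, slope):
    """Apparent molecular weight (kDa) read off the calibration line."""
    return 10 ** (intercept + slope * elution_volume_mL)


def oligomer_ratio(apparent_kDa, monomer_kDa):
    """Number of monomers matching the apparent mass, assuming a globular shape."""
    return apparent_kDa / monomer_kDa
```

      Because the calibration line is built from globular standards, an elongated protein elutes earlier than its mass alone would predict, which biases apparent_mw upward; this is the effect invoked in the response to reconcile the SEC estimates with the dimeric structures.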

      * Page 6 & Page 24 (Methods)

      "The 100 μL reaction mixture, containing 20 mM substrate (Ala or Ser), 100 mM potassium phosphate, 0.1 mM PLP, and 0.025 mM purified enzyme, was prepared and incubated at standard conditions (45 °C and pH 8.0 for CsAlaDC, 40 °C and pH 8.0 for AtSerDC for 30 min)."

      (1) The enzymatic activities of CsAlaDC and AtSerDC were measured at two different temperatures (45 and 40 °C), but their activities were directly compared. Is there a reason for experimenting at different temperatures?

      (2) Enzyme activities were measured at temperatures above 40 °C, which is not a physiologically relevant temperature and may affect the stability or activity of the proteins. At the very least, the authors should provide temperature-dependent protein stability data (e.g., CD spectra analysis) or, if possible, temperature-dependent enzyme activities, to show that their experimental conditions are suitable for studying the activities of these enzymes.

      Comment: I appreciate the authors for including temperature-dependent enzyme activity data in their study. However, it remains puzzling that plant enzymes were tested at a physiologically irrelevant temperature of 40 and 45 degrees Celsius. Additionally, it may not be appropriate to directly compare enzyme activity measurements at different temperatures. Furthermore, the data at 45 degrees in panel A appears to be an outlier, which contrasts with the overall trend observed in the graph.

      We appreciate your point regarding the testing temperatures for the plant enzymes, and we fully recognize the importance of conducting experiments under physiologically relevant conditions. However, our intent in using these elevated temperatures was to assess the thermal stability of the enzymes, a property that can be valuable in certain applications, such as industrial production processes; these temperatures do not necessarily reflect physiological conditions. Our findings indicate that CsAlaDC exhibits peak activity at 45 °C. This result aligns with previously reported data [Bai, P. et al. (2021) figure 4e], which bolsters our confidence in the reliability of our experimental results.

      Author response image 1.

      Relative activity of CsAlaDC at different temperatures.

      * Pages 6-7 & Table 1

      (1) Use the correct notation for Km and Vmax. Also, the kinetic parameters are reported in multiple units (e.g., mmol/L and mM for Km).

      (2) When comparing the catalytic efficiency of enzymes, kcat/Km (or Vmax/Km) is generally used. The authors present a comparison of catalytic activity from results to conclusion. A clarification of what results are being compared is needed.

      Comment: The authors are still comparing catalytic efficiency solely based on the Vmax values. As previously suggested, it would be advisable to calculate kcat/Km and employ it for comparing catalytic efficiencies. Furthermore, based on the data provided by the authors, I conducted a rough calculation of these catalytic efficiencies and did not observe a significant difference, which contrasts with the authors' statement, "These findings indicated that the catalytic efficiency of CsAlaDC is considerably lower than that of both CsSerDC and AtSerDC." This discrepancy requires clarification.  

      We sincerely appreciate your meticulous review and constructive suggestions. We understand the importance of comparing catalytic efficiencies using kcat/Km values rather than relying solely on Vmax values. Following your suggestion, we recalculated kcat/Km and reanalyzed our results. The computed kcat/Km values for CsSerDC and AtSerDC are 152.7 s⁻¹ M⁻¹ and 184.6 s⁻¹ M⁻¹, respectively, and that for CsAlaDC is 55.7 s⁻¹ M⁻¹. Therefore, the catalytic efficiencies of CsSerDC and AtSerDC are approximately 2.7- and 3.3-fold, respectively, that of CsAlaDC (roughly three times). What we intended to convey was that the Vmax of CsAlaDC is lower than those of CsSerDC and AtSerDC. Our description in the manuscript was inaccurate, and we have corrected it in the revised version.
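      The fold differences follow from dividing the reported kcat/Km values. A minimal sketch, assuming kcat in s⁻¹ and Km in mM; the hypothetical helper `catalytic_efficiency` converts Km to M so that mixed mM and mmol/L reporting cannot skew a comparison, and `fold_relative_to` (also hypothetical) expresses each efficiency as a multiple of a reference enzyme:

```python
def catalytic_efficiency(kcat_per_s: float, km_mM: float) -> float:
    """kcat/Km in s^-1 M^-1, converting Km from mM (= mmol/L) to M."""
    if km_mM <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / (km_mM / 1000.0)


# kcat/Km values (s^-1 M^-1) as reported in this response.
REPORTED = {"CsSerDC": 152.7, "AtSerDC": 184.6, "CsAlaDC": 55.7}


def fold_relative_to(reference: str, efficiencies: dict) -> dict:
    """Express each enzyme's kcat/Km as a multiple of the reference enzyme's."""
    ref = efficiencies[reference]
    return {name: value / ref for name, value in efficiencies.items()}


for name, fold in fold_relative_to("CsAlaDC", REPORTED).items():
    print(f"{name}: {fold:.2f}x CsAlaDC")
```

      Leaving Km in mM would understate every kcat/Km by a factor of 1000, but the fold ratios between enzymes are unaffected as long as all Km values share one unit.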

      * Pages 9 & 10

      "This result suggested this Tyr is required for the catalytic activity of CsAlaDC and AtSerDC."

      : The authors' results are interesting, but it is recommended to perform the experiments in a specific order. First, experiments should determine whether mutagenesis affects the protein's stability (e.g., CD, as discussed earlier), and second, whether mutagenesis affects ligand binding (e.g., ITC, SPR, etc.), before describing how site-directed mutagenesis alters enzyme activity. In particular, the authors' hypothesis would be much more convincing if they could show that the ligand binding affinity is similar between WT and mutants.

      Comments: While it is appreciated that you have included CD and UV-vis absorption spectra data, it would be more beneficial to provide quantitative binding-affinity data, as previously proposed. I also recommend presenting the data mentioned in the reviewer's comments as a supplementary figure for better clarity and reference.

      Thank you for your valuable feedback and suggestions. We agree that providing quantitative data would lend more support to our findings and better address the question of binding affinity.

      It is generally acknowledged that PLP-bound proteins are yellow: PLP forms a Schiff base with the ε-amino group of a lysine residue in the protein, which absorbs maximally at around 420 nm. During purification, we observed that the purified proteins remained yellow even when PLP was not added to the purification buffer. Subsequent absorbance measurements showed that the proteins absorbed at 420 nm (see the figures below), indicating that PLP was already bound. This could have resulted from PLP binding during protein expression in E. coli. Because the protein and ligand could not be separated, we were unable to obtain quantitative binding data experimentally.

      Author response image 2.

      (A) Absorption Spectra of CsAlaDC (WT) and CsAlaDC (Y336F). (B) Absorption Spectra of AtSerDC (WT) and AtSerDC (Y341F).

      Regarding your suggestion to present the data mentioned in the reviewer's comments as a supplementary figure, we agree that this is an excellent idea and have included these data in Supplementary Figures 7 and 8.

      * Page 10

      "The results showed that 5 mM L-DTT reduced the relative activity of CsAlaDC and AtSerDC to 22.0% and 35.2%, respectively"

      : The authors primarily use relative activity to compare WT and mutants. Can the authors specify the exact experiments, units, and experimental conditions? Is it Vmax or catalytic efficiency? If so, under what specific experimental conditions?

      Response: "However, due to the unknown mechanism of DTT inhibition on protein activity, we have removed this part of the content in the revised manuscript."

      Comment: I believe this requires a more comprehensive explanation rather than simply removing it from the text.  

      Although we observed that DTT inhibits enzyme activity, at present we are unable to offer a comprehensive structural or catalytic explanation for this inhibitory effect, and further research is required to elucidate the mechanism of action of DTT. It is worth noting, however, that investigating the specific inhibitory mechanism of DTT is not a focus of our study. Because our existing findings do not adequately explain the observed phenomenon, we chose to remove this part from the manuscript.

      * Pages 10-12

      : The identification of 'Phe106 in CsAlaDC' and 'Tyr111 in AtSerDC,' along with the subsequent mutagenesis and enzymatic activity assays, is intriguing. However, the current manuscript lacks an explanation and discussion of the underlying reasons for these results. As previously mentioned, it would be helpful to gain insights and analysis from WT-ligand and mutant-ligand binding studies (e.g., ITC, SPR, etc.). Furthermore, the authors' analysis would be more convincing with accompanying structural analysis, such as steric hindrance analysis.

      Comment: While it is appreciated that you have included UV-vis absorption spectra data, it would be more beneficial to provide quantitative binding-affinity data, as previously proposed. I also recommend presenting the data mentioned in the reviewer's comments as a supplementary figure for better clarity and reference.

      Response: Thank you for your valuable feedback and suggestions. Because the protein forms a complex with PLP during its expression in E. coli and cannot be dissociated from it, obtaining quantitative binding data experimentally is impracticable.

      Author response image 3.

      (A) Absorption Spectra of CsAlaDC (WT) and CsAlaDC (F106Y). (B) Absorption Spectra of AtSerDC (WT) and AtSerDC (Y111F).

      Both the mutant and wild-type proteins exhibited absorption bands at 420 nm, suggesting the formation of a Schiff base between PLP and the active-site lysine residue.

      Regarding your suggestion to present the data mentioned in the reviewer's comments as a supplementary figure, we have included these data in Supplementary Figures 7 and 8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:  

      Overall, the conclusions appear appropriately supported by the data, and the data appear of high quality.

      Strengths:

      The particular strengths of the paper include an impressive combination of genomic and imaging-based approaches and insightful genetically engineered cell systems. The manuscript reports interesting and potentially important findings. The text is generally very well written, the ideas are clearly explained, and the reasoning is easy to follow.

      Weaknesses:

      The main weakness seems to be that the heat and ethanol shock approaches likely elicit pleiotropic effects, and therefore it is a challenge to test the causal relationship between various observations. Nevertheless, even as indirect effects might contribute to some of the authors' observations, the results are definitely worth reporting.

      We agree that these two proteotoxic stresses can impact cell physiology in multiple ways and discuss this on lines 132-143 and 500-519. Moreover, in this revision we have more rigorously quantified the extent of proteotoxic stress elicited by the 39°C heat shock and 8.5% ethanol stress (Figure 1E; see response 1 to Reviewer 2). We have additionally added new Figure 2 that reveals an important difference in the way Hsf1 and its negative regulator, the Hsp70 co-chaperone Sis1, respond to HS and ES. This difference is evident at two different intensities for each stress as described in more detail below (see response 1 to Reviewer 2).

      Presentation of some of the data could be improved.

      We agree and have made improvements/data additions to multiple figures: Figure 1E; Figures 3A, B; Figures 4A, B; Figure 7 (data drawn from original Fig. 6 and Fig. 6 – fig. suppl. 1 and reorganized); Fig. 8B; Figure 9; Figure 10. Corresponding enhancements to the supplemental figures have been made as well. 

      Reviewer #2:  

      (1) The central finding of the study highlights the different dynamics of Hsf1, Pol II, and gene organization in response to heat shock versus ethanol stress. However, one important limitation to consider is that the two chosen conditions may not be directly comparable. For a balanced assessment, the authors should ideally expose yeast to various ethanol concentrations and different heat shock temperatures, ensuring the observed differences stem from the nature of the stressor rather than suboptimal stress intensity. At the very least, an additional single ethanol concentration point on each side of 8.5% should be investigated to ensure that 8.5% is near the optimum. In fact, comparing the number of Hsp104 foci in the two conditions in Fig. 1E and F suggests that the yeast is likely experiencing different intensities of stress for the chosen heat shock condition and ethanol concentration used in this study.

      We thank the reviewer for this important suggestion. In this revision, we have included an enhanced analysis of the yeast cellular response to each of these stresses. As illustrated in revised Figure 1, the two stresses used throughout this study – 39°C heat shock and 8.5% ethanol stress – both elicit a proteotoxic response, as assayed by the de novo formation of Hsp104 clusters. While 10 min exposure to 8.5% ethanol results in the formation of multiple discrete (spherical) foci, a 10 min exposure to the elevated temperature leads to the appearance of multiple, largely diffuse Hsp104 clusters, some of which are spherical (new Fig. 1D). The difference in morphology notwithstanding, we have attempted to quantify these clusters using Imaris v. 10.0.1 image analysis software; the results are depicted in Fig. 1E. Such quantification suggests that 8.5% ethanol elicits a more intense stress than exposure to 39°C. A caveat is that it is unclear whether diffuse Hsp104 clusters are comparable to compact Hsp104 foci (see response 3 below).

      Beyond the apparent difference in intensity, a new analysis presented in new Figure 2 reveals that heat shock, elicited by temperature upshift to either 39°C or 42°C, induces relocalization of the J-protein Sis1 – a key negative regulator of Hsf1 – from the nucleoplasm to the nucleolar periphery. Sis1’s perinucleolar ring localization agrees with previous findings of 39°C heat-shocked cells (Feder et al., 2021). Ethanol stress, whether 5% or 8.5%, initially causes Sis1 to relocalize diffusely throughout the nucleus and cytosol. At 10 min, Sis1 localizes to the periphery of the nucleus, thereby providing a marked contrast to what is observed in response to heat shock. These new results are described on lines 174-191.

      Taking these two observations together, we asked whether a less severe ethanol stress (5%) would induce Hsf1 puncta. It does, and as rapidly as 8.5% ethanol (data are presented in revised Figure 8-figure supplement 1). Interestingly, in the presence of 5% ethanol, Hsf1 puncta begin to dissolve at 30 min. This strongly contrasts with the case when cells are exposed to 8.5% ethanol (Figure 8; Figure 8-figure supplement 1). As we state in this revision (lines 414-424), the sustained presence of condensates that we originally observed is likely the consequence of the intensity of the proteotoxic stress elicited by exposure to 8.5% ethanol; analogous responses to these two stress conditions have been observed before (lines 495-501). 

      (2) A second significant concern is the use of the term "Hsf1 condensate". Chowdhary et al.'s 2022 Molecular Cell study highlighted an inhomogeneous distribution and rapid dynamics of Hsf1 clustering upon heat shock, with sensitivity to 1,6-hexanediol, which was interpreted as evidence for condensation by LLPS. However, this interpretation has been criticized severely by McSwiggen et al. Genes Dev 2019 and Musacchio EMBO J 2022. It is important to mention that 1,6-hexanediol is known to affect chromatin organization (Itoh et al. Life Science Alliance 2021). Describing such clusters as 'condensates' without further experimental evidence is premature.

      While we appreciate and largely agree with the point made by this reviewer, we prefer to maintain the term “condensate”. Banani et al. (2017) originally defined “biomolecular condensate” to mean self-organized membrane-free compartments that concentrate specific biomolecules. It was never meant to imply LLPS, although its widespread use in the literature has led to that implication. We clarify our use of this term on lines 99-104.

      (3) Figure 1: Why does ethanol stress at 0 min display a larger number of Hsp104 foci per cell than heat shock at the same time? How are foci defined by the authors? In Fig. 1D, there are many smaller puncta. A comparative assessment of the number and size of foci for heat shock and ethanol stress would be beneficial.

      We thank the reviewer for raising this point and have addressed it as follows.  First, we repeated the assay with a different strain (DPY1561) and increased the number of cells assayed from 40 to 200. This larger sample size created the same T=0 baseline for both stresses (Figure 1E). Second, we define Hsp104 foci as diffraction-limited structures with a diameter of ~0.4 µm (lines 747-749).  Third, employing Imaris v. 10.0.1, we quantified foci size (= volume) and a summary graph has been added to Figure 1E that also displays the number of foci per cell. In the legend to this figure, we point out that to conduct this analysis we assumed that the diffuse Hsp104 clusters seen in HS cells are comparable to the compact Hsp104 foci in ES cells (lines 1169-1171). 

      (4) Figure 2: Selecting a housekeeping gene with consistent expression levels is crucial for meaningful qPCR analysis. Do SCR1 mRNA levels fluctuate during heat shock or ethanol stress?  

      We thank the reviewer for this question. In revised Figure 3 – figure supplement 1C we provide a new graph (reproduced here) revealing that the levels of SCR1 do not significantly change under either heat shock or ethanol stress relative to the non-stressed control (0 min). One-way ANOVA analysis was performed for both HS and ES and p values were 0.094 and 0.083, respectively (calculated using GraphPad Prism 8).

      (5) Additionally, certain genes, such as TMA10 and SSA4, lack visible bars at time 0. Are these levels undetectable? The varying y-axis scales are confusing; presenting data as relative fold changes could offer a clearer perspective.

      Transcript levels for all genes evaluated here are detectable, even in the basal unstressed state. They are not visible on the histogram for certain genes at T= 0 due to the prodigious fold-increase in RNA elicited by heat shock.  However, to address this concern, we have added a bar graph inset displaying basal transcript levels for each gene in revised Figure 3. We reproduce data for SSA4 and TMA10 in the graphs below. In addition, we present transcript levels in new Figure 3 - figure supplement 1 for cells subjected to ethanol stress to allow a better appreciation of their increase over time. 

      Author response image 1.

      (6) Line 239: The evidence for chromatin compaction is unconvincing. An increase in H3 occupancy by ChIP might indicate a reduction in histone exchange dynamics but may not relate to overall chromatin compaction. The authors use H2A-mCherry to suggest a decrease in chromatin volume, but this data is not persuasive. Did the authors observe any changes in nuclear size? Perhaps quantifying chromatin compaction more directly, using signal intensity per volume, would be informative.

      To address this concern, we attempted to quantify integrated density for H2A-mCherry using ImageJ software. While the volume decreased for both stresses, the integrated density increased only for ethanol stress. We speculate that this may be due to photobleaching, which has been reported for heat shock; the combination of heat and acidic pH contributes to loss of fluorescence signal (Alkaabi et al., 2005). While the integrated density supports the idea of global chromatin compaction in the ethanol stress condition, given the above concerns with the HS sample we elected not to present these data.

      (7) Line 340: The claim of a "strong spatiotemporal correlation" isn't evident from the data. Could correlation coefficients be provided? There is potential anti-correlation in Fig. 6 - Figure Supplement 1C.

      We thank the reviewer for this excellent suggestion. We now present an analysis of the correlation between HSP104 – HSP12 coalescence and HSP104 transcription for both HS and ES time courses, using single cell data of Figures 7D, 7E and Figure 7- suppl. 1D.  This analysis is presented in new Figure 7F.
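      The response does not state which correlation measure underlies new Figure 7F. As a minimal sketch, assuming a Pearson coefficient computed for one cell across time points, with coalescence and transcription each scored 0/1 (for binary scores Pearson's r equals the phi coefficient); the function name and the toy series are hypothetical, not the authors' data:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical single-cell time course: 1 = HSP104-HSP12 coalesced /
# HSP104 transcription detected at that time point, 0 = not detected.
coalesced = [0, 1, 1, 1, 0, 1]
transcribing = [0, 1, 1, 0, 0, 1]
print(f"r = {pearson_r(coalesced, transcribing):.3f}")  # r = 0.707
```

      A negative r would correspond to the potential anti-correlation the reviewer noted; with only a handful of time points per cell, any single coefficient is statistically noisy.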

      (8) Figure 8: The WT data in Fig 8 seem inconsistent with Fig. 4 (e.g. the interaction frequency for HSP104 and SSA2). Are these fluctuations between experiments, or are they side effects of IAA treatment? The use of ethanol as an IAA solvent vehicle raises concerns. It would be beneficial if the authors could demonstrate that 1.7% ethanol in the control does not induce ethanol stress.

      We acknowledge that there existed an inconsistency in the magnitude of intergenic interaction frequencies reported in the two experiments for HSP104 and SSA2. Some of this might be attributed to the fact that different strains were used, W303-1B in Figure 4 and LRY016 (W303-1B; LEU2::pGPD1osTIR1) in Figure 8. Nonetheless, in each experiment there was a prodigious fold-increase in interaction frequency over the no stress (T= 0 min) control for both HS and ES conditions and moreover, in each experiment the magnitude of this interaction was greater for the 2.5 min HS sample vs. the 10 min ES sample. However, to obviate this concern, we have removed the HSP104-SSA2 analysis from Figure 9 (corresponds to original Fig. 8).

      Regarding the second point, we cannot entirely rule out the concern that the 1.7% ethanol vehicle might impact 3C interaction frequencies. It is unlikely to be significant, however, given that most other pairwise tests evaluated in the two experiments (Figs. 5 and 9) resulted in similar 3C values. In particular, there was no consistent trend towards higher (or lower) interaction frequencies in the IAA experiment of Fig. 9.  

      Reviewer #3:  

      This is an interesting manuscript that builds off of this group's previous work focused on the interface between Hsf1, heat shock protein (HSP) mRNA production, and 3D genome topology. Here the group subjects the yeast Saccharomyces cerevisiae to either heat stress (HS) or ethanol stress (ES) and examines Hsf1 and Pol II chromatin binding, Histone occupancy, Hsf1 condensates, HSP gene coalescence (by 3C and live cell imaging), and HSP mRNA expression (by RT-qPCR and live cell imaging). The manuscript is well written, and the experiments seem well done, and generally rigorous, with orthogonal approaches performed to support conclusions…While identifying a mechanistic basis for the results [presented here] would be a tough task perhaps beyond the scope of this study, it would nevertheless be helpful to place these results in context with a series of other studies…importantly, this work left out PMID: 32015439 (HSF1 phase transition mediates stress adaptation and cell fate decisions) which is particularly relevant considering that it shows that it is human HSF1 condensate resolution rather than simple condensate formation that is associated with HSF1 transcriptional activity - which is similar to the findings here with this particular dose of HS resulting in resolution and high transcriptional activity versus ES resulting in resolution failure and lower activity. 

      We thank the Reviewer for pointing out this oversight. In this revision, we cite Gaglia et al., 2020 and several others reporting HSF1 foci formation in human cells exposed to heat shock. The single cell analysis of Gaglia et al argued that dissolution of large HSF1 foci (aka “nuclear stress bodies”), typically several µm in diameter and localized over satellite III DNA repeats (Jolly et al., 1997, 2002), correlates with HSP gene activation. Importantly, these condensates are postulated to act as reservoirs of HSF1, sequestered away from HSP genes (Gaglia et al., 2020).  In contrast, Zhang et al., 2022 has shown that human HSF1 inducibly forms small condensates (~300 nm) that localize over HSP genes and whose formation directly correlates with HSP gene activation (we discuss the Jolly, Gaglia and Zhang findings on lines 382-394). Likewise, our work shows that in yeast, Hsf1 inducibly forms small, dynamic clusters that colocalize with HSR genes within 2.5 min of exposure to elevated temperature; these dissolve ~20-60 min later (Figure 8 and Figure 8-supp. 1). In concert with Hsf1 condensate formation, HSR gene repositioning and transcription/ Pol II recruitment are likewise evident within 2.5 min. Therefore, in HS cells there exists coordinate induction of condensate formation, Pol II recruitment, transcription and intergenic interactions (for a detailed kinetic analysis of HSR gene interactions, see Figures 5 and 6 of Chowdhary et al, 2017).  This tight temporal relationship is absent in ethanol stressed cells (Figures 3, 4, 5, 6, 7, 8; summarized in Figure 10 and Table 1).

      It is also worth noting that the stresses themselves are quite different - ethanol can be used as a carbon source and so beyond inducing proteotoxic stress, the yeast are presumably adapting to this distinct metabolic state. Basically, it is not clear whether these differences are due to the dose of stress, versus we are looking at an early timepoint as ES initiates a genome-wide chromatin restructuring and gene expression reprogramming that goes beyond a response to proteotoxic stress. This reviewer is not suggesting a barrage of new experiments, but perhaps discussion points to contextualize results.

      We thank the reviewer for this suggestion and in our revised manuscript discuss these issues (lines 414-424 and 486-498 [5% vs. 8.5% ethanol]; lines 500-519 [ethanol as a metabolite]).

      Recommendations for the authors:

      Reviewer #1:

      (1) In Figure 1E, the number of foci in control (0 min) cells is very different for the two conditions. Could the authors clarify/check this? Based on the mean numbers at time point 0, the control cells for the ethanol treatment already contain about 10-20 Hsp104 foci, compared to around 5 foci per cell in the control for heat shock.

      We thank the reviewer for raising this point and have repeated the assay with a different strain (DPY1561). As shown in Figure 1E, we have confirmed that the control samples have a similar number of foci.

      (2) In the same Figure 1E, is the P-value relative to the control or the same time point in the other treatment? A comparison across treatments would be necessary to support the claim in lines 168-171 of the text.

      The statistical analysis (Mann Whitney test) was performed by comparing each stress timepoint to the no stress control. We clarify this in the figure legend. 
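      The comparison scheme described above, each stress time point tested against the unstressed control, can be sketched with a hand-rolled Mann-Whitney U test. This is a minimal illustration, not necessarily the implementation the authors used: it assumes a two-sided test using the large-sample normal approximation with tie correction and no continuity correction (reasonable for ~200 cells per sample; statistics packages may report exact p-values for small samples instead). The function names and the per-cell foci counts are hypothetical:

```python
import math
from typing import List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks with ties averaged, plus the size of each tie group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(sample: Sequence[float], control: Sequence[float]) -> Tuple[float, float]:
    """Return (U for `sample`, two-sided p from the normal approximation)."""
    n1, n2 = len(sample), len(control)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks, tie_sizes = _average_ranks(list(sample) + list(control))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance <= 0:  # all values identical: no evidence of a shift
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-cell Hsp104 foci counts: control (0 min) vs. one time point.
control_0min = [0, 1, 0, 2, 1, 0, 1, 0]
stress_10min = [3, 5, 2, 4, 6, 3, 2, 5]
u, p = mann_whitney_u(stress_10min, control_0min)
print(f"U = {u:.0f}, p = {p:.4f}")
```

      In a full analysis, the same call would be repeated for every stress time point against the same 0 min control.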

      (3) In Figure 1D, the heat-shock condition shows the same cells that are used in the control, but the cells in the ethanol-shock condition are different. This is a bit visually misleading compared to the experimental setup shown in panel 1C. The authors could show the control cells for the ethanol condition as well.

      We thank the reviewer for this excellent suggestion and have added the 0 min image for the ethanol stress conditions.

      (4) In Figure 7B adding images at 60min would help underscore the point that the condensates are stable in ethanol shocked cells.

      We appreciate this suggestion as well and have included a 60 min timepoint for both stresses (Figure 8B). 

      Reviewer #2:

      (1) Line 113: Has it not been established that yeast Hsf1 is constitutively trimeric?

      In yeast, only a fraction of Hsf1 is thought to be constitutively trimeric and it is this species that binds high-affinity HSEs even under non-stressful conditions (Giardina & Lis, 1995; Pincus et al., 2018). We have added this clarification to the text (lines 121-123). 

      (2) Ethanol can precipitate proteins, especially in rich media like YPD. Did the authors notice any protein precipitation? If yes, how do they account for effects due to nutrient loss by precipitation?

      This is an interesting point, but we did not notice any precipitates in either rich or synthetic liquid media containing 8.5% (v/v) ethanol for any of the time points used in the experiments.

      (3) Figure 3: The figure appears incomplete. Can enhancer, promoter, coding region, and 3'UTR be shown consistently for all genes examined?

      In response to this point, we have simplified this figure (new Fig. 4) by uniform presentation of factor occupancy at enhancer, promoter, and coding region loci for all but one of the genes evaluated. For HSP12 (330 bp), we were unable to distinguish promoter from coding region since the average sonicated chromatin fragment obtained using a Bioruptor is ~300 bp. Therefore, we evaluated only the HSP12 coding region for Pol II and histone H3 occupancy. 

      (4) Figure 4: The comparison between heat shock at 2.5 min and ethanol stress at later points is puzzling. Why not use consistent time points as in Fig. 3?

      Time points for the two stresses examined in this figure (new Fig. 5) were selected to represent times of peak intergenic interaction between HSR genes. These times were derived from our earlier analysis of 3C interactions during a heat shock time course (Figs. 5, 6 of Chowdhary et al., 2017) and ES data presented in this study, including Fig. 4 (Pol II ChIP time course) and Fig. 6 (3C time course). Data presented in Figs. 5 and 6 are consistent with the notion that intergenic interactions in cells subjected to ethanol stress are delayed relative to those observed in heat shocked cells, peaking in most cases at ~10 min (vs. ~2.5 min for heat stress (Chowdhary et al., 2017)).  

      (5) Figure 5: Fig. 5B top panel seems to show color inconsistencies for bars at 0 and 120 min. Also, the x-axis on the top left panel seems to have a typo; should it read "10," not "0?"

      We thank the reviewer for the observation. We changed the graphs in new Figure 6 to display the same color for all time points.  We also fixed the typo. 

      (6) Line 302: The evidence presented supports maximal mRNA levels, but the claim of "maximal transcription" requires support from nascent RNA analysis.

      We agree that RT-qPCR measures mRNA abundance, not nascent transcription. We have changed the text to refer to “transcript levels” where pertinent (lines 301-302; 1331-1332).

      (7) How long do loci remain coalescent during heat shock versus ethanol stress? Both 3C and imaging analyses do not differentiate between frequency and duration, which seems essential for understanding interaction dynamics.

      We thank the reviewer for this excellent question. In new Fig. 7D,E (data drawn from Fig. 6 – fig. suppl. 1), HSR gene coalescence detected in single cells over a HS or ES time course is charted.  Interpretable data exist for a small number of cells. Moreover, for both HS and ES states, in certain cells coalescence between the representative Hsf1 target genes HSP104 and HSP12 dissolves and then reappears. With this caveat in mind, the data suggest that HSP104-HSP12 coalescence can last at least 15 min in HS cells and up to 30 min in ES cells. We have not emphasized this point in the manuscript since a far more comprehensive analysis – beyond the scope of this study – is required.

      (8) For longer analyses, how do the authors accommodate potential ethanol concentration changes due to evaporation?

      For liquid cultures, we relied on maintaining minimal changes in the vapor pressure within the experimental vessel; to facilitate that, flasks were tightly covered to minimize evaporation and temperature was kept at 25°C. For most molecular analyses (RT-qPCR, ChIP, 3C), we confined our analysis to the first 60 min. For microscopy, the samples were encased within a concave slide, covered by a coverslip, as illustrated below. In addition, to tightly seal the coverslip on the slide we used petrolatum.  This arrangement minimized evaporation.

      Author response image 2.

      (9) Figure 9: This legend seems to have an incomplete sentence: "(represented using ...)."

      We have substituted an entirely new model in this revised manuscript (new Figure 10) that omits the use of an ellipsis. (We had used it to symbolize a delay in the appearance of HSR gene transcription in ES cells.)

      References  

      Alkaabi, K. M., Yafea, A., & Ashraf, S. S. (2005). Effect of pH on thermal- and chemical-induced denaturation of GFP. Applied Biochemistry and Biotechnology, 126(2), 149–156. https://doi.org/10.1385/ABAB:126:2:149

      Chowdhary, S., Kainth, A. S., & Gross, D. S. (2017). Heat Shock Protein Genes Undergo Dynamic Alteration in Their Three-Dimensional Structure and Genome Organization in Response to Thermal Stress. Molecular and Cellular Biology, 37(24), 1–23. https://doi.org/10.1128/mcb.00292-17

      Feder, Z. A., Ali, A., Singh, A., Krakowiak, J., Zheng, X., Bindokas, V. P., Wolfgeher, D., Kron, S. J., & Pincus, D. (2021). Subcellular localization of the J-protein Sis1 regulates the heat shock response. Journal of Cell Biology, 220(1), e202005165. https://doi.org/10.1083/JCB.202005165

      Gaglia, G., Rashid, R., Yapp, C., Joshi, G. N., Li, C. G., Lindquist, S. L., Sarosiek, K. A., Whitesell, L., Sorger, P. K., & Santagata, S. (2020). HSF1 phase transition mediates stress adaptation and cell fate decisions. Nature Cell Biology, 22(2), 151–158. https://doi.org/10.1038/s41556-019-0458-3

      Giardina, C., & Lis, J. T. (1995). Dynamic protein-DNA architecture of a yeast heat shock promoter. Molecular and Cellular Biology, 15(5), 2737–2744. https://doi.org/10.1128/mcb.15.5.2737

      Jolly, C., Konecny, L., Grady, D. L., Kutskova, Y. A., Cotto, J. J., Morimoto, R. I., & Vourc’h, C. (2002). In vivo binding of active heat shock transcription factor 1 to human chromosome 9 heterochromatin during stress. Journal of Cell Biology, 156(5), 775–781. https://doi.org/10.1083/jcb.200109018

      Jolly, C., Morimoto, R. I., Robert-Nicoud, M., & Vourc’h, C. (1997). HSF1 transcription factor concentrates in nuclear foci during heat shock: Relationship with transcription sites. Journal of Cell Science, 110(23), 2935–2941. https://doi.org/10.1242/jcs.110.23.2935

      Pincus, D., Anandhakumar, J., Thiru, P., Guertin, M. J., Erkine, A. M., & Gross, D. S. (2018). Genetic and epigenetic determinants establish a continuum of Hsf1 occupancy and activity across the yeast genome. Molecular Biology of the Cell, 29(26), 3168–3182. https://doi.org/10.1091/mbc.E18-06-0353

      Zhang, H., Shao, S., Zeng, Y., Wang, X., Qin, Y., Ren, Q., Xiang, S., Wang, Y., Xiao, J., & Sun, Y. (2022). Reversible phase separation of HSF1 is required for an acute transcriptional response during heat shock. Nature Cell Biology, 24(3), 340–352. https://doi.org/10.1038/s41556-022-00846-7

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      (1) In cardiac and renal transplantation, cold preservation on ice remains a common, cheap, and easily accessible way of preserving explanted organs during transport to recipients. While ex vivo mechanical circulatory platforms have been developed and are increasingly used to prolong organ viability, cold preservation remains widely used. The authors perfused explanted hearts using oxygenated perfusion preservation devices at subnormothermic temperatures (20-23°C), which are much lower than those routinely used in clinical cardiopulmonary bypass (28-32°C) (in the Discussion, the authors allude to a possible "protective effect" of SNC80 in cardiac bypass). It is unclear how much of the hypometabolic state is attributable to WB3 administration versus hypothermia. The study would benefit from comparing WB3 administration plus hypothermia against cold preservation alone, in both Xenopus and explanted porcine organs, to show a distinction in biostasis parameters.

      Indeed, we expect that both pharmaceutical interventions and cooling could contribute to a hypometabolic state. To control for this, the control and treated groups were exposed to the same temperatures in both the Xenopus (18°C) and porcine heart (20-23°C) experiments. Therefore, any changes in the treated group relative to the control can be attributed to the introduction of SNC80 or WB3 and not to cooling alone.

      (2) The authors selected SNC80 based on a literature survey in which it was identified for its ability to induce hypothermia and protect against the effects of spinal cord ischemia in rodents. While this makes sense, were other drugs (e.g., puerarin) considered? The hypothermia-inducing and spinal cord-protective effects of SNC80 may be multifactorial and not necessarily related to its biostatic effects as the authors describe. Please provide more context on the background of SNC80.

      During our research program, we considered and tested other drugs (>100 existing compounds in Xenopus screens). Although the published hypothermic and tissue-protective effects suggested to us that SNC80 should be included in screening, it was not until we observed effects across multiple test parameters, systems, and species that we homed in on SNC80 as a lead compound. We have added information to further clarify the background of SNC80 on pgs. 3-4.

      (3) In most of the models, the primary metric that the authors use to characterize metabolic activity is oxygen consumption, which is a somewhat limited indicator. For instance, it provides no information on anaerobic metabolic activity. In addition, the ATP/ADP ratio was found to decrease in the organ chips in which SNC80 was used, but similar findings were not presented for the other models.

      We thank the reviewer for this important point. We have therefore added experiments, including the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC) used in the Organ Chip systems. We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues, and we mention the role of glycolysis and cytosolic reductive carboxylation as compensatory mechanisms. Although the ATP/ADP ratio gave us useful insight into the metabolic activity of Huh7 cells and chips, this method requires transfection and live imaging, which does not suit other models such as Xenopus or whole organs. Additionally, in animal models there may be other confounding factors that influence ATP/ADP.

      (4) The authors should provide a more detailed explanation of SNC80's mechanisms of interaction with proteins related to transmembrane transport, mitochondrial activity, and metabolic processes. What is the impact of SNC80 on mitochondrial function, particularly ATP production and mitochondrial respiration? Are there changes in mitochondrial membrane potential, electron transport chain activity, or oxidative phosphorylation? In this context, the authors discuss the potential role of NCX1 as a binding target for SNC80 and its various mechanisms in slowing metabolism. However, no experiments have been done to confirm this binding in the present study. Coimmunoprecipitation studies using appropriate antibodies against SNC80 and NCX1 should be considered to demonstrate their direct binding. Additionally, surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) experiments could be employed to quantify the binding affinity between SNC80 and NCX1, providing further evidence of their interaction. These experiments would elucidate the binding mechanism between SNC80 and NCX1 and reveal more information on the mechanism of action for SNC80. 

      We agree that further definition of the mechanism of action is an important next step for this work; however, it is far beyond the scope of the present study.

      (5) The manuscript notes that histological analysis was conducted, but it seems that only example images are provided, such as Figure 4f. Quantified histological data would provide a more thorough understanding of tissue integrity. 

      We have added quantified histological data to the manuscript that was performed by a clinician blinded to the groups and interventions (Figure 4f).

      (6) Some of the points mentioned in the discussion and conclusion are rather strong and based on possible associations such as SNC80's potential vasodilatory capacity conferring a cardioprotective effect, and ability to reversibly suppress metabolism across different temperatures and species. Please tone this down and stay limited to the organs studied. Further, the reversibility of the findings may be more objectively assessed by biomarkers with decreased immunofluorescence in response to ischemia such as troponin I for the heart and albumin for the liver. Additionally, an investigation of proteins involved in inflammation, hypoxia, and key cell death pathways using immunohistochemistry analysis can better describe the impact of treatment on apoptosis/necroptosis. 

      We have revised aspects of the Discussion and Conclusion to focus on the organs studied in the present work (pgs. 14-17). We agree that markers of inflammation, hypoxia, and cell death are critical for assessing tissue health post-treatment. We performed PCR to assess such markers (Figure 4e) and found reductions in inflammatory cytokine and injury biomarker levels. Although we agree that immunohistochemistry may be useful, such as for looking at any spatial patterns of injury, PCR offers broader dynamic range and higher sensitivity and therefore was chosen for this assay.

      (7) What could be the underlying cause of the observed increase in intercellular spacing after SNC80 administration in porcine limbs which also seems to be evident in the heart histology samples? This seems to be more prominent in the SNC80 compared to the vehicle group. 

      Since the muscle bundle areas of baseline and treated tissues were essentially the same, the increase in intercellular space in the SNC80-treated tissue suggests a compensatory reduction in muscle fiber diameter. Intracellular metabolite concentrations have been shown to be quite stable over a large range of metabolic activities (Hochachka et al. 1998). As such, a reduction in metabolic activity induced by SNC80 may reduce the accumulation of intracellular metabolites. To maintain a stable intracellular metabolite concentration, water would then have to be expelled from the cells, accounting for the increased intercellular space.

      P W Hochachka, G B McClelland, G P Burness, J F Staples, R K Suarez Comp Biochem Physiol B Biochem Mol Biol 120, 17–26 (1998).

      (8) In the Discussion section, it would be valuable to provide a concise interpretation of the lipidomic data, particularly explaining how changes in acylcarnitine and cholesterol ester levels may relate to tadpole metabolism, hibernation, or other biological processes. 

      An interpretation of the lipidomics data has been summarized in the Discussion (pg. 14).

      (9) What are the limitations or disadvantages of the study? Does SNC80 possess any immunomodulatory properties that might affect the outcomes of organ transplantation? Are there specific organs for which SNC80 may not be a suitable preservation agent, and if so, what are the reasons behind this? 

      This study is limited in two ways. The first is that we characterized the function of the donor pig heart outside of the body, and therefore future work will be required to verify the function and quality of the hearts after they have been transplanted. Secondly, SNC80 is not currently approved for use in clinical settings and during earlier pre-clinical trials of the drug, side effects including seizures were noted and its development was halted. It is hypothesized that these seizures are related to SNC80’s delta opioid activity, so we developed a new, non-opioid analog called WB3, which will be used in future work. We have added a description of the prior seizure findings to the text (pg. 5).

      Based on assessment of tissue biomarkers by PCR, SNC80 does appear to exhibit immunomodulatory properties. Because organ transplant recipients are treated with strong immunosuppressants to prevent organ rejection, we anticipate that SNC80 would either further support this goal, have little additional effect, or reduce the amount of additional immunosuppressive drugs that would need to be administered. To date, our data do not suggest that there are specific organs for which SNC80 may not be a suitable preservation agent.

      Reviewer #2:

      (1) The authors developed WB3, an analog of the known delta opioid receptor agonist SNC80 with three orders of magnitude weaker binding to the delta opioid receptor. This will likely reduce the undesirable effects of SNC80 while preserving the metabolic slowing needed for organ preservation. Yet most experiments were done with SNC80, not the superior modification, WB3, which is shown in only a limited set of experiments (Figure 3).

      We included the WB3 studies in Xenopus to confirm that the biostatic activity is not mediated through the delta opioid receptor. We have only performed a limited number of experiments with WB3 because we are focused on improving its solubility so that it can be easily dissolved in common organ perfusates without DMSO, which we were able to use in the Xenopus experiments. 

      (2) The heart is one of the most challenging organs to preserve, and some experiments are done to establish the metabolic effects of SNC80. However, the biodistribution study, shown in Figure 2, conspicuously omitted the heart. 

      Thank you for this suggestion. We returned to the biodistribution study dataset and were able to measure uptake by the heart at the 1-hour time point. We observe an increase in uptake above levels observed for other tissues at 1 hour and at levels similar to the skeletal muscle at 2 hours (plot below). Unfortunately, the heart was not visible in a sufficient number of Xenopus tissue sections to reevaluate uptake at the 2-hour time point. We were also able to re-evaluate the lipidomics data for the heart. Acylcarnitine and cholesterol ester were not significantly different between vehicle and SNC80-treated groups. The lack of change in acylcarnitine is particularly important since its upregulation has been shown to be a marker for cardiovascular disease in humans (Deda et al. 2022). The expanded lipidomics data have been added to Figure 2.

      Deda O, Panteris E, Meikopoulos T, Begou O, Mouskeftara T, Karagiannidis E, Papazoglou AS, Sianos G, Theodoridis G, Gika H. Correlation of serum acylcarnitines with clinical presentation and severity of coronary artery disease. Biomolecules. 2022 Feb 23;12(3):354.

      Author response image 1.

      (3) I do not understand the design of the electrophysiology and contractility experiments with the porcine hearts. How did you defibrillate the hearts after removal and establishing perfusion? Lines 173-175 on Page 7 state: "After defibrillation with epinephrine, the P and QRS waveforms were visible in ECGs from 3 of 4 SNC80-treated hearts (Table S1), suggesting that those hearts regain atrial and ventricular polarization." Please clarify.

      Defibrillation is done with an electric shock. Also, please show the ECG recordings to support your conclusions about "polarization." What did you mean by "polarization"? Depolarization? Repolarization? Or resting potential. To establish a normal physiological state, please show ECG waveforms and present data on basic ECG characteristics: heart rate, PQ and QT intervals, and P and QRS durations. I recommend perfusion of the porcine heart with WB3, not only SNC80.  

      Hearts were defibrillated by the application of a 10 to 30 Joule electrical shock delivered from internal paddles positioned at the right atrium (negative) across to the left ventricle (positive). Once rhythm was established, 0.5 ml of 1:1000 epinephrine was administered via the aortic inflow. Electrocardiograms (ECGs) showed that both vehicle- and SNC80-treated hearts exhibited irregular contractions after perfusate flush and during rewarming prior to defibrillation. After defibrillation followed by epinephrine, a regular heartbeat was established in 3 of 4 SNC80-treated hearts, which exhibited normal P and QRS waveforms (Table S1). This observation suggested that the intrinsic contractility of the atrial and ventricular muscle fibers was preserved and that the overall conduction system of the heart was viable. The pulse rates of SNC80-treated hearts were at or near normal for porcine hearts (70-120 beats/min) after defibrillation. Vehicle-treated hearts exhibited tachycardia following defibrillation, all with pulse rates above the normal range for porcine hearts. We have added clarifying text and definitions (pg. 8). Regarding perfusion with WB3, we have performed only a limited number of experiments with WB3 because we are focused on improving its solubility so that it can be easily dissolved in common organ perfusates without DMSO, which we were able to use in the Xenopus experiments.

      (4) Pathology data also raise concerns. The histology images shown in Figure 4f are not quantified, and they show apparently higher levels of tissue disruption in SNC80-treated tissue vs vehicle-treated tissue. The text (lines 169-171) confirms this concern: "In some hearts treated with SNC80, greater waviness of muscle fibers was observed, possibly indicating a state of muscle contraction."

      The histology images shown in Figure 4f have now been quantified, and the myocardial injury scores show comparable histology between the groups.

      (5) The apparent state of contracture suggests a higher degree of myocardial damage and a high intracellular calcium level in SNC80-treated hearts. 

      The authors suggested that the sodium-calcium exchanger NCX is a possible target of SNC80 and could be responsible for the "hypometabolic state." However, NCX1 is critically important in the extrusion of cytosolic Ca2+ during the diastolic phase. Failure to remove excessive calcium and restore ionic homeostasis would lead to calcium overload and heart failure. 

      The histological assessment does not indicate a higher degree of myocardial damage in SNC80-treated hearts, and our data do not suggest high intracellular calcium buildup in SNC80-treated hearts. If that were the case, we would have had difficulty restoring the rhythm of the hearts on the Langendorff apparatus after preservation, which was not observed.

      (6) I am surprised the authors did not consider using the gold standard assay for measuring mitochondrial function in cells by the Seahorse Cell Mito Stress Test. 

      Thank you for this important point. We have added data from the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC) included in the Organ Chip experiments. We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. We now mention the role of glycolysis and cytosolic reductive carboxylation as compensatory mechanisms.   

      Reviewer #3:

      (1) The authors perform a literature search to identify SNC80 as a promising hit. However, the details of the literature search, a list of other potential hits, and the criteria for identification of SNC80 are not described. The hypometabolic effect of SNC80 exposure is well characterized in the Xenopus model. Furthermore, the authors show that SNC80 localises to the brain, but do not discuss several studies that have pointed to convulsions induced by exposure to high doses of SNC80, and whether this would be apparent in the Xenopus studies. The authors have promising data on the WB3 analog, which retains or even improves on the hypometabolism phenotype of SNC80 while likely not retaining delta opioid activity. However, this is not functionally demonstrated. Moreover, WB3 is not used in any of the other assays and models in the study. In the setting of cardiac transplant surgery, co-administration of SNC80 reduces metabolic activity and inflammation, although it is unclear whether SNC80 improves the recovery of organ function.

      Thank you for raising these important points. We have added details of the process to identify SNC80 (pgs. 3-4) and a discussion of the studies pointing to convulsions with high doses of SNC80 (pg. 5) (which were not observed in Xenopus studies). We have also incorporated measurements of oxygen consumption during WB3 treatment in Xenopus (Figure 3d).

      (2) The reversible induction of hypometabolic status is also demonstrated in two different organ chips. These models could identify the differential response of epithelial cells and vascular cells to drug perfusion, but the authors have mostly focused on the former. Finally, the authors identify specific targets for the hypometabolic effect of SNC80, which is a valuable resource for other screening studies and can form the basis for future work. 

      In the revised manuscript, we have also added data from the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC). We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues, where we highlight the differences in the metabolic responses of the four cell types to SNC80 treatment. Notably, the metabolism-suppressing effects of SNC80 were most potent in the epithelial cells originally derived from highly metabolic tumors (Caco-2 and Huh7) compared with primary normal endothelial cells (HUVEC and LSEC), which is consistent with past work suggesting that targeting the NCX1 channel might offer a way to slow tumor growth (Wan et al. 2022). Because we observed more prominent effects in epithelial cells in 2D assays, we decided to focus the 3D Organ Chip assays on epithelial cells.

      Wan, H. et al. NCX1 coupled with TRPC1 to promote gastric cancer via Ca2+/AKT/β-catenin pathway. Oncogene (2022) doi:10.1038/s41388-022-02412-9.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 136, "Based on these intriguing findings with human Organ Chips". No mention of human organ chips was made in the text at this point, suggest rewording.  

      Thank you for identifying this error. We have revised this line (pg. 6).

      (2) Please provide more information on previous studies that have explored other drugs for organ protection, the novelty of the findings of this study, and how the findings of this study compare to prior data. 

      Building on the background of organ preservation drugs provided in the Introduction, we have added details to compare our outcomes to other drugs explored for organ protection (pg. 15).

      (3) The dosing study in Supplemental Figure S1 provides some context on why the authors used the 100 µM SNC80 concentration. It would be helpful if the authors could elaborate in the Discussion on the mechanistic rationale for this concentration.

      This dose was chosen to maximize suppression of metabolic and activity parameters, while ensuring reversibility of biostasis. We have clarified this in the Discussion (pg. 14).

      (4) In Supplement Figure S2a, the y-axis measures the relative metabolic rate. It seems from the text that this is a relative measure of oxygen consumption, but it should be clarified accordingly. 

      We have clarified this point in the Methods section.  

      (5) What is the specific time or time frame when the reversed effect of SNC80 is most pronounced or at its peak? 

      When Xenopus are moved to fresh medium after SNC80 treatment, we observe a 15-minute period during which no reversal is evident from motion measurements. After that period, we observe a gradual, linear recovery over 2 hours. We cannot designate a specific period in which the reversal effect is most pronounced from these data.

      (6) WB3 seems to show a faster and stronger impact on swimming in comparison to SNC80. What could be the potential reasons for this difference, and could this have any clinical implications? 

      From our current data, we understand the key difference to be that SNC80 has greater affinity for the delta opioid receptor compared to WB3. Therefore, we hypothesize that by not interacting with the opioid system, WB3 induces faster and stronger impacts on swimming. In mice, it has been shown that SNC80 directly inhibits forebrain GABAergic neurons via activity at their delta opioid receptors, which leads to convulsions (Chung et al. 2015). Although we do not observe seizure-like behavior in Xenopus, drugs that inhibit GABAergic neurons can produce stimulant effects in vivo. Since WB3 has a lower affinity for the delta opioid receptor, it likely produces less stimulation, leading to faster and stronger suppression of swimming behaviors. Additionally, it is possible that WB3 interacts with additional targets we have not yet identified.

      Chung PC, Boehrer A, Stephan A, Matifas A, Scherrer G, Darcq E, Befort K, Kieffer BL. Delta opioid receptors expressed in forebrain GABAergic neurons are responsible for SNC80-induced seizures. Behavioural brain research. 2015 Feb 1;278:429-34.

      (7) Elaborate on the potential significance of SNC80's distribution in the GI tract, gill region, and skeletal muscle. How might this distribution relate to the observed physiological effects? 

      In Xenopus tadpoles, we observe SNC80 uptake in the gill region and GI tract within 1 hour. The multiple possible routes of uptake in Xenopus (skin, gills, and mouth) may account for the relatively rapid physiological effects observed in our experiments. The uptake observed in the muscle may be specifically responsible for the slowed motion observed in Xenopus activity assays. This has been elaborated upon in the text (pg. 5).

      (8) Please use italics where needed, e.g., in vitro, in vivo, etc. 

      This has been updated throughout the article.

      (9) Supplemental Figure S1 - Is there any reason for having 3 replicates for the 100 µM group compared to the 4 replicates in the other groups?

      Each group had 4 replicates; however, a review of the replicates for the 100 µM group suggested the presence of a leak or air bubble in one oxygen measurement vial, which, therefore, had to be excluded from the analysis.

      (10) Figure 3 description - 'c' should be bold. 

      Figure 3 has been updated.

      Reviewer #3:

      Title: The title suggests that several candidate compounds are identified, but the study focuses primarily on SNC80. Please consider rephrasing to make it more specific to this molecule. Alternatively, the manuscript would be significantly strengthened if more data were provided for WB3.

      Although the study focuses on SNC80, we introduce an entirely novel molecule, WB3, and therefore, we feel it is more appropriate to indicate that multiple molecules were studied.

      Lines 58-59: please cite additional primary literature for the different therapeutics discussed. As an example, the authors do not cite or discuss Massen et al. (PMID: 31743376), which suggests that H2S is able to induce similar hypometabolic effects even at 37°C.

      Thank you for this suggestion. We have expanded our discussion of the primary literature on the therapeutics discussed (pg. 15).

      Line 76 - 77: The authors do not provide any data on the other possible hits from their literature search or methods details on how this was done. No relevant literature has been cited. What criteria were used to finalise SNC80? 

      During our research program, we considered and tested other drugs (>100 existing compounds in Xenopus screens). Although the published hypothermic and tissue-protective effects suggested that SNC80 should be included in screening, it was not until we observed effects across multiple test parameters, systems, and species that we homed in on SNC80 as a lead compound. We have added information to further clarify the background of SNC80 on pgs. 3-4.

      Line 85 and Lines 342-345 in the Discussion: SNC80 is reported to induce convulsions at high doses in rodents and primates - was this also evident in the Xenopus studies? How does the dose used in the Xenopus studies compare with the high dose (ca. 10 mg/kg) used in primate studies Danielson et al., PMID: 17112570? 

      We did not observe convulsions in SNC80-treated Xenopus. However, we have updated the manuscript to include previous observations of convulsions in rodents and primates treated with SNC80 (pg. 5). Due to a number of differences, it is challenging to directly compare the dosing in Xenopus studies to those in the primate. In the present study, groups of 10 Xenopus were exposed to a 10 mL pool of 100 µM SNC80, which may be absorbed via oral, gill, and skin routes. Primates were dosed with 10 mg/kg delivered intramuscularly. Because these models may result in different drug biodistributions, any direct comparisons would be speculative. Further work in rodent models may help clarify the relevant dosing differences.

      Line 117: what does 'double the concentration' mean? Is this with reference to the dose of SNC80? If so, is this sufficient to completely block opioid receptor activity? 

      Yes, we meant that naltrindole was dosed at double the concentration of SNC80. We have clarified this in the text (pg. 5). Prior work in rodent brain tissue has shown that radiolabeled naltrindole binds to saturation at picomolar to nanomolar concentrations (Yamamura et al. 1992). To confirm our initial observations with naltrindole and SNC80, we also tested a SNC80 analog (WB3) with very low delta opioid activity (Figure 3), which showed similar effects.

      Yamamura MS, Horvath R, Toth G, Otvos F, Malatynska E, Knapp RJ, Porreca F, Hruby VJ, Yamamura HI. Characterization of [3H] naltrindole binding to delta opioid receptors in rat brain. Life sciences. 1992 Jan 1;50(16):PL119-24.

      Figure 3c, d: It appears that WB3 is even more effective at rapidly reducing motion and inducing faster recovery, which is an exciting result. However, in 3d it appears that long-term exposure of 8 h has detrimental effects, since the heart rate remains depressed. Please clarify.

      Yes, at 8 hours, we observe slow recovery and, in some cases, maintenance of depressed heart rates. This could be because the drug is more lipophilic and might remain in fat tissue for longer times. Although our current goal is to lengthen the time window for heart transplant surgery to 6 hours, we are working on formulating WB3 to optimize safety for longer applications (8+ hours).

      Figure 4: the experiments with the heart transplants are well done but do not demonstrate an additional protective effect over the current standard of care, apart from the reduced metabolism. Could the authors discuss this further in the Discussion or provide data with WB3, which may show a stronger effect? Scale bars are missing in panel f.

      In addition to reduced metabolism, we also demonstrate reduced expression of inflammation-, hypoxia-, and cell death-related markers compared to machine perfusion alone (Figure 4e). The potential protective effect of the biostasis-inducing compounds will be further investigated in a planned orthotopic porcine transplant study in which pigs will be followed for 6 hours after weaning off the bypass machine, allowing enough time to assess the potential benefit of biostasis-inducing drugs. Additionally, we have added scale bars (Figure 4f).

      Order of manuscript: Line 136 already refers to the organ-chip data, which is only presented at the end. Please edit. I feel the manuscript would flow better with the organ-chip data presented before the heart transplant data.

      Organ-chip data: this is an important component of the story but is only shown in supplementary figures. Consider showing this data in the main figures, as eLife has no space restrictions. Furthermore, it is unclear if the effluent collected and analysed is from apical or vascular, or both. In any case, the analysis via microscopy-based methods appears restricted to the epithelium. The manuscript would be significantly strengthened by providing some data on the effect of SNC80 on vascular cells. 

      As requested, we have moved the Organ Chip results to a main figure (new Fig. 5). We have added experiments, including the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC), together with a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. The 2D assays showed that the metabolism-suppressing effects of SNC80 were most potent in the epithelial cells originally derived from highly metabolic tumors (Caco-2 and Huh7) compared with endothelial cells (HUVEC and LSEC). Based on these results, we focused the 3D Organ Chip assays on epithelial cells only, and hence analyzed effluents only from the epithelial (apical) channel.

      Methods section for fabrication of oxygen sensors: Please refer to prior papers from your lab (Grant et al., PMID: 35274118) with regards to details of the fabrication of the devices with inbuilt oxygen sensors. 

      The methods used for the fabrication of oxygen sensors will be included in a separate manuscript currently in preparation.  

      Figure S3 and Lines 243-244: Please provide the data for untreated control organ chips in panels d and e, for which a mean value is quoted in the main text. The images in panel f are too small for the reader to appreciate the point; please provide zooms. Scale bars are also missing from these images. Please increase the number of replicates for S3f: the liver-chip data have only two replicates, which gives very low power for statistical testing. In general, the number of organ chips used for each panel is missing.

      As mentioned in the captions, Figure S3 (now Figure S5) panels d and e show the average albumin production of Liver Chips at days 7-10 of culture. These measurements were performed before any treatment with SNC80 to characterize the chips’ functional metabolism. In panel g, although we show only N = 2-3 biological replicates, each data point corresponds to an average of multiple fields of view (multiple technical replicates). We have now clarified this in the figure legend.
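      As a minimal sketch of the two-level replicate structure described above (not the authors' analysis code), the Python below first averages the technical replicates (fields of view) within each chip and then summarizes the chip-level means as the biological replicates. The function names, chip IDs and values are hypothetical placeholders, and simple arithmetic means are assumed.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple


def per_chip_means(fov_values: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse technical replicates (fields of view) to one mean per chip."""
    return {chip: mean(values) for chip, values in fov_values.items()}


def summarize_biological(chip_means: Dict[str, float]) -> Tuple[int, float, Optional[float]]:
    """Return (biological N, mean, sample SD) computed over chip-level means only."""
    values = list(chip_means.values())
    sd = stdev(values) if len(values) > 1 else None  # sample SD is undefined for N = 1
    return len(values), mean(values), sd


# Hypothetical example: two chips, each imaged in several fields of view
fovs = {"chip_A": [1.0, 1.5, 0.5], "chip_B": [2.0, 2.5, 1.5, 2.0]}
chip_level = per_chip_means(fovs)
n_bio, overall_mean, overall_sd = summarize_biological(chip_level)
print(n_bio, overall_mean, overall_sd)
```

      Treating the chip, rather than the field of view, as the unit of replication keeps N equal to the number of chips, however many images contribute to each chip-level value.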

      Figure S4 - I do not quite understand why the perfusion with the vehicle only also affects oxygen release in the liver chip. Is it possible to use a different vehicle? 

      The liver and gut oxygen levels are plotted on different y-axes (gut on the left and liver on the right). The oxygen fold change of the liver control chip is below 0.5, which is in the same range as that of the gut control chip (0 ± 0.25). There is a natural variation in oxygen consumption over the lifetime of the chips (now Figure 5c), and untreated cells are metabolically active and consume oxygen. The small drop observed suggests that the liver chips may not have reached a stable oxygen consumption rate at the time of the experiment, whereas the gut chips had stabilized.

      Figure S5c-f: The units on the Y-axis are missing. 

      Panels S5c-d (now Figure S6c-d) depict the percent cytotoxicity and are thus unitless. Panels S5e-h (now Figure S6e-h) show the effluent levels relative to baseline and are also unitless. We have updated the figure caption to clarify this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L. Mancini et al., Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model, which unfortunately is wrong when voltage-gated ion channels are present. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead, a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘workflow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C. J. Lo et al., Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis, which uses Na+ instead of H+ to drive its flagellar motor. Its relevance to wild-type E. coli is not clear, given the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E. Krasnopeeva et al., Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell, e.g. via the membrane potential. There is 0.2 M K+ compared with 0.0000001 M (10⁻⁷ M) H+ in E. coli, so K+ is arguably a million times more important than H+ for the membrane potential, and thus for the electrophysiology!
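      The "million times" figure is the ratio of the two concentrations quoted above (10⁻⁷ M H+ corresponds to pH 7); as a worked check of the arithmetic only:

```latex
\frac{[\mathrm{K^+}]}{[\mathrm{H^+}]} \approx \frac{0.2\ \mathrm{M}}{10^{-7}\ \mathrm{M}} = 2 \times 10^{6}
```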

      Equation (1) in the manuscript assumes that no ions other than H+ are exchanged during the experiments. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication.

      In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary:

      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      - The authors report original data.

      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      - The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ channel Kch: enhancing survival under phototoxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:

      - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins, i.e. they are too fast to be due to a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate or receive electrons. The electrical response to stress appears to be caused by ROS, since the electrical response is removed when ROS scavengers are added, i.e. pH plays at most a very minor role, if any.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project about using the PROPS system in E. coli, as presented by J. M. Kralj et al., ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising: ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X. Jin et al., PNAS, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work, which is currently under review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V. Martorelli et al., 2024, https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
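      To illustrate the propagation mechanism discussed here, the sketch below implements a generic one-dimensional fire-diffuse-fire scheme. It is not the authors' three-dimensional agent-based model, and every parameter value is a hypothetical choice. Each cell that fires releases a fixed amount of a diffusing species (standing in for K+) as an instantaneous point source, and an unfired cell fires once the summed concentration at its position reaches a threshold.

```python
import math
from typing import List, Optional


def fire_diffuse_fire_1d(
    n_cells: int = 20,
    spacing: float = 1.0,
    diffusivity: float = 1.0,
    release: float = 1.0,
    threshold: float = 0.1,
    dt: float = 0.005,
    t_max: float = 10.0,
) -> List[Optional[float]]:
    """Return the firing time of each cell in a 1D chain (None if it never fires).

    Cell 0 is stimulated at t = 0. A cell that fired at time t_j adds the 1D
    diffusion kernel release / sqrt(4*pi*D*s) * exp(-x**2 / (4*D*s)), with
    s = t - t_j, to the concentration at distance x.
    """
    positions = [i * spacing for i in range(n_cells)]
    fire_times: List[Optional[float]] = [None] * n_cells
    fire_times[0] = 0.0
    for step in range(1, int(round(t_max / dt)) + 1):
        t = step * dt
        newly_fired = []
        for i, x_i in enumerate(positions):
            if fire_times[i] is not None:
                continue  # each cell fires at most once
            conc = 0.0
            for x_j, t_j in zip(positions, fire_times):
                if t_j is None:
                    continue
                s = t - t_j  # always > 0: sources fired at an earlier step
                conc += release / math.sqrt(4 * math.pi * diffusivity * s) * math.exp(
                    -((x_i - x_j) ** 2) / (4 * diffusivity * s)
                )
            if conc >= threshold:
                newly_fired.append(i)
        # Update after the sweep so every cell in this step sees the same state
        for i in newly_fired:
            fire_times[i] = t
        if all(ft is not None for ft in fire_times):
            break
    return fire_times


times = fire_diffuse_fire_1d()
print(times[:5])
```

      With these illustrative values a single neighbour can push the local concentration above threshold (the peak of one 1D kernel at unit distance is about 0.24), so the wave propagates cell by cell; raising `threshold` to 0.5 makes it fail after the stimulated cell. This all-or-none relay is what distinguishes a fire-diffuse-fire wave from passive diffusion alone.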

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (‘Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior’, E. Akabuogu et al., Nano Letters, 2024, in print; https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report membrane potential in E. coli has previously been shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The authors appear unaware of a publication from 2020 (https://www.sciencedirect.com/science/article/pii/S0006349519308793) that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore, I think the results of this paper are misinterpreted. The same publication presents a protocol for carefully calibrating any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of this result section is correct, for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so: ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work on light-induced damage to the electrochemical gradient of protons, https://www.sciencedirect.com/science/article/pii/S0006349519303923; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.
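      For reference, the defining property both sides invoke is that a membrane-permeant ion or dye at equilibrium obeys the Nernst relation, c_in/c_out = exp(-zFV/RT), with V the membrane potential (inside minus outside). The Python sketch below evaluates that relation and its inverse; the temperature, charge and the -100 mV example are illustrative assumptions, and the sketch takes no position on whether ThT behaves this way in E. coli.

```python
import math

FARADAY = 96485.33212  # Faraday constant, C/mol
GAS_CONSTANT = 8.314462618  # molar gas constant, J/(mol K)


def nernst_ratio(v_membrane: float, z: int = 1, temp_k: float = 298.15) -> float:
    """Equilibrium c_in/c_out for an ion of charge z at membrane potential V (volts)."""
    return math.exp(-z * FARADAY * v_membrane / (GAS_CONSTANT * temp_k))


def potential_from_ratio(ratio: float, z: int = 1, temp_k: float = 298.15) -> float:
    """Invert the Nernst relation: V = -(RT / zF) * ln(c_in / c_out)."""
    return -(GAS_CONSTANT * temp_k) / (z * FARADAY) * math.log(ratio)


# Hypothetical example: a monovalent cationic dye at V = -100 mV (inside negative)
ratio = nernst_ratio(-0.100)
print(f"accumulation ratio ~ {ratio:.1f}")  # roughly 49-fold at 25 degrees C
print(f"recovered V = {potential_from_ratio(ratio) * 1000:.1f} mV")
```

      A dye is Nernstian in this sense only if its measured accumulation tracks this exponential as V changes; the disagreement in this exchange is about whether ThT meets that condition in these experiments.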

      Please note that the loading profile (only observed under light) in Figure 1B of the current manuscript, as well as in video S1, is identical to that in Figure 3 of the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793) and the corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating; see Figure S12 of that previous work. There, it is also demonstrated, by monitoring the speed of the bacterial flagellar motor, that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus the decay profile of the electrochemical gradient of protons (Figure 4, https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10⁻⁷ M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There, the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware, this is the first reference for that, and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in potassium phosphate buffer with NaCl (and we often used EDTA to permeabilise the membrane). It is not fully clear to me whether TMRM was prepared in rich media here (the Methods say so explicitly for ThT but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage done to the membrane by light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction: the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, depending on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons while the membrane potential is equal and opposite in sign to ΔpH. So using CCCP does not automatically mean that the membrane potential will collapse (e.g. in some mammalian cells this need not be the case, but in E. coli it is: https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also recently been found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentration the authors are using (100 µM) that should not affect the results. However, the authors then state that, because they observed in Figure S1E a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it shows that after 50 min of treatment with 100 µM CCCP, the ThT dye shows no dynamics. The action of a Nernstian dye is defined: it is not sufficient that a charged molecule is affected in some way by the electrical potential; this needs to happen in a very specific way for it to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.

      Our understanding of the literature is CCCP poisons the whole metabolism of the bacterial cells. The ATP driven K+ channels will stop functioning and this is the dominant contributor to membrane potential.
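      The reviewer's point that collapsing the proton gradient need not collapse the membrane potential follows from the standard decomposition of the proton motive force into an electrical and a chemical term, written here with ΔpH = pH_in - pH_out (sign conventions vary between texts):

```latex
\Delta p = \Delta\psi - \frac{2.303\,RT}{F}\,\Delta\mathrm{pH},
\qquad \frac{2.303\,RT}{F} \approx 59\ \mathrm{mV}\ \text{at}\ 25\,^{\circ}\mathrm{C}
```

      A protonophore that drives Δp to zero therefore leaves Δψ = (2.303RT/F)ΔpH, which is zero only if ΔpH is also dissipated.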

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this section, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower, because intuitively cells in these clusters could be shielded from light). However, shielding is only one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which the authors mention later in the paper) is lower. Given that, according to the Methods, these cells were left in a microfluidic chamber for 2 hours to attach in growth media, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore, I do not think the distance is the explanation for what the authors observe.

      Shielding would produce the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seems tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as ion-channel-mediated wavefronts moving from the center of the biofilm. As above, cells in the middle can have a different membrane permeability from those at the periphery, and, probably even more importantly, no light profile is shown anywhere in the SI or Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In the Methods, I find the light source for the blue light and the type of microscope, but no comments on how 'flat' the illumination is across the field of view. This is critical to assess what the authors are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned (https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2), the collapse of fluorescence was not understood (other than that it is not membrane potential related). It was observed in Figure 5A, C, and F that, at the point of the peak, the electrochemical gradient of protons has already collapsed, and that at the peak the cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapses there is a period of time when ThT does not stain the cells, and then staining starts again. If the cellular content leaks after the first peak, as we have observed, then staining that occurs much later could simply be staining of positively charged cytoplasmic content, and its timing depends on the dynamics of cytoplasmic content leakage (we observed this to happen over 2 h in individual cells). ThT is also a non-specific amyloid dye, and the formation of protein clusters has been observed in starving E. coli cells (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see whether the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work, for reasons highlighted before. Differential membrane permeability is a speculative phenomenon not well supported by any evidence and with no clear molecular mechanism.

      Finally, I note that the authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on the 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I cannot compare it with the range of biofilm sizes in Figure 2C (67 to 280 µm). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First, I note that, given that I disagree that the data presented thus far 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress', as the authors state, it gets harder to comment on the rest of the results.

      In this result section, the authors look at the effect of Kch, a putative voltage-gated potassium channel, on the ThT profile in E. coli cells, and they see a difference. It is worth noting that the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from one in which the TolC mutant has a different membrane permeability (and there is a publication suggesting that the latter is happening: https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that the Kch deletion affects membrane permeability. I do note that in video S4 I seem to see more of what appear to be plasmolysed cells. With this mutant, the authors do not see the ThT intensity that, in WT, appears long after the initial peak has disappeared. It is not clear how long they waited for this, as from Figure S3C it could simply be that these dynamics are a lot slower, e.g. because the Kch deletion changes membrane permeability.

      The work suggesting that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism by which cells equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (‘Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior’, E. Akabuogu et al., Nano Letters, 2024, in print; https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this section, the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with a K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf, especially Figure S12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel-mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And, n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.
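      For comparison, the classic neuronal Hodgkin-Huxley current balance that the reviewer summarizes can be written as follows (standard 1952 squid-axon form: the g terms are maximal conductances, E_K, E_Na and E_L the reversal potentials, and α_x, β_x the voltage-dependent rate functions). The bacterial model's own channel terms, such as the (V - V_Q) term, are defined in the manuscript and are not reproduced here.

```latex
\begin{aligned}
C_m \frac{dV}{dt} &= -\bar{g}_{K}\, n^{4} (V - E_{K}) - \bar{g}_{Na}\, m^{3} h\, (V - E_{Na}) - g_{L} (V - E_{L}) + I_{\mathrm{ext}} \\
\frac{dx}{dt} &= \alpha_{x}(V)\,(1 - x) - \beta_{x}(V)\, x, \qquad x \in \{n, m, h\}
\end{aligned}
```

      Each ionic term is a conductance multiplied by a driving force (V - E_ion), which is the structure the reviewer's near-equilibrium remark refers to.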

      Thus, in applying the model to describe bacterial electrophysiology, one should ensure that the near-equilibrium requirement holds (so that the (V-VQ) etc. terms in the authors' equation in Figure 5B hold), and that potassium and other channels in a given bacterium have gating properties similar to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think that the pump-leak model of the electrophysiology of bacteria needs to start with more general fluxes (for example, Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer, or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results showing that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood, a model is needed that also takes into account the link between the maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc channels respond to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication: https://www.pnas.org/doi/full/10.1073/pnas.1522185113). The authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating, but that sometimes the channels gate spontaneously in the presence of certain ions. Other publications cited do not really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT were a membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature, as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that, while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 the authors state that bacteria (Bacillus subtilis in this case) in biofilms have recently been found to modulate membrane potential, citing the relevant literature from 2015. It is worth noting that the most recent paper https://journals.asm.org/doi/10.1128/mbio.02220-23 found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and that the recent results are strictly spore-specific, but these results should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells by our own group and by A. Prindle (J.A. Blee et al., ‘Spatial propagation of electrical signal in circular biofilms’, Physical Review E, 2019, 100, 052401; J.A. Blee et al., ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001; A. Prindle et al., ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to research on low-metabolism spores seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (see the measurements in the eLife article and the new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minor role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves, and both showed membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that showed membrane potential dynamics in the presence of the stress: cells need to be alive to engage in membrane potential dynamics (to hyperpolarize) under stress conditions. Cells that showed membrane potential dynamics and later stained red were counted only at the end of the experiment. We believe that the wild type handles the light stress better than the ∆kch mutant, as measured with PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide (PI) is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae after they are starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics, but never showed any pictorial or plotted data for this confirmatory experiment. In our methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to a high membrane potential but reflects PI staining damaged or dead cells without bias. We think that propidium iodide is a good choice for this experiment: it is used extensively in the life sciences, and it has also been used in studies of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) with no membrane potential related bias reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparing small changes in absolute intensity is problematic in such fluorescence experiments. By "similar" we mean that the shapes of the traces are similar and that they can be modelled using an HH model with similar parameters.
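      As a hedged illustration of how "similar shape" could be quantified, as Reviewer #3 requests, the Python sketch below rescales each trace to the range [0, 1] (removing differences in absolute intensity) and reports the Pearson correlation and the root-mean-square difference between the rescaled traces. It assumes two traces sampled on the same time grid; the function names are hypothetical, and this is not the analysis used in the manuscript.

```python
import math
from typing import Sequence


def minmax_normalize(trace: Sequence[float]) -> list[float]:
    """Rescale a fluorescence trace to [0, 1] so that only its shape remains."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        raise ValueError("trace is flat; its shape cannot be compared")
    return [(x - lo) / (hi - lo) for x in trace]


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two traces sampled on the same time grid."""
    if len(a) != len(b):
        raise ValueError("traces must share the same time grid")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def shape_similarity(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """Return (Pearson r, RMS difference) between the min-max normalized traces."""
    na, nb = minmax_normalize(a), minmax_normalize(b)
    r = pearson_r(na, nb)
    rms = math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)) / len(na))
    return r, rms
```

      A scaled and offset copy of a trace gives r = 1 and an RMS difference of 0, while a trace with different peak timing lowers r and raises the RMS difference, so the two numbers separate "same shape, different brightness" from "different shape".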

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the second peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak), and the durations of these three episodes were similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.

      The key point is the comparison of standard errors on the standard deviation.
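      A minimal sketch of the robust summary suggested by Reviewer #3, assuming the time-to-first-peak values are available as lists in minutes. It reports the mean, the median, the interquartile range, Tukey-fence outliers (more than 1.5 × IQR beyond the quartiles), the mean after removing those outliers and the coefficient of variation, plus a permutation test on the difference in medians between two conditions (for example sparse cells versus clusters). The function names and the choice of Tukey fences are illustrative assumptions, not the authors' method.

```python
import random
import statistics


def summarize_latencies(times_min: list[float]) -> dict:
    """Robust summary of time-to-first-peak values (in minutes)."""
    if len(times_min) < 4:
        raise ValueError("need at least four latencies to estimate quartiles")
    q1, _, q3 = statistics.quantiles(times_min, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey fences
    kept = [t for t in times_min if lo_fence <= t <= hi_fence]
    mean_all = statistics.fmean(times_min)
    return {
        "mean": mean_all,
        "median": statistics.median(times_min),
        "iqr": iqr,
        "outliers": [t for t in times_min if not lo_fence <= t <= hi_fence],
        "mean_without_outliers": statistics.fmean(kept),
        "cv": statistics.stdev(times_min) / mean_all,  # SD relative to the mean
    }


def median_permutation_test(a, b, n_perm=10_000, seed=0) -> float:
    """Two-sided permutation p-value for a difference in medians between groups."""
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled latencies
        diff = abs(statistics.median(pooled[:len(a)]) - statistics.median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

      With the hypothetical latencies [2, 2, 3, 3, 3, 4, 4, 30], a single late cell pulls the mean to 6.375 min while the median stays at 3 min, which is the kind of mean/median discrepancy the reviewer describes for Figures 1B and 1E.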

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics: the entire biofilm was exposed to the same blue light at all times, so all parts of the biofilm received the same blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (Supplementary Figure 4B). The fluorescence data were carefully obtained using the Imaris software: we created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V. Martorelli et al., 2024, https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
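      To make the position-resolved analysis requested by Reviewers #1 and #3 concrete, the sketch below assumes a time-lapse image stack of shape (T, Y, X) (for example a single confocal slice or a projection) and a known biofilm centre. It averages intensity in concentric annuli to build a radial kymograph, takes the half-maximum crossing time of each annulus as its arrival time, and fits radius against arrival time to estimate a front speed. The annulus width, the half-maximum criterion and all function names are hypothetical choices; this is not the Imaris workflow or the fire-diffuse-fire analysis used by the authors.

```python
import numpy as np


def radial_kymograph(stack: np.ndarray, center: tuple[float, float], bin_width: float):
    """Mean intensity in concentric annuli around `center` for every frame.

    stack has shape (T, Y, X). Returns (kymo, radii): kymo has shape
    (T, n_bins) and radii are the annulus centres in pixels.
    """
    n_t, ny, nx = stack.shape
    yy, xx = np.indices((ny, nx))
    r = np.hypot(yy - center[0], xx - center[1])
    bins = (r // bin_width).astype(int).ravel()
    n_bins = bins.max() + 1
    counts = np.bincount(bins, minlength=n_bins)  # pixels per annulus
    kymo = np.empty((n_t, n_bins))
    for i in range(n_t):
        sums = np.bincount(bins, weights=stack[i].ravel(), minlength=n_bins)
        kymo[i] = sums / np.maximum(counts, 1)
    radii = (np.arange(n_bins) + 0.5) * bin_width
    return kymo, radii


def arrival_times(kymo: np.ndarray, dt: float) -> np.ndarray:
    """Time at which each annulus first reaches half of its own intensity range.

    Annuli whose intensity never changes are returned as NaN.
    """
    lo, hi = kymo.min(axis=0), kymo.max(axis=0)
    half = lo + 0.5 * (hi - lo)
    first = np.argmax(kymo >= half, axis=0)  # index of the first crossing per annulus
    t = first.astype(float) * dt
    t[hi == lo] = np.nan
    return t


def front_speed(radii: np.ndarray, t_arrival: np.ndarray) -> float:
    """Slope of annulus radius versus arrival time (pixels per unit time)."""
    ok = np.isfinite(t_arrival)
    slope, _intercept = np.polyfit(t_arrival[ok], radii[ok], 1)
    return float(slope)
```

      A propagating front gives arrival times that increase with radius and a finite fitted speed. If every annulus crosses at the same frame, as expected for a spatially uniform rise (for example one where the centre is brighter only because the biofilm is thicker there), the fit is degenerate and no front is resolved.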

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images for Fig S4A, the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the second peak in Fig S4B: the first peak, the quiescent period and the slow rise to the second peak are evident.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Scientific recommendations:

      - Although Fig 4A clearly shows that light stimulation has an influence on the dynamics of cell membrane potential in the biofilm, it is important to rule out the contribution of variations in environmental parameters. I understand that for technical reasons, the flow of fresh medium must be stopped during image acquisition. Therefore, I suggest performing control experiments where the flow is stopped before image acquisition (15 min, 30 min, 45 min, and 1 h before). If there is no significant contribution from environmental variations (pH, redox), the dynamics of the electrical response should superimpose regardless of the delay between stopping the flow and switching on the light.

      In this study, we focused on how E. coli cells and biofilms respond to blue light stress through their membrane potential dynamics. This involved growing the cells and biofilms, stopping the media flow and obtaining data immediately. We believe that stopping the flow not only helped us to manage data acquisition but also reduced the effect of environmental factors. In future work we will extend the study to how the membrane potential dynamics evolve under changing environmental factors, for example those induced by stopping the flow at different times.

      - Since the TMRM signal exhibits a linear increase after the first response peak (Supplementary Figure 1D), I recommend toning down the statement at line 78.

      - To improve the spatial analysis of the electrical response, I suggest plotting kymographs of the intensity profiles across the biofilm. I have plotted this kymograph for Video S3 and it appears that there is no electrical propagation for the second peak. In addition, the authors should provide technical details of how R^2(t) is measured in the first regime (Figure 7E).

      See the dedicated simulation article for more details. https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      - Line 152: To assess the variability of the latency, the authors should consider measuring the variance divided by the mean instead of SD, which may depend on the average value.

      We are happy with our current use of standard error on the standard deviation. It shows what we claim to be true.

      - Line 154-155: To truly determine whether the amplitude of the "action potential" is independent of biofilm size, the authors should not normalise the signals.

      Good point. We qualitatively compared both normalized and unnormalized data. Recent electrical impedance spectroscopy measurements (unpublished) indicate that the electrical activity is an extensive quantity i.e. it scales with the size of the biofilms.

      - To clarify the role of K+ in the habituation response, I suggest using valinomycin at sub-inhibitory concentrations (10µM). In addition, the high concentration of CCCP used in this study completely inhibits cell activity, so, not surprisingly, no electrical response to light stimulation was observed in the presence of CCCP. Finally, the Kch complementation experiment exhibits a "drop after the first peak" at a single point. It would be more convincing to increase the temporal resolution (1 min -> 10 s) to show that there is indeed a first and a second peak.

      An interesting experiment for the future.

      - Line 237-238: There are only two points suggesting that the dynamics of hyperpolarization are faster at higher irradiance (Fig 4A). The authors should consider adding a third, intermediate point at 17µW/mm^2 to confirm the statement made in this sentence.

      Multiple repeats were performed. We are confident of the robustness of our data.

      - Line 249 + Fig 4E: It seems that the data reported on Fig 4E are extracted from Fig 4D. If this is indeed the case, the data should be normalised by the total population size to compare survival probabilities under the two conditions. It would also be great to measure these probabilities (for WT and ∆kch) in the presence of ROS scavengers.

      - To distinguish between model fitting and model predictions, the authors should clearly state which parameters are taken from the literature and which parameters are adjusted to fit the experimental data.

      - Supplementary Figure 4A: why can't we see any wavefront in this series of images?

      For the experimental data, the wavefront was analyzed using the Imaris software. We systematically created an ROI with a curved geometry within the confocal stack (the biofilm). The ThT fluorescence was traced along the surface of the curved geometry and analyzed along the z-axis.

      - Fig 7B: Could the authors explain why the plateau is higher in the simulations than in the biofilm experiments? Could they add noise on the firing activities?

      See the dedicated Martorelli modelling article. In general, adding noise would require a stochastic Hodgkin-Huxley model, and the fluorescence data (and electrical impedance spectroscopy data) presented do not show extensive noise (due to collective averaging over many bacterial cells).

      - Supplementary Figure 4B: Why can't we see the second peak in confocal images?

      The second peak is present although not as robust as in Fig 2B. The confocal images were obtained with a laser source. Therefore we tried to create a balance between applying sufficient light stress on the bacterial cells and mitigating photobleaching.

      Editing recommendations:

      The editing recommendations below have been applied where appropriate.

      - Many important technical details are missing (e.g. R^2, curvature, and 445nm irradiance measurements). Error bars are missing from most graphs. The captions should clearly indicate if these are single-cell or biofilm experiments, strain name, illumination conditions, number of experiments, SD, or SE. Please indicate on all panels of all figures in the main text and in the supplements, which are the conditions: single cell vs. biofilm, strains, medium, centrifugal vs centripetal etc..., where relevant. Please also draw error bars everywhere.

      We have now made the appropriate changes. We use "cells" when referring to single cells and "biofilms" when referring to biofilms, and we give the strain name either on the panel or in the image description.

      - Line 47-51: The way the paragraph is written suggests that no coordinated electrical oscillations have been observed in Gram-negative biofilms. However, Hennes et al (referenced as 57 in this manuscript) have shown that a wave of hyperpolarized cells propagates in colonies of Neisseria gonorrhoeae, which is a Gram-negative bacterium.

      We are now aware of this work. It had not been published when we first submitted our work, and its authors claim that the waves of activity are due to ROS diffusion, NOT to propagating waves of ions (coordinated electrical wavefronts).

      - Line 59: "stressor" -> "stress" or "perturbation".

      The correction has been made.

      - Line 153: Please indicate in the Material&Methods how the size of the biofilm is measured.

      The biofilm size was obtained using BiofilmQ, and a step-by-step guide to using BiofilmQ has been provided.

      - Figure 2A: Please provide associated brightfield images to locate bacteria.

      - Line 186: Please remove "wavefront" from the caption. Fig2B only shows the average signal as a function of time.

      This correction has been implemented.

      - Fig 3B,C: Please indicate single cell and biofilm on the panels and also WT and ∆kch.

      - Line 289: I suggest adding "in single cell experiments" to the title of this section.

      - Fig 5A: blue light is always present at regular time intervals during regime I and II. The presence of blue light only in regime I could be misleading.

      - Fig 5C: The curve in Fig 5D seems to correspond to the biofilm case. The curve given by the model should be compared with the average curve presented in Fig 1D.

      - Fig 6A, B, and C: These figures could be moved to supplements.

      - Line 392: Replace "turgidity" with "turgor pressure".

      - Fig 7C,E: Please use a log-log scale to represent these data and indicate the line of slope 1.

      - Fig 7E: The x-axis has been cropped.

      - Please provide a supplementary movie for the data presented in Fig 7E.

      - Line 455: E. coli biofilms do not express ThT.

      - Line 466: "\gamma is the anomalous exponent". Please remove anomalous (\gamma can equal 1 at this stage).

      - Line 475: Please replace "section" with "projection".

      - Line 476: Please replace "spatiotemporal" with "temporal". There is no spatial dependency in either figure.

      - Line 500: Please define Eikonal approximation.

      - Fig 8 could be moved to supplements.

      - Line 553: "predicted" -> "predict".

      - Line 593: Could the authors explain why their model offers much better quantitative agreement?

      - Line 669: What does "universal" mean in that context?

      - Line 671: A volume can be pipetted but not a concentration.

      - Line 676: Are triplicates technical or biological replicates?

      - Sup Fig1: Please use minutes instead of seconds in panel A.

      - Model for membrane dynamics: "The fraction of time the Q+ channel is open" -> "The dynamics of Q+ channel activity can be written". Ditto for K+ channel...

      - Model for membrane dynamics: "the term ... is a threshold-linear". This function is not linear at all. Why is it called linear? Also, please describe what \sigma is.

      - ABFDF model: "releasing a given concentration" -> "releasing a local concentration" or "a given number" but it's not \sigma anymore. Besides, this \sigma is unlikely related to the previous \sigma used in the model of membrane potential dynamics in single cells. Please consider renaming one or the other. Also, ions are referred to as C+ in the text and C in equation 8. Am I missing something?
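      Several editing points above (log-log axes and a slope-1 line for Fig 7C,E; the meaning of \gamma) concern power-law growth of R^2(t). Assuming, as that notation suggests, R^2(t) = A t^\gamma, the hedged sketch below estimates \gamma by least squares on log R^2 versus log t and builds the slope-1 reference line (\gamma = 1). The function names are hypothetical, and the authors' own fitting procedure may differ.

```python
import numpy as np


def fit_power_law(t, r2):
    """Fit R^2(t) = A * t**gamma by least squares on log R^2 versus log t.

    Returns (gamma, A). Points with t <= 0 or R^2 <= 0 are dropped, since
    their logarithms are undefined.
    """
    t = np.asarray(t, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    ok = (t > 0) & (r2 > 0)
    gamma, log_a = np.polyfit(np.log(t[ok]), np.log(r2[ok]), 1)
    return float(gamma), float(np.exp(log_a))


def slope_one_line(t, t0, r2_0):
    """Reference line of log-log slope 1 (gamma = 1) through the point (t0, r2_0)."""
    return r2_0 * np.asarray(t, dtype=float) / t0
```

      Plotting the data and the slope-1 line together on log-log axes (for example with matplotlib.pyplot.loglog) shows directly whether the fitted \gamma departs from 1, which is what decides whether "anomalous" is warranted at a given stage.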

      Reviewer #2 (Recommendations For The Authors):

      I have included all my comments as one review. I have done so, despite the fact that some minor comments could have gone into this section, because I decided to review each Results section. I thus felt that not writing it as one review might be harder to follow. I have, however, highlighted which comments are minor suggestions and where I felt corrections were needed.

      However, while I am happy with all my comments being public, given their nature I think they should be shown to authors first. Perhaps the authors want to go over them and think about it before deciding if they are happy for their manuscript to be published along with these comments, or not. I will highlight this in an email to the editor. I question whether in this case, given that I am raising major issues, publishing both the manuscript and the comments is the way to go as I think it might just generate confusion among the audience.

      Reviewer #3 (Recommendations For The Authors):

      I was unable to find any legends for any of the supplemental videos in my review materials, and I could not open supplemental video 5.

      I made some comments in the public review about the analysis and interpretation of the time-to-fire data. One of the other challenges in this data set is that the time resolution is limited: it seems that a large proportion of cells have already fired after a single acquisition frame. It would be ideal to increase the time resolution of this measurement to improve precision. This could be done by imaging more quickly, but that would perhaps necessitate more blue light exposure; an alternative is to do this experiment under lower blue light irradiance, where the first spike time is increased (Figure 4A).

      In the public review, I mentioned the possible impact of high membrane potential on PI permeability. To address this, the experiment could be repeated with other stains, or the viability of blue light-treated cells could be addressed more directly by outgrowth or colony-forming unit assays.

      In the public review, I mentioned the possible combined toxicity of ThT and blue light. Live/dead experiments after blue light exposure with and without ThT could be used to test for such effects, and/or the growth curve experiment in Figure 1F could be repeated with blue light exposure at a comparable irradiance used in the experiment.

      Throughout the paper and figure legends, it would help to have more methodological details in the main text, especially those that are critical for the interpretation of the experiment. The experimental details in the methods section are nicely described, but the data analysis section should be expanded significantly.

      At the end of the results section, the authors suggest a critical biofilm size of only 4 µm for wavefront propagation (not much larger than a single cell!). The authors show responses for various biofilm sizes in Fig. 2C, but these are all substantially larger. Are there data for cell clusters above and below this size that could support this claim more directly?

      The authors mention image registration as part of their analysis pipeline, but the 3D data sets in Video S6B and Fig. S4A do not appear to be registered. Were these registered prior to the velocity analysis reported in Fig. 8?

      One of the most challenging claims to demonstrate in this paper is that these membrane potential wavefronts are involved in coordinating a large, biofilm-scale response to blue light. One possible way to test this might be to repeat the Live/Dead experiment in planktonic culture or the single-cell condition. If the protection from blue light specifically emerges due to coordinated activity of the biofilm, the Kch mutant would not be expected to show a change in Live/Dead staining in non-biofilm conditions.

      Line 140: How is "mature biofilm" defined? Also on this same line, what does "spontaneous" mean here?

      Line 151: "much smaller": Given that the reported time for 3D biofilms is 2.73 ± 0.85 min and in microclusters is 3.27 ± 1.77 min, this seems overly strong.

      Line 155: How is "biofilm density" characterized? Additionally, the data in Figure 2C are presented in distance units (µm), but the text refers to "areal coverage"- please define the meaning of these distance units in the legend and/or here in the text (is this the average radius?).

      Lines 161-162: These claims seem strong given the data presented before, and the logic is not very explicit. For example, in the second sentence, the idea that this signaling is used to "coordinate long-range responses to light stress" does not seem strongly evidenced at this point in the paper. What is meant by a long-range response to light stress? Are there processes to respond to light that occur at long length scales (rather than on the single-cell scale)? If so, is there evidence that these membrane potential changes could induce these responses? Please clarify the logic behind these conclusions.

      Lines 235-236: In the lower irradiance conditions, the responses are slower overall, and it looks like the ThT intensity is beginning to rise at the end of the measurement. Could a more prominent second peak be observed in these cases if the measurement time was extended?

      Lines 242-243: The overall trajectories of extracellular potassium are indeed similar, but the kinetics of the second peak of potassium are different from those observed by ThT (it rises some minutes earlier). Is this consistent with the idea that Kch is responsible for that peak? Additionally, the potassium dynamics also reflect the first peak; is this surprising given that the Kch channel has no effect on this peak?

      Line 255-256: Again, this seems like a very strong claim. There are several possible interpretations of the catalase experiment (which should be discussed); this experiment perhaps suggests that ROS impacts membrane potential, but does not obviously indicate that these membrane potential fluctuations mitigate ROS levels or help the cells respond to ROS stress. The loss of viability in the ∆kch mutant might indicate a link between these membrane potential experiments and viability, but it is hard to interpret without the no-light control I mention in the public review.

      Lines 313-315: "The model predicts... the external light stress". Please clarify this section. First, where in the modeling work does this prediction arise? Second, I am not sure what is meant by "modulates the light stress" or "keeps the cell dynamics robust to the intensity of external light stress" (especially since the dynamics clearly vary with irradiance, as seen in Figure 4A).

      Line 322: I am not sure what "handles the ROS by adjusting the profile of the membrane potential dynamics" means. What is meant by "handling" ROS? Is the hypothesis that membrane potential dynamics themselves are protective against ROS, or that they induce a ROS-protective response downstream, or something else? Later in lines 327-8 the authors write that changes in the response to ROS in the model agree with the hypothesis, but just showing that ROS impacts the membrane potential does not seem to demonstrate that this has a protective effect against ROS.

      Lines 365-366: This section title seems confusing: mechanosensitive ion channels totally ablate membrane potential dynamics; they don't have a specific effect on the first hyperpolarization event. The claim that mechanosensitive ion channels are specifically involved in the first event also appears in the abstract.

      Also, the apparent membrane potential is much lower even at the start of the experiment in these mutants- is this expected? This seems to imply that these ion channels also have a blue light independent effect.

      Lines 368, 371: Should be VGCCs rather than VGGCs.

      Line 477: I believe the figure reference here should be to Figure 7B, not 6B.

      Line 567-568: "The initial spike is key to registering the presence of the light stress." What is the evidence for this claim?

      Line 592-594: "We have presented much better quantitative agreement..." This is a strong claim; it is not immediately evident to me that the agreement between model and prediction is "much better" in this work than in the cited work. The model in Figure 4 of reference 57 seems to capture the key features of their data. Clarification is needed about this claim.

      Line 613: "...strains did not have any additional mutations." This seems to imply that whole genome sequencing was performed- is this the case?

      Line 627: I believe this should refer to Figure S2A-B rather than S1.

      Line 719: What percentage of cells did not hyperpolarize in these experiments?

      Lines 751-754: As I mentioned above, significant detail is missing here about how these measurements were made. How is "radius" defined in 3D biofilms like the one shown in Video S6B, which looks very flat? What is meant by the distance from the substrate to the core, since usually in this biofilm geometry, the core is directly on the substrate? Most importantly, this only describes the process of sectioning the data- how were these sections used to compute the velocity of ThT signal propagation?

      I also have some comments specifically on the figure presentation:

      Normalization from 0 to 1 has been done in some of the ThT traces in the paper, but not all. The claims in the paper would be easiest to evaluate if the non-normalized data were shown- this is important for the interpretation of some of the claims.

      Some indication of standard deviation (error bars or shading) should be added to all figures where mean traces are plotted.

      Throughout the paper, I am a bit confused by the time axis; the data consistently starts at 1 minute. This is not intuitive to me, because it seems that the blue light being applied to the cells is also the excitation laser for ThT- in that case, shouldn't the first imaging frame be at time 0 (when the blue light is first applied)? Or is there an additional exposure of blue light 1 minute before imaging starts? This is consequential because it impacts the measured time to the first spike. (Additionally, all of the video time stamps start at 0).

      Please increase the size of the scale bars and bar labels throughout, especially in Figure 2A and S4A.

      In Figure 1B and D, it would help to decrease the opacity on the individual traces so that more of them can be discerned. It would also improve clarity to have data from the different experiments shown with different colored lines, so that variability between experiments can be clearly visualized.

      Results in Figure 1E would be easier to interpret if the frequency were normalized to total N. It is hard to tell from this graph whether the edges and bin widths are the same between the data sets, but if not, they should be. Also, it would help to reduce the opacity of the sparse cell data set so that the full microcluster data set can be seen as well.

      Biofilm images are shown in Figures 2A, S3A, and Video S3- these are all of the same biofilm. Why not take the opportunity to show different experimental replicates in these different figures? The same goes for Figure S4A and Video S6B, which again are of the same biofilm.

      Figure 2C would be much easier to read if the curves were colored in order of their size; the same is true for Figure 4A and irradiance.

      The complementation data in Figure S3D should be moved to the main text figure 3 alongside the data about the corresponding knockout to make it easier to compare the curves.

      Figure S3E: Is the Y-axis in this graph mislabeled? It is labeled as ThT fluorescence, but it seems that it is reporting fluorescence from the calcium indicator?

      Video S6B is very confusing- why does the video play first forwards and then backwards? Unless I am looking very carefully at the time stamps it is easy to misinterpret this as a rise in the intensity at the end of the experiment. Without a video legend, it's hard to understand this, but I think it would be much more straightforward to interpret if it only played forward. (Also, why is this video labeled 6B when there is no video 6A?)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Plasmacytoid dendritic cells (pDCs) represent a specialized subset of dendritic cells (DCs) known for their role in producing type I interferons (IFN-I) in response to viral infections. It was believed that pDCs originated from common DC progenitors (CDP). However, recent studies by Rodrigues et al. (Nature Immunology, 2018) and Dress et al. (Nature Immunology, 2019) have challenged this perspective, proposing that pDCs predominantly develop from lymphoid progenitors expressing IL-7R and Ly6D. A minor subset of pDCs arising from CDP has also been identified as functionally distinct, exhibiting reduced IFN-I production but a strong capability to activate T-cell responses. On the other hand, clonal lineage tracing experiments, as recently reported by Feng et al. (Immunity, 2022), have demonstrated a shared origin between pDCs and conventional DCs (cDCs), suggesting a contribution of common DC precursors to the pDC lineage.

      In this context, Araujo et al. investigated the heterogeneity of pDCs in terms of both development and function. Their findings revealed that approximately 20% of pDCs originate from lymphoid progenitors common to B cells. Using Mb1-Cre x Bcl11a floxed mice, the authors demonstrated that the development of this subset of pDCs, referred to as "B-pDCs," relied on the transcription factor BCL11a. Functionally, B-pDCs exhibited a diminished capacity to produce IFN-I in response to TLR9 agonists but secreted more IL-12 compared to conventional pDCs. Moreover, B-pDCs, either spontaneously or upon activation, exhibited increased expression of activation markers (CD80/CD86/MHC-II) and a heightened ability to activate T-cell responses in vitro compared to conventional pDCs. Finally, Araujo et al. characterized these B-pDCs at the transcriptomic level using bulk and single-cell RNA sequencing, revealing them as a unique subset of pDCs expressing certain B cell markers such as Mb1, as well as specific markers (Axl) associated with cells recently described as transitional DCs.

      Thus, in contrast to previous findings, this study posits that a small proportion of pDCs derive from B cell-committed lymphoid progenitors, and this subset of B-pDCs exhibits distinct functional characteristics, being less specialized in IFN-I production but rather in T cell activation.

      Strengths:

      Previously, the same research group delineated the significance of BCL11a as a critical transcription factor in pDC development (Ippolito et al., PNAS, 2014). This study elucidates the precise stage during hematopoiesis at which BCL11a expression becomes essential for the emergence of a distinct subset of pDCs, substantiated by robust genetic evidence in vivo. Furthermore, it underscores the shared developmental origin between pDCs and B cells, reinforcing prior research in the field that suggests a lymphoid origin of pDCs. Finally, this work attributes specific functional properties to pDCs originating from these lymphoid progenitors shared with B cells, emphasizing the early imprinting of functional heterogeneity during their development.

      Weaknesses:

      The authors delineate a subset of pDCs dependent on the BCL11a transcription factor, originating from lymphoid progenitors, and compare it to conventional pDCs, which they suggest differentiate from common DC progenitors of myeloid origin. However, this interpretation lacks support from the authors' data. Their single-cell RNA sequencing data identifies cells corresponding to progenitors (Prog2), from which the majority of pDCs, termed conventional pDCs, likely originate. This progenitor cell population expresses Il7r, Siglech, and Ly6d, but not Csf1r. The authors describe this progenitor as resembling a "pro-pDC myeloid precursor," yet these cells align more closely with lymphoid (Il7r+) progenitors described by Rodrigues et al. (Nature Immunology, 2018) and Dress et al. (Nature Immunology, 2019). Furthermore, analysis of their Mb1 reporter mice reveals that only a fraction of common lymphoid progenitors (CLP) express YFP, giving rise to a fraction of YFP+ pDCs. However, this does not exclude the possibility that YFP- CLP could also give rise to pDCs. The authors could address this caveat by attempting to differentiate pDCs from both YFP+ and YFP- CLPs in vitro in the presence of FLT3L. Additionally, transfer experiments using these lymphoid progenitors could be conducted in vivo to assess their differentiation potential in competitive settings.

      Dear Reviewer 1, we appreciate your thoughtful comments. We made the decision to address the Prog2 cluster as "pro-pDC myeloid precursor" because, despite its lack of CSF1R expression, its CIPR similarity score showed the highest transcriptional similarity to the population "SC.CDP.BM" (GEO accession number: GSM791114), which is shown to be Sca1- Flt3+ cKitlo.

      A similar population identified as “common dendritic cell progenitor” is shown by Onai and colleagues (Onai et al. 2013, Immunity) to be capable of differentiating into pDCs by upregulating E2-2 and subsequently downregulating M-CSFR. In addition, we were unable to infer a developmental trajectory between Prog2 and B-pDCs using SimplePPT on Monocle3 (Figure 5B). Since we know our B-pDCs are CLP derived and most likely share a B cell progenitor population, we feel this lack of connectivity to the UMAP myeloid partition corroborates our assignment of Prog2 as a myeloid pDC progenitor (not CLP derived). Of note, recent work by Medina and colleagues has shown that while IL-7Rα knockout mice exhibit a block in B cell development at the all-lymphoid progenitor (ALP) stage, PDCA-1+ pDCs identified within the initially gated BLP population persisted (PLoS One, 2013), suggesting the IL7R chain is not required for the development of PDCA1+ cells. 

      Using their Mb1-reporter mice, the authors demonstrate that YFP+ pDCs originating from lymphoid progenitors are functionally distinct from conventional pDCs, mostly in vitro, but their in vivo relevance remains unknown. It is crucial to investigate how Bcl11a conditional deficiency in Mb1-expressing cells affects the anti-viral immune response, for example, using the M-CoV infection model as described by Sulczewski et al. in Nature Immunology, 2023. Particularly, the authors suggest that their B-pDCs act as antigen-presenting cells involved in T-cell activation compared to conventional pDCs. However, these findings contrast with those of Rodrigues et al., who have shown that pDCs of myeloid origin are more effective than pDCs of lymphoid origin in activating T-cell responses. The authors should discuss these discrepancies in greater detail. It is also notable that B-pDCs acquire the expression of ID2 (Figure S3A), commonly a marker of conventional/myeloid DCs. The authors could analyze in more detail the acquisition of specific myeloid features (CD11c, CX3CR1) by this B-pDC subset and discuss how the expression of ID2 may impair classical pDC features, as ID2 is a repressor of E2-2, a master regulator of pDC fate.

      Both reviewers expressed the need to further investigate how Bcl11a conditional deficiency in Mb1-expressing cells affects anti-viral responses of B-pDCs. While the functional characterization of B-pDC in the context of infection could be highly informative, it is really outside the scope of the present study. Our discovery that B-pDCs expand robustly upon TLR-9 agonist challenges in vivo and can prime T cells in vitro efficiently, however, suggests that these cells might play an important role during viral infections or anti-cancer immunity.

      Finally, through the analysis of their single-cell RNA sequencing data, the authors show that the subset of B-pDCs they identified expresses Axl, confirmed at the protein level. Given this specific expression profile, the authors suggest that B-pDCs are related to a previously described subset of transitional DCs, which were reported to share a common developmental path with pDCs (Sulczewski et al. in Nature Immunology, 2023). While intriguing, this observation requires further phenotypic and functional characterization to substantiate this claim.

      We agree with the reviewer’s comments. We are currently preparing a separate manuscript addressing the commonalities between human transitional DCs and murine non-conventional pDCs.

      Reviewer #2 (Public Review):

      Summary:

      The origin of plasmacytoid dendritic cells and their subclasses continues to be a debated field, akin to any immune cell field that is determined through the expression of surface markers (rather than through clear subclass separation based on functional biology and experimentation). In this context, in this manuscript by Araujo et al., the authors attempt to demonstrate that a subtype of pDCs comes from a lymphoid origin due to the presence of some B cell gene expression markers. They name these cells B-pDCs. Strikingly, pDCs function via expression of IFNa, whereas B-pDCs do not express IFNa, thereby raising the question of what their physiological or pathophysiological properties are. B-pDCs also express AXL, a marker not seen in mouse pDCs but observed in human pDCs. Overall, using a combination of gene expression profiling of immune cells isolated from mice via RNA-seq and single-cell profiling, the authors propose that B-pDCs are a novel subtype of pDCs in mice that had not previously been identified and characterized.

      Weaknesses:

      My two points of discussion about this manuscript are as follows.

      (1) How new are these observations that pDCs could also originate from common lymphoid progenitors? This fact has been previously outlined by many laboratories, including Shigematsu et al., Immunity 2004. The studies in this manuscript can be considered new based on the single-cell profiling presented only if further characterization of the isolated B-pDCs is performed at the functional biology level. Overlapping gene expression profiles are often seen in developing immune cell types, especially when only evaluated at the RNA expression level, and can lead to cell type complexity (and identification of new cell types) that is not biologically and functionally relevant.

      Dear reviewer 2, we appreciate your thoughtful comments. We believe our single cell seq analysis adds new information to the studies mentioned because of our broader approach to BM profiling. By using only one marker (PDCA1+), scRNA-seq allowed us to dissect not only several subpopulations of pDCs that to our knowledge were not previously dissected in mice, but also linked the transcriptional similarity of B-pDCs to myeloid derived pDCs (and even other myeloid cell types), as well as B cells.

      (2) The authors hardly perform any experiments to interrogate the function of these B-pDCs. The discussion on this topic can be enhanced. Ideally, some biological experiments would confirm that B-pDCs are important.

      Dear reviewer 2, we appreciate your thoughtful comment and agree about the need for further functional characterization of B-pDCs (please see comments directed to reviewer 1 above).

      (1) Considering that Bcl11a conditional deficiency severely impacts the B cell lineage, there is a possibility that such an effect on B cells may indirectly influence pDC development. To address this, the authors could repeat their bone marrow transfer experiments in a competitive setting by mixing both Bcl11a WT and CKO BM cells (using congenic markers to track the origin of the BM cells) and then specifically assess whether BM cells originating from Bcl11a CKO donors have impaired pDC output.

      Dear reviewer 2, while the comment above is valid (that the reduced number of mature B cells in our Bcl11a conditional knockout might indirectly impact B-pDC development), we and many others have previously shown that lack of transcriptional regulation of E2-2 and other pDC differentiation modulators by Bcl11a (including ID2 and MTG16) intrinsically and selectively disrupts the pDC lineage. At the current stage, we feel rederiving Bcl11a cKOs and performing bone marrow transfers (which usually take several months) only to investigate indirect effects of B cells on pDC development is outside the scope of this publication.

      (2) As mentioned earlier, it is important to assess the potential of CLP, whether YFP- or YFP+, in their ability to give rise to pDCs both in vitro and in vivo. This is also crucial since the authors previously demonstrated that Bcl11a deficiency in all hematopoietic cells had a more drastic impact on pDC development than mb1-cre specific deficiency.

      We agree the manuscript could be strengthened by differentiation experiments. However, in our previous publication (mentioned above by the reviewer), we specifically show that although fewer overall LSK progenitors were detected in Vav-Cre+ F/F mice, both MDP and CDP progenitor populations persisted within the Flt3+ compartment in cKO mice at percentages similar to controls. MDP (Lin– Flt3+ Sca-1− CD115+ c-kithi); CDP (Lin– Flt3+ Sca-1− CD115+ c-kitlo). These data confirm that CLPs give rise to a substantial pool of pDC subpopulations. Other works have shown this as well, both in vivo and in vitro (Wang et al. Immunity 2004; Karsunky et al, JEM 2003, etc). We therefore feel that confirming the previous observations that CLPs can give rise to pDCs is unnecessary, as our main goal in this manuscript was to describe a new pDC subpopulation that emerges primarily from CD79a+ B cell biased progenitors.

      (3) The authors show a more severe impact of Bcl11a CKO on pDC depletion in the spleen than in the BM. Is this effect specific to the spleen, or can it also be observed in lymph nodes? What is the overall impact of Bcl11a conditional deficiency on pDC distribution in tissues such as the liver and lung? These questions are important to address to understand whether the heterogeneity of pDCs is differentially affected by their localization.

      We agree heterogeneity of pDCs can be affected by their microenvironment. Although phenotyping of lymph nodes in Bcl11a cKOs would greatly add to our manuscript, the genetically altered strains required are no longer being bred in our facility and resurrecting them from frozen sperm is outside the realm of this publication.

      (4) Regarding the functional study of pDCs, as emphasized previously, it is important to assess the in vivo relevance of B-pDCs in infectious settings.

      Dear reviewer 2, we appreciate your thoughtful comment. Please see our response directed to reviewer 1 above.

      (5) The authors injected CpG-ODN into mice and analyzed pDC phenotype upon activation. It is important to note that upon activation, especially upon induction of IFN-I production in vivo, mPDCA1 expression is no longer specific to pDCs  (Blasius et al, Journal of Immunology, 2006). Therefore, to specifically characterize pDC phenotype upon activation, a differential gating strategy is required (CD11c, B220, Ly6C, and Siglec H) to ensure that bona fide pDCs are analyzed.

      We agree with the reviewer that this would be a more appropriate characterization. Regarding PDCA1 promiscuity in activated states, we are not aware of any cell types that express very high levels of B220 and PDCA1 simultaneously other than pDCs. We therefore firmly believe that our assignment is valid. Interestingly, gating B220+ cells of CpG-challenged mice that show intermediate expression of PDCA1 results in an increase in the frequency of CD19+ B cells, which we were careful to avoid by gating only the cells that most strongly express PDCA1.

      (6) How does pDC activation regulate their mb1 expression? Could conventional pDCs, upon activation, become B-PDCs? Could activation and induction of IFN-I production in vivo also affect CLP and increase the amount of YFP+ lymphoid progenitors and thus B-pDC output?

      Dear reviewer, we agree with your concern, albeit beyond the scope of the present study. While changes in YFP MFI via flow cytometry upon vaccination were not substantial, we have included the following comment in the manuscript discussion, acknowledging the aforementioned possibility: "Of note, whether induction of IFN-I production in vivo could also affect CLP and increase the amount of YFP+ lymphoid progenitors and thus B-pDC output is unclear. Further research is required to answer this question."

      (7) If pDCs are preferentially expanding upon in vivo stimulation, it would be informative to assess their Ki67 profile. This is a surprising observation since pDCs are generally considered quiescent cells that were previously described to die in response to activation and IFN-I (Swiecki et al, Journal of Experimental Medicine, 2011).

      We agree and have entered the following statement to address this concern: “Functionally, they expand more readily after TLR9 engagement than classical pDCs (either through increased proliferation or differentiation of other cell types) and excel at activating T cells in culture.”

      (8) How does the conditional deficiency of BCL11a affect the production of IFN-I and IL-12 in vivo (serum) upon CpG-ODN stimulation?

      Dear reviewer 2, we are currently unable to rederive the conditional knockout mouse strain in a timely fashion. However, our ELISA experiments performed under controlled in vitro activation conditions, along with the in vivo findings of Zhang et al. (PNAS 2017), warrant the hypothesis that B-pDCs most likely exhibit a similar cytokine-secreting profile under inflammatory conditions.

      (9) Given that B-pDCs show downregulation of pDC canonical markers, including IRF8 and TLR7, could the authors address how B-pDCs respond to TLR7 stimulation in vitro and assess a broader spectrum of cytokines produced by pDCs in response to such stimulation (IL-6, TNFa, CXCL10...)?

      Dear reviewer 2, although expanding our findings to include B-pDC responses to TLR-7 stimulation would greatly enhance our manuscript, a technical deterrent stands in our way. As mentioned previously, sorting B-pDCs for new experiments using reporter YFP mice is currently not possible, as we have retired this mouse strain. Sorting of live CD79a+ B-pDCs via FACS is also not feasible, as CD79a staining with most antibody clones requires permeabilization of cells for easier access to the intra-membrane portion of CD79a.

      (10) It would be informative to compare scRNA sequencing data between control and Bcl11a CKO mice to ascertain their contribution to B-PDCs and whether this deficiency may affect other pDC clusters and/or progenitors.

      We are unable to sort B-pDCs for new experiments, as we unfortunately retired the transgenic colony.

      (11) Transitional DCs were reported to give rise to a subset of cDC2. Given that the authors claim that B-PDCs are related to this subset of transitional DCs, could the authors observe any YFP staining in cDC2 upon the generation of their BM chimeras?

      We saw no YFP positivity in CD11c hi cells (cDCs) via flow or through scRNA-seq, indicating CD79a expression is unique in mature B cells and B-pDCs.

      (12) Most of the statistical analysis is done with a Student's t-test. This requires normally distributed data, which is highly unlikely given the sample size. Therefore, the authors should instead use a non-parametric test (Mann-Whitney) to compare their samples.

      We agree and have redone our statistical analyses using a non-parametric test (Mann-Whitney).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) In the subsets of the γδ T cells that exhibit reduced BLK expression in B6.SAP KO mice, have the authors examined the expression of Lck and/or Fyn?

      The reviewer raises an excellent point. We have included in the revised manuscript additional data on Lck and Fyn expression in our scRNAseq dataset (new Suppl. Fig. 1 and new Suppl. Fig. 4). These data revealed that in contrast to Blk, which appears primarily restricted to the γδT17 clusters, Lck and Fyn exhibit a much broader distribution and lack restriction to specific clusters. We did note that, like Blk, Lck and Fyn transcripts were abundant in SAP-dependent C2 cluster cells. Pseudobulk analysis on the immature clusters revealed that neither Fyn nor Lck expression level differences reached our cut-off of 0.5 log2 FC (log2 FC Blk = 1.06), leading us to conclude that Blk is particularly dependent on SAP. We did note, however, that the magnitude of Lck differential expression was close to the 0.5 log2 FC cut-off and that its expression was increased in B6.SAP-/- γδ T cells (Suppl. Fig. 4). These results have been added to lines 202-212 in the Results section and lines 491-499 in the Discussion section.

      (2) Does BLK directly associate with SLAMF1 and/or SLAMF6 receptors?

      The reviewer raises an interesting question given previous reports that BLK, LCK, and FYN have all been implicated in γδ T cell development. While SAP has a well-known ability to recruit FYN to SLAMF1 and there is evidence of a similar SAP-mediated recruitment of LCK to SLAMF6, we are not aware of any evidence of a SAP-BLK interaction or of a direct binding of BLK to SLAM family receptors. Future experiments to investigate this possibility are certainly warranted. In the revised manuscript, we have included additional discussion of these possibilities (lines 491-499).

      (3)  Given the emerging role of γδ T cells in host immunity, it would be useful if the authors could add a discussion of how their findings are relevant in disease conditions such as cancer. 

      We agree and have included new text in the Introduction (lines 37-45). 

      (4)  Delete repeated words in lines 546 and line 553. 

      Thank you—this has been corrected in the revised manuscript.

      Reviewer #2:

      This is a very complete study and requires no additional experimentation. One thing to keep in mind in assessing the ultimate fate of the "ab wannabe cells" is that mechanisms exist to silence the γδ TCR as cells differentiate to the DP stage, and so their presence as diverted DP cells may not be evident by staining for γδ TCR expression and will only be evident transcriptomically.

      We appreciate this helpful comment from the reviewer which we will take into consideration in our future experimental design.

      There are a couple of minor points to raise: 

      (1)  Figure 3C is not called out in the text. 

      Thank you—this has been corrected in the revised manuscript.

      (2)  Line 546 - "dependent" is repeated.

      Thank you—this has been corrected in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      We have addressed these concerns in the revised version and described these efforts in our previous response letter; we briefly summarize them here again. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA (gray) upon the addition of HGSNAT (red) (Author response image 1).

      Author response image 1.

      Acetyl-CoA levels in the absence and presence of HGSNAT purified in digitonin. The decrease in the levels of 10 µM acetyl-CoA was measured in the presence of 10 µM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published a structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active. In addition, we have shown by cryo-EM that the GFP-tagged HGSNAT we purified in detergent was already bound to the endogenous substrate ACO, an observation also reported by Xu et al. Finally, we performed LC-MS on GFP-tagged HGSNAT purified in detergent to detect bound ACO, which could be removed by dialysis. These results have been included in Figure S9. The endogenous binding of ACO to HGSNAT in detergent suggests that neither the tag nor the detergent is detrimental to function.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

      We had already changed this figure in our latest submission; perhaps the changes were not obvious during review. We agree with this reviewer that the enzyme could achieve catalysis through simple side-chain movements, without the extensive isomerization steps depicted in the original Figure 5. In the absence of data supporting large movements during the acetyl transfer reaction, the original Figure 5 was speculative. Hence, we have edited Figure 5 in the revised version of the manuscript based on the observations made in this study: the states shown in the revised figure no longer involve conformational changes and depict only the acetyl transfer.

      Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, including nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation causes mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT, determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles and contains a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC onto the structure, and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The paper describes a structure of HGSNAT, a member of the transmembrane acyl transferase (TmAT) superfamily. The high-resolution structure of HGSNAT bound to acetyl-CoA is important for our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC.

      Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer, and define the architecture of the alpha- and beta-GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data are of high quality, with an isotropic cryoEM density map at 3.3 Å facilitating the building of a high-confidence atomic model. Importantly, the density for the acetyl-CoA ligand is particularly well defined, as are the contacting residues within the transmembrane domain.

      The structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      Weaknesses:

      While the structural data for the state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      A weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      In the response to reviewers, the authors mention revised coordinates, but the revised coordinates provided to this reviewer do not reflect the stated changes (I assume a technical error somewhere).

      Perhaps the old coordinates in the deposition system were resubmitted with the revised draft. Nevertheless, we made the changes to the structure suggested by this reviewer in the previous round and have released the new coordinates (PDB ID: 8TU9).

      Is there any evidence for the interprotomer disulfide except for the map? e.g. if it is a disulfide-linked dimer, one should see a shift in mobility on non-reducing vs reducing SDS-PAGE. Without this, the evidence from the map is not conclusive - while the symmetry-related cysteines are nearby to one another, based on the map I could argue that they could just as well be modeled with the cys sidechains reduced and pointing away from one another.

      In addition to modeling the disulfide into the cryo-EM map, we have performed FSEC-based thermal melt analysis of the C334A mutant; C334 forms the disulfide at the dimer interface. C334A is still expressed as a dimer, suggesting that C334 is not the only residue stabilizing the dimer. Upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A corresponds to monomeric HGSNAT (Figure 4-Figure supplement 1 in the main manuscript). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C maintains the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric. We have also performed PAGE analysis as suggested by this reviewer and noticed that reducing conditions result in a monomeric protein band (Author response image 2). While we were revising this manuscript, two other groups published structures of HGSNAT (Xu et al., 2024, Nat Struct Mol Biol, and Zhao et al., 2024, Nat Commun). These groups also identified this disulfide at the dimer interface in their HGSNAT structures. Zhao et al. showed that this disulfide is not crucial for dimerization and suggested that it can break depending on the conformation of HGSNAT. Our FSEC results agree with this observation.

      Author response image 2.

      Comparison of purified HGSNAT on native and reducing SDS-PAGE. The arrows on both gels indicate N-GFP-HGSNAT. The two bands on the SDS-PAGE gel may represent two differentially glycosylated forms of HGSNAT.


      The following is the authors’ response to the original reviews.

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function. The authors would need to establish an in vitro assay using purified protein and assess the level of Acetyl-CoA in the reaction (there are commercial kits and a long list of literature showing how to measure this). They could also follow the HS acetylation reaction by e.g. HPLC-MS or NMR (among other methods).

      The cryo-EM sample was prepared without the exogenous addition of ligand, as noted in the manuscript. However, acetyl-CoA was intrinsically bound to the protein, indicating that GFP-tagged HGSNAT is able to bind the ligand. Upon dialysis, acetyl-CoA is released from the protein, which we confirmed by LC-MS analysis (Fig S9). We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al. (1995). Because we see acetyl-CoA in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability or with the ability to bind the co-substrate. As demonstrated in the existing literature, the HGSNAT-catalyzed reaction is compartmentalized spatially and conditionally: the binding of acetyl-CoA occurs on the cytosolic side and is optimal at pH 7.0-8.0, while the transfer of the acetyl group to heparan sulfate occurs on the luminal side and is optimal at pH 5.0-6.0. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA in the presence of the HGSNAT-ACO complex (blue) and apo HGSNAT (red); the difference compared to the ACO standard (gray) was not significant. While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published a structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active.

      Author response image 3.

      Acetyl-CoA levels in the absence and presence of HGSNAT purified in digitonin. The decrease in the levels of 10 mM acetyl-CoA was measured in the presence of 10 mM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer? The speculative nature of this assumption needs to be clearly acknowledged throughout the manuscript and discussed in more detail. The authors could use HDX-MS or introduce cysteine residues in the hypothetical inward- and outward-facing cavities and test accessibility by incubating the purified protein with maleimides or other agents reacting with free cysteine.

      We thank the reviewers for this insightful critique. Yes, the enzyme could achieve catalysis through simple side-chain movements without undergoing the extensive isomerization steps depicted in Figure 5. We also agree with the reviewer that HDX-MS could be the best way to experimentally monitor substrate-induced conformational dynamics within HGSNAT. In the absence of data supporting large movements during the acetyl transfer reaction, Figure 5 was speculative. We have now edited Figure 5 in the revised version of the manuscript based on the observations made in this study.

      (3) The acetyl-CoA-bound state is described as the open-to-lumen state. Indeed, from Figure 1C, the lumen opening appears much larger than the cytosol opening. Is there any small tunnel that connects the substrate site to the cytosol? In other words, is this state accessible to both the lumen and the cytosol, albeit with a larger opening toward the lumen? This question arises because, in Figure S5, the tunnel calculated by MOLE seems to also connect to the cytosol.

      Yes, it is likely that the ACOS is accessible from the lumen and the cytosol to varying degrees, as indicated by the MOLE prediction. However, binding of the bulky nucleoside head group of acetyl-CoA at the ACOS blocks the cytosolic entrance in the conformation discussed in this manuscript. The MOLE prediction was performed on a structure devoid of acetyl-CoA, and it is possible that the protein does not necessarily isomerize between open-to-lumen and open-to-cytosol conformations during acetyl transfer. Likely, the ACOS is always accessible from both the lumen and the cytosol, but depending on the substrates or products bound, accessibility could be limited to either the lysosomal lumen or the cytosol. We have rewritten all statements mentioning an open-to-lumen conformation to reflect this argument.

      (4) The authors state, "Interestingly, in most of the detergent conditions we tested, HGSNAT was predominantly dimeric (Fig S1C-H)," and also mention, "In all the detergents we tested, HGSNAT eluted as a dimer, a testament to the extensive side-chain interaction network." The dimerization is said to be mediated by a disulfide bond. I would be surprised if the detergents the authors tested could break a disulfide bond. Therefore, can this observation truly serve as a testament to an "extensive" side-chain interaction network?

      We agree with the reviewer that detergents are unlikely to break a disulfide bond. To address this comment, we generated a C334A mutant of HGSNAT and extracted it from cells in 1% digitonin. It is still expressed as a dimer (Fig S8E). However, upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A corresponds to monomeric HGSNAT (Fig S8I and S8K). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C maintains the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric.

      (5) Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution. This uncertainty needs to be clearly described in the manuscript text. Performing additional mutagenesis experiments to test key hypotheses, or further discussing relevant data from the literature, would strengthen the manuscript.

      We agree with the reviewer on the lack of supporting evidence for the mechanistic models proposed in Fig 5. They were based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al. (1983), Meikle et al. (1995), and Fan et al. (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best. We have edited Figure 5 in the revised version of the manuscript. In addition, we have performed mutagenesis analysis to study the stability of mutants (Fig S8) and LC-MS analysis to identify endogenously bound acetyl-CoA (Fig S9) to strengthen parts of the manuscript. We describe these findings in the Results and have modified the Discussion according to these suggestions.

      (6) It is discussed that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case. Providing more information, ideally involving additional experimental work, would strengthen this aspect of the mechanism that is proposed. This would require establishing an in vitro assay, as described in 1).

      H269 was proposed as a crucial catalytic residue by Bame, K. J. and Rome, L. H. (1986), who monitored the effect of chemical modification of amino acids on acetylation by HGSNAT-containing membranes. We generated N258I and H269A mutants of HGSNAT and analyzed their stability. We noticed greater destabilization of N258I than of H269A (Fig S8). We believe this is due to the loss of the ability to bind acetyl-CoA, because the TMs around the catalytic core of the protein in our cryo-EM structure are stabilized by interactions with acetyl-CoA. Recently, Xu et al. (2024, Nat Struct Mol Biol) reported that they did not observe acetylated histidine in their structure. However, both our structure and that reported by Xu et al. (2024) were obtained at cytosolic pH. Perhaps acetylation of H269 occurs at acidic lysosomal pH. Extensive structural and catalytic investigation of HGSNAT at low pH is required to rule out H269 acetylation as a step in the HGSNAT-catalyzed reaction.

      (7) In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other. The authors could explore if an in vitro experimental measurement of protein activity would provide any information in this regard.

      We agree with the reviewer that a more detailed kinetic analysis is necessary to define the bisubstrate reaction mechanism of HGSNAT. All existing structural data on the two isoforms of HGSNAT were obtained at basic pH. As a result, the existing structures do not unambiguously establish the bisubstrate mechanism of HGSNAT. We believe that structural characterization at low pH, together with detailed kinetic and structural characterization of HGSNAT in membrane mimetics such as nanodiscs, could provide more insight into the mechanism. However, these studies are a future undertaking and are not part of this manuscript.

      (8) Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions. It would be ideal if an additional test for these predictions were included in the manuscript. The authors could follow the unfolding of purified mutants by SEC, FSEC, or changes in intrinsic fluorescence to assess protein stability.

      As suggested, we prepared HGSNAT MPS IIIC variants and tested their expression and stability (please see Fig S8). These results have been included in the revised version of the manuscript.

      (9) Some sidechains that have quite strong sidechain density are missing atoms. I would be particularly careful with omitting sidechains that pack in the hydrophobic core, as this can tend to artificially reduce the clash score. Check F81, L62, P91 and V87, for example.

      We have revisited the modeling of these regions and deposited new coordinates.

      (10) W316 seems to have the wrong rotamer.

      This has been corrected in the new coordinate file that has been released.

      (11) N134 and N433 seem to have extra density. Are these known glycosylation sites?

      According to Hrebicek M. et al., 2006 and Feldhammer M. et al., 2009, there are five predicted glycosylation sites: N66, N114, N134, N433, and N602. Of these, we see evidence for NAG density only at N114, N134, and N433. These glycans have now been modeled in the structure.

      (12) At the C-terminal residue (Ile-635), the very C-terminal carboxylate is modeled pointing to a hydrophobic environment. It seems more likely to me that the Ile sidechain is packing here, with the C-terminal carboxylate facing the solvent.

      Thank you for pointing this out. We have edited the orientation of the Ile sidechain accordingly.

      Presentation and wording of results/methods:

      - Figure S3 legend "At places with missing density, the side chains were trimmed to C- alpha" - this is incorrect, I think the authors mean C-beta.

      We have corrected this error in the revised version of the manuscript.

      - Figure S3 legend - the authors refer to a gray mesh, where a transparent surface is displayed.

      Thanks for pointing this error out. We have corrected this in the revised version.

      - Some colloquial/vague wording in the main text (a lot of sentences starting with "Interestingly, ..."). Making the wording more specific would help the reader, I think.

      We have edited out ‘interestingly’ from the document and have re-written parts of the manuscript, per reviewers’ suggestion, for brevity.

      - Figure S2 legend, "throughout the processing workflow the resolution of luminal domain was used as a guidepost" - it is not entirely clear to me what this means in this context, perhaps revise the wording?

      We have rephrased this line in the revised draft of the manuscript.

      - Figure S2 and methods, Local refinements of LD and TMD are mentioned, but not indicated on the processing workflow.

      We have included a new Fig S2 & edited the legend, including these changes, per the reviewers’ suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers (minor points):

      We thank all reviewers for their very helpful suggestions and greatly appreciate their positive evaluation of our work.

      Reviewer #1:

      Ad 1) The reviewer states: Fig. 5: While the data nicely show that CPX and Syt1 have interdependent interactions in chromaffin cells, this does not seem to be the case in neurons, where the loss of complexins and synaptotagmins has additive effects, suggesting independent mechanisms (e.g., Xue et al., 2010). This would be a good opportunity to discuss some possible differences between secretion in endocrine cells and neurons.

      We greatly appreciate the insightful suggestion by the reviewer. To accommodate it, we now discuss this issue on page 21, lines 486-491: “In murine hippocampal neurons, loss of CpxI and Syt1 has additive effects on fast synchronous release, suggesting independent mechanisms (Xue et al., 2010). On the other hand, the same study also showed that Syt1 heterozygosity fails to reduce release probability in wild-type neurons, but does so in the absence of Cpx, again suggesting that Cpx and Syt1 may functionally interact in Ca2+-triggered release.”

      Ad 2) The reviewer states: Fig. 8 shows an apparent shift in Ca2+ sensitivity in N-terminal mutants, suggesting a modification of the Ca2+ sensitivity of Syt1. Could there also be an alternative mechanism that explains this phenotype, based on a role of the N-terminus in lowering the energy barrier for fusion, which in turn shifts the corresponding fusion rates to lower Ca2+ saturation levels?

      We fully agree with the reviewer. While our data indicate that Cpx and Syt1 act in a dependent manner in accelerating exocytosis, they do not provide decisive evidence that the NTD of CpxII directly modulates the Ca2+ affinity of Syt1, an issue that we discuss on page 23, lines 523-529: “The results favor a model wherein the CpxII NTD either directly regulates the biophysical properties of the Ca2+-sensor by increasing the apparent forward rate of Ca2+-binding or indirectly affects SytI-SNARE or SytI-membrane interactions, thereby lowering the energy barrier of Ca2+-triggered fusion.”

      Reviewer #2:

      Ad 1) The reviewer states: The authors provide a "chromaffin cell-centric" view of the function of mammalian Cplx in vesicle fusion. With the exception of mammalian retinal ribbon synapses (and some earlier RNAi knockdown studies that had off-target effects), there is very little evidence for a "fusion-clamp"-like function of Cplxs in mammalian synapses. At conventional mammalian synapses, genetic loss of Cplx (i.e. KO) consistently decreases AP-evoked release and generally either also decreases spontaneous release rates or does not affect spontaneous release, which is inconsistent with a "fusion-clamp" theory. This is in stark contrast to invertebrate (D. m. and C. e.) synapses, where genetic Cplx loss is generally associated with strong upregulation of spontaneous release, providing support for Cplx acting as a "fusion-clamp".

      We agree with the reviewer that it is difficult to reconcile contradictory findings regarding the role of Cpx in membrane fusion in vertebrates and invertebrates, or between murine hippocampal neurons and neuroendocrine cells. However, we respectfully disagree that our manuscript provides a "chromaffin cell-centric" view of the function of mammalian Cplx in vesicle fusion. In fact, a large number of model systems (in vitro and in vivo studies) support a scenario in which complexin takes center stage in clamping premature vesicle release. For example, in vitro analyses using a liposome fusion assay (Schaub et al., 2006, Nat Struct Mol Biol 13, 748; Schupp et al., 2016) or HeLa cells that ectopically express “flipped” SNAREs on their cell surface (Giraudo et al., 2008, JBC 283, 21211) showed that complexin can inhibit the SNARE-driven fusion machinery. Likewise, several studies boosting complexin action by either genetic overexpression or peptide supplementation have provided evidence for the complexin clamp function in neuronal and non-neuronal cells (e.g., Itakura et al., 1999, BBRC 265, 691; Liu et al., 2007, Biochemistry 72, 439; Abderrahmani et al., 2004, J Cell Sci 117, 2239; Archer et al., 2002, JBC 277, 18249; Tang et al., 2006, Cell 126, 1175; Vaithianathan et al., 2013, J Neurosci 33, 8216; Roggero et al., 2007, JBC 282, 26335).

      In addition, chromaffin cells enable the investigation of secretion against a background of well-defined intracellular calcium concentration. Indeed, CplxII knock-out in chromaffin cells revealed enhanced tonic release that is evident at elevated [Ca]i (>100 nM) but absent at low resting [Ca]i (Dhara et al., 2014). Given this observation, it is tempting to speculate that variations in [Ca]i among preparations may contribute to the differing expression of the complexin-null phenotype across them.

      Ad 2) The reviewer states: The authors use a Semliki Forest virus-based approach to express mutant proteins in chromaffin cells. This strategy leads to a strong protein overexpression (~7-8 fold, Figure 3 Suppl. 1). Therefore, experimental findings under these conditions may not necessarily be identical to findings with normal protein expression levels.

      As shown in Fig. 4, we use the secretion response of wt cells as a control so that we can assess the specificity and quality of the rescue approach in our experiments. In addition, the comparative analysis of the CpxII mutants was performed with respect to the equally overexpressed CpxII wt protein (Fig. 3 Suppl. 1), which we used as a control to determine the standard response under these conditions.

      Ad 3) The reviewer states: Measurements of delta Cm in response to Ca2+ uncaging, ramping [Ca2+] from resting levels up to several µM over a time period of several seconds, were used to establish changes in the release rate vs. [Ca2+]i relationship. It is not clear to this reviewer whether and how concurrently occurring vesicle endocytosis, together with possibly Ca2+-dependent endocytosis kinetics, may affect these measurements.

      By infusing bovine chromaffin cells with 50 µM free Ca2+, Smith and Betz showed that the total capacitance increase is dominated by exocytosis and that significant endocytosis only sets in after 3 minutes (Smith and Betz, 1996, Nature, 380, 531). Similarly, we previously showed that mouse chromaffin cells infused with 19 µM free calcium over 2 minutes responded with a robust increase in membrane capacitance that strongly correlated with the number of simultaneously recorded amperometric events monitoring the fusion of single vesicles (Dhara et al., 2014, Fig. 5B). Thus, capacitance changes recorded under tonic intracellular Ca2+ elevation in chromaffin cells are due to exocytosis and are not contaminated by significant endocytosis. As our Ca2+ ramp experiments lasted 6 seconds and the intracellular free [Ca]i did not exceed 19 µM, the observed phenotypic differences between the experimental groups are most likely due to changes in exocytosis rather than endocytosis.

      Ad 4) The reviewer states: It should be pointed out that an altered "apparent Ca2+ affinity" or "apparent Ca2+ binding rate" does not necessarily reflect changes at Ca2+-binding sites (e.g. Syt1).

      We fully agree with the reviewer’s comment. As also pointed out in the response to reviewer 1, our experiments do not provide decisive evidence that the NTD of CpxII directly modulates the Ca2+ affinity of Syt1, an issue that we discuss on page 23, lines 523-529: “The results favor a model wherein the CpxII NTD either directly regulates the biophysical properties of the Ca2+ sensor by increasing the apparent forward rate of Ca2+-binding or indirectly affects SytI-SNARE or SytI-membrane interactions, thereby lowering the energy barrier of Ca2+-triggered fusion.”

      Ad 5) The reviewer states: There are alternative models on how Cplx may "clamp" vesicle fusion (see Bera et al. 2022, eLife) or how Cplx may achieve its regulation of transmitter release without mechanistically "clamping" fusion (Neher 2010, Neuron). Since the data presented here cannot rule out such alternative models (in this reviewer's opinion), the authors may want to mention and briefly discuss such alternative models.

      The study by Bera et al. reiterates the model proposed by the Rothman group, which attributes the clamping function of Cpx to its accessory alpha helix hindering progressive SNARE complex assembly. We explicitly addressed this issue in the original version of the manuscript (page 19, line 425): “As the accessory helix of Cpx has been found to bind to membrane proximal cytoplasmic regions of SNAP-25 and SybII (Malsam et al., 2012; Bykhovskaia et al., 2013; Vasin et al., 2016), an attractive scenario could be that both domains of CpxII, the CTD and the accessory helix, synergistically cooperate to stall final SNARE assembly”. In this context, we now also cite the study by Bera et al.

      A related view of complexin function suggested that it may act as an allosteric adaptor for SytI (Neher 2010, Neuron). Here, rather than postulating independent "clamp" and "trigger" functions, the dual action of complexin was explained as two facets of a simple allosteric mechanism by which complexin modulates the Ca2+ dependence of release. Yet, this interpretation appears difficult to reconcile with observations from our own and other laboratories showing that the fusion-promoting and clamping effects are separable (e.g. Dhara et al., 2014; Lai et al., 2014; Makke et al., 2018; Bera et al., 2022).

      Some parts of the Discussion are quite general and not specifically related to the results of the present study. The authors may want to consider shortening those parts.

      Considering the conflicting findings in the field of SNARE-regulating proteins, the authors hope that the reviewer will agree that it is necessary to discuss the new observations in a broader context, as also acknowledged by the first reviewer.

      Last but not least, the presentation of the results could be improved to make the data more accessible to non-specialists, this concerns providing necessary background information, choice of colors, and labeling of diagrams.

      Done

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Regarding figures: 

      (1) Please use clearly distinct colors in diagrams. For example, in Figure 2 Suppl. 3, four different shades of red (or reddish) are used to color the traces and the respective bars. These different shades of red are difficult to discriminate. In Figure 5 Suppl. 1, the two greens are nearly indistinguishable.  

      Done

      (2) RRP size and SRP size on the one hand, and SR rate on the other represent different quantities which are measured in different units. Please use a separate y-axis for the SR (a rate measured in fF/s) and do not combine with RRP and SRP (pool sizes measured in fF). This would also automatically alleviate the need for axis breaks in the plots of RRP size and SRP size. In general, please do not use axis breaks which make interpretation of data unnecessarily more complicated.  

      In order to clarify the display, we now define the different units together with the quantified parameter (e.g. RRP [fF], SRP [fF], SR [fF/s]) allowing us to omit a second axis in those subpanels.

      (3) When plotting bar graphs showing mean tau_RRP, mean tau_SRP, and mean delay, please always use the correct y-axis labels, i.e. use "tau_RRP", "tau_SRP" and "delay" as y-axis labels as it was done for example in Figure 4D, and do not use "tau_RRP", "tau_SRP" and "delay" as x-axis labels as it was done for example in Figure 1D and many other figure panels.  

      We have standardized the figure display. Yet, we would prefer to keep our subpanel labelling, which states the parameter underneath the bar graph and thereby makes the results more accessible.

      (4) Are the asterisks indicating statistical significance perhaps missing in Figure 4D, middle panel (tau_SRP)?

      There was no statistically significant difference (wt vs cpxIIko+CpxII EA, P=0.0826, Kruskal-Wallis with Dunn’s post hoc test).

      (5) According to the Results section (pages 12 to 13), I assume that in Figures 6 and 7 the labels "+Cplx XYZ" are used by the authors to identify an overexpression of Cplx XYZ in a Cplx WT background. The legend text reads however " ... cells expressing either Cplx2 wt or the mutant ...", which would not be correct. Please check.

      We have changed the formulations to “overexpression” accordingly.

      (6) The x-axis unit in Figure 8C is likely "µM" and not "M".

      Done.

      (7) The abbreviations "CplxII LL-EE" and "CplxII LL-WW", and "CplxII LLEE" and "CplxII LLWW" are very similar but refer to different mutants. Could you please think of a more specific and unambiguous abbreviation? Perhaps "CplxII L124E-L128E"?  

      We have changed the abbreviations accordingly (i.e. CpxII L124E-L128E).

      Regarding the manuscript text:  

      Line 65: "prevents" instead of "impairs"? 

      done

      Line 67: why "in vivo"? 

      We have changed the formulation to “Several”.

      Line 83: "in addition to the clamping function ..." This is misleading. Many of the studies listed here did not provide evidence for enhanced spontaneous release following Cplx loss and often observed the opposite, reduced spontaneous release. The enhanced delayed release was observed by Strenzke et al 2009 J.Neurosci. and by Chang et al. 2015 J.Neurosci. (which the authors may want to cite). However, that enhanced delayed release occurred despite reduced spontaneous release indicating that it is not simply the result of a missing "fusion clamp". 

      To accommodate the reviewer’s suggestion, we have changed the formulation to “Independent of the clamping function of Cpx….”

      Line 104: "speeds up exocytosis that is controlled by the forward rate of Ca2+ binding" This is difficult to understand without context.  

      We have now added the corresponding citations (Voets et al., 2001; Sorensen et al., 2003), which showed that exocytosis timing in chromaffin cells is largely determined by the kinetics of Ca2+-binding to SytI.

      Line 116: "Cplx2 knock out ..." Please provide (here or earlier in the manuscript) information to the reader about which Cplx paralogs are expressed in chromaffin cells.  

      We now state on line 111 that “CpxII is the only Cpx isoform expressed in chromaffin cells (Cai et al., 2008)”

      Line 118: "=~" either "=" or "~". 

      done

      Line 120: "instead" seems superfluous.

      done

      Line 272: "calcium binding rates" should perhaps better read "apparent calcium binding rates". 

      done

      Line 290: "enhancing SytI's Ca2+ affinity" should perhaps better be "enhancing the apparent Ca2+ affinity of the release machinery". Ca2+ binding kinetics is never directly assayed here.

      We agree and have rephrased the sentence accordingly.

      Line 300: "Expression of Cplx ... in Syt1 R233Q ki cells, ..." Perhaps better "Overexpression of Cplx ... in Syt1 R233Q ki/Cplx2 wt cells, ..." for clarification?

      done

      Lines 313ff: What is assayed here is the apparent Ca2+ binding kinetics and apparent KD values of the release machinery. Ca2+ binding to Syt1 is never directly measured!  

      We agree and have changed the wording accordingly to “CpxII NTD supports the forward rate of calcium binding to SytI in accelerating exocytosis”

      Line 347: "Complexin plays a dual role ..." This is partially misleading. It does so in chromaffin cells and D.m. and C.e. NMJs but not at conventional mammalian synapses. 

      We agree and have changed the formulation to “In many secretory systems, Complexin plays a dual role in the regulation of SNARE-mediated vesicle fusion”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Introduction to the revised manuscript:

      We thank all three reviewers for their time and insightful comments on our original submission. We are submitting a substantially revised manuscript that includes several new experiments, analyses, discussion points, and clarifications that we believe address all of the main concerns of the reviewers.

      To address the request of Reviewers 2 and 3 to reinforce key findings in a more physiologically intact preparation, we performed recordings of YH-HET SST neurons in brain slices and found that these neurons show impairments in AP generation similar to those observed in YH-HET SST cultured neurons. These data are summarized in a new figure (Fig. 9). Along these lines, we performed additional recordings in cultured neurons at both room temperature and physiological temperature and found that WT and YH-HET PV neuronal properties were similarly altered by temperature increases, suggesting that our YH variant-induced neuronal phenotypes are not temperature dependent. These data are shown in a new supplemental figure (Supplemental Fig. 4-3). To address concerns of Reviewer 1 regarding our KNa and NaP current recordings, we performed new experiments to further assess the specificity of the VU170 blocker in KNa KO neurons (summarized in Supplemental Fig. 5-2) and to better characterize the time course over which TTX blocks the persistent Na+ current and the KNa current (summarized in Supplemental Fig. 7-1). These latter two experiments provide further clarity and confidence in the accuracy of our measurements of both KNa and NaP currents. Lastly, to address the concern of Reviewer 3 regarding statistical analyses of the modeling data, we’ve added a new table with the results of a repeated measures ANOVA analysis (Supplemental Table 6), and two new figures illustrating the relative changes in each neuron group compared to their controls (Supplemental Figures 6-2 and 7-2).

      In addition to the new experiments and analyses, we’ve added three new paragraphs to the Discussion section. As the hyperexcitability phenotype in YH-HET PV neurons is somewhat unexpected, we’ve added a paragraph comparing our findings with those found in PV neurons in another KCNT1 GOF model. We’ve also added a paragraph to speculate on the contribution of YH-HET variant-induced alterations in SST and PV neurons to network behavior and seizure propensity. Lastly, we’ve added a paragraph to include the additional limitations and caveats of our study requested by the reviewers.  

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the effects of a heterozygous mutation in the KCNT1 potassium channels on the properties of ion currents and the firing behavior of excitatory and inhibitory neurons in the cortex of mice expressing KCNT1-Y777H. In humans, this mutation as well as multiple other heterozygotic mutations produce very severe early-onset seizures and produce a major disruption of all intellectual function. In contrast, in mice, this heterozygous mutation appears to have no behavioral phenotype or any increased propensity to seizures.

      Regarding the last sentence above, we wanted to clarify a point that we neglected to emphasize in the initial submission. In the Results section from our previous paper (Shore et al., 2020), we failed to observe seizures in 14 heterozygous mice, whereas 23/25 homozygous mice showed seizures by video-EEG. However, in the fifth paragraph of the Discussion section from that paper, we further stated that “during the preparation and review of [that] article, we observed seizures in two Kcnt1-Y777H heterozygous mice, one during a widefield Ca2+ imaging experiment and the other during a video-EEG experiment”. Thus, we concluded that “heterozygous expression can result in seizures in a rodent model, but apparently at a much lower frequency than that observed with homozygous expression”. To emphasize these findings, we’ve added a sentence to the Introduction in this manuscript about the occurrence of infrequent seizures in Kcnt1-Y777H heterozygous mice, along with a reference to the Discussion of our previous paper.

      A relevant phenotype is, however, evident in mice with the homozygous mutation, and the authors have previously published the results of similar experiments with the homozygotes. As perhaps expected, the neuronal effects of the heterozygous mutation presented in this manuscript are generally similar but markedly smaller than the previously published findings on homozygotes. There are, however, some interesting differences, particularly on PV+ interneurons, which appear to be more excitable than wild type in the heterozygotes but less excitable in the homozygotes. This raises the interesting question (which could be more explicitly discussed by the authors) as to whether the reported changes represent homeostatic events that suppress the seizure phenotype in the mouse heterozygotes or simply changes in excitability that do not reach the threshold for behavioral outcomes.

      That is an interesting question. We have added a new paragraph to the Discussion speculating about whether the alterations in SST and PV excitability suppress seizures or do not reach the threshold for behavioral outcomes. The second reviewer raised a similar point (Weaknesses point #2).

      Strengths and Weaknesses:

      (1) The authors find that the heterozygous mutation in PV+ interneurons increases their excitability, a result that is opposite from their previous observation in neurons with the corresponding homozygous mutation.

      We would like to clarify the statement that, in this manuscript, we show that “the heterozygous mutation in PV+ interneurons increases their excitability, a result that is opposite from their previous observation in neurons with the corresponding homozygous mutation”. In our previous manuscript, we assessed YH-HOM phenotypes in NFS and FS GABAergic neurons, but did not specifically mark PV neurons. Although the YH-HOM FS neurons showed an increase in rheobase and a decrease in AP firing, the magnitudes of these effects were far less than those observed in the NFS population. More importantly, the FS GABAergic population likely consists of PV- and SST-expressing neurons; thus, we cannot directly compare the results from the NFS and FS groups to the PV and SST groups, respectively (please see our response to Weaknesses point #3, Reviewer #2). We apologize for the confusion.

      They propose that this results from the selective upregulation of a persistent sodium current INaP in the PV+ interneurons. While the observations are very interesting, there are three issues concerning this interpretation that should be addressed:

      A) The protocol for measuring the INaP current could potentially lead to results that could be (mis)interpreted in different ways in different cells. First, neither K currents nor Ca currents are blocked in these experiments. Instead, TTX is applied to the cells relatively rapidly (within 1 second) and the ramp protocol is applied immediately thereafter. It is stated that, at this time, Na currents and INaP are fully blocked but that any effects on Na-activated K currents are minimal. In theory, this would allow the pre- to post-difference current to represent a relatively uncontaminated INaP. This would, however, only work if activation of KNa currents following Na entry is very slow, taking many seconds. A good deal of literature has suggested that the kinetics of activation of KNa currents by Na influx vary substantially between cell types, such that single action potentials and single excitatory synaptic events rapidly evoke KNa currents in some cell types. This is, of course, much faster than the time of TTX application. Most importantly, the kinetics of KNa activation may be different in different neuronal types, which would lead to errors that could produce different estimates of INaP in PV+ interneurons vs other cell types.

      First, we’d like to point out that we did not want to block K+ currents (which would also block KNa) when measuring INaP for these experiments, because our hypothesis was that the increased KNa current in YH-HET PV neurons was somehow causing an increase in INaP, and it is possible that this increase depends on an intact KNa. Thus, we decided to use a method based on the observation in our experiments, and previously made by others (Budelli et al., 2009), that the reduction of outward current after TTX addition is slow relative to the rapid reduction in Na+ current. We understand and agree with the reviewer that, if KNa currents were blocked more quickly by TTX in some neuron types than others, then our estimate of INaP using this method would be contaminated in these neuron types, which would lead to inaccurate measurements. To assess this possibility among the main neuron types used in this study, we performed new experiments in which we monitored the time course of INaP block and subsequent IKNa loss following TTX application in PV and SST neurons during slow voltage ramps. We note that action potentials are not present in the slow voltage ramps due to inactivation of the transient Na+ current. These new experiments show that, in SST and PV (both WT and Het) neurons, the block of INaP is nearly complete at the 6s time point, whereas the decay in IKNa is far slower (V50 of ≈ 25s), and importantly, these results do not differ substantially by cell type or genotype. These data suggest that our measurements of INaP are not significantly contaminated by IKNa, and that this method allows for the effective separation of these two currents. These data have been added as a supplemental figure (Supplemental Fig. 7-1) and are briefly described and referenced in the Results section.

      B) As the authors recognize, INaP current provides a major source of cytoplasmic sodium ions for the activation. An expected outcome of increased INaP is, therefore, further activation of KNa currents, rather than a compensatory increase in an inward current that counteracts the increase in KNa currents, as is suggested in the discussion.

      We agree that the increase in INaP could theoretically further increase IKNa, as veratridine was previously shown to increase IKNa (Hage & Salkoff, 2012). However, we do not believe that this would necessarily be the case because, as the reviewer notes in their next comment, there is insufficient information on the relative locations of the INaP and KCNT1 channels, as well as the kinetics of sodium transfer to KCNT1 channels, and even less is known in the context of KCNT1 GOF neurons. Thus, there are a couple of plausible reasons why increased INaP may not alter KNa currents in YH-HET PV neurons: (1) In YH-HET PV neurons, the particular sodium channels that are responsible for the increased INaP may not be located in close proximity to the KCNT1 channels. (2) Homeostatic mechanisms that alter the AIS length, or move the AIS further from the soma, in response to altered neuronal excitability are well described (Grubb & Burrone, 2010; Kuba et al., 2010); thus, it is possible that in YH-HET PV neurons, the length or location of the AIS is altered, uncoupling the sodium channels responsible for the increased INaP from the KCNT1 channels.

      C) Numerical simulations, in general, provide a very useful way to evaluate the significance of experimental findings. Nevertheless, while the in-silico modeling suggests that increases in INaP can increase firing rate in models of PV+ neurons, there is as yet insufficient information on the relative locations of the INaP channels and the kinetics of sodium transfer to KNa channels to evaluate the validity of this specific model.

      We completely agree; thus, we have described each of these limitations in the Discussion. We state that the model neurons may “lack more detailed features of ion channels, such as post-translational modifications and subcellular localizations”, and that our KCNT1 model conductance is “hampered by an incomplete understanding of the relationship between Na+ influx, membrane voltage, and channel gating in neurons”.  

      (2) The greatest effect of TTX application would be expected to be the elimination of large transient inward sodium currents. Why are no such currents visible in the control (pre-TTX) or the difference currents (Fig. 2)? Is it possible I missed something in the methods?

      We apologize for the confusion and our mistake in failing to mention this important feature of the displayed traces. To include all of the representative traces in the figures, and prevent overlap of the traces, we removed the large inward sodium currents using the masking tool in Adobe Illustrator in Figure 2 and Supplemental Figure 5-1. We have added that information to the relevant figure legends. We have also provided unmasked images of the representative traces from Figure 2 and Supplemental Figure 5-1 to illustrate the large transient inward sodium currents, and the significant reduction of these currents with TTX treatment.

      (3) As expected, the changes in many of the measured parameters are smaller in the present study with heterozygotes than those previously reported for the homozygous mutation. Some of the statements on the significance of some of the present findings need to be stated more clearly. For example, in the results section describing Fig. 2, it is stated that "In glutamatergic and NFS GABAergic YH-HET neurons, the overall KNa current was increased ...as measured by a significant effect of genotype ...." Later in the same paragraph it is stated that the increases in KNa current are not significant. Apparently, different tests lead to different conclusions. Both for the purpose of understanding the pathophysiological effects of changes in KNa current and for making further numerical simulations, more explicit clarifying statements should be made.

      We apologize for the confusion on the description of these statistics. The results come from the same test, which is a Generalized Linear Mixed Model (GLMM). The factors in our GLMM were voltage step, genotype, and a voltage step x genotype interaction term. The overall effect of genotype is significant in glutamatergic neurons, but pairwise tests at each voltage step show no significant effect of genotype at any given voltage. This is somewhat analogous to running a traditional ANOVA on multiple groups and finding a significant ANOVA p-value but no significant post-hoc multiple comparisons tests, and is not uncommon. Our interpretation of this is that heterozygous expression of the YH variant in glutamatergic neurons likely increases KNa currents across positive potentials (as was seen with the YH-HOM glutamatergic neurons), but only a small amount at each positive step; thus, we lack the statistical power to determine any particular voltage step where this occurs.

      (4) The effects of the KCNT1 channel blocker VU170 on potassium currents are somewhat larger and different from those of TTX, suggesting that additional sources of sodium may contribute to activating KCNT1, as suggested by the authors. Because VU170 is, however, a novel pharmacological agent, it may be appropriate to make more careful statements on this. While the original published description of this compound reported no effect on a variety of other channels, there are many that were not tested, including Na and cation channels that are known to activate KCNT1, raising the possibility of off-target effects.

      We agree and thank the reviewer for making this point. To address this question, we measured KNa currents in WT vs. Kcnt1/Kcnt2-dKO neurons using VU170 to illustrate the extent of outward current due to off-target effects of the drug. These data have been included as a supplemental figure (Supplemental Fig. 5-2). We have also added several sentences to the Results section referencing this figure. Interestingly, in Kcnt1/Kcnt2-dKO neurons, VU170 seems to be quite specific across negative potentials, as no outward currents are apparent until approximately -10 mV, whereas across positive potentials there is a VU170-sensitive outward current reaching ~1 nA by +50 mV. We have also included a note of caution in interpreting these data and added the possibility of off-target effects of VU170 to the Discussion section as an alternative explanation for the differences in KNa currents observed between TTX and VU170.

      (5) The experiments were carried out at room temperature. Is it possible that different effects on firing patterns in heterozygotes and homozygotes would be observed at more physiological temperatures?

      Yes, it is reasonable to assume that an increased temperature would affect neuronal firing patterns in cultured neurons, as temperature differences have been shown to alter synaptic transmission and neuronal function, as assessed in both cultured neuron and slice recordings. All of our recordings were performed at room temperature in this study, and although they are valid with regard to between-group comparisons, this additional caveat is worth mentioning. We have added this to the paragraph describing study limitations in the Discussion section.

      To better understand the effects of temperature in our recordings, we have now compared membrane and AP generation parameters at room temperature (~22°C) and at a more physiological temperature (35°C) in a before-after study of 16 WT neurons, including both glutamatergic and GABAergic neuron types. Not surprisingly, we found robust alterations in all parameters assessed, except resting membrane potential and capacitance. We further assessed the effect of temperature on WT and YH-HET PV neurons, as the PV neurons expressing the YH variant showed the most unexpected phenotypes in our study. In our room temperature recordings, we showed that the YH-HET variant decreased the rheobase current, increased the AP amplitude, and increased AP firing. In our before-after comparison (22°C vs. 35°C) of PV neurons (WT, n=11; YH-HET, n=10), WT and YH-HET neurons showed the same temperature-dependent effects on these parameters at 35°C compared with 22°C: increased rheobase, decreased AP amplitude, and a higher maximal firing rate. These data have been added to the manuscript as a supplemental figure (Supplemental Fig. 4-3) and are briefly described and referenced in the Results section.

      Moreover, in our original manuscript, we showed that the effects of the homozygous YH variant on glutamatergic and NFS GABAergic neuron excitability were highly similar between cultured recordings at room temperature (~22°C) and slice recordings at 32°C. Taken together, these data suggest that the reported neurophysiological phenotypes downstream of the YH variant are likely not temperature dependent. 

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shore et al. investigate the consequent changes in excitability and synaptic efficacy of diverse neuronal populations in an animal model of juvenile epilepsy. Using electrophysiological patch-clamp recordings from dissociated neuronal cultures, the authors find diverging changes in two major populations of inhibitory cell types, namely somatostatin (SST)- and parvalbumin (PV)-positive interneurons, in mice expressing a variant of the KCNT1 potassium channel. They further suggest that the differential effects are due to a compensatory increase in the persistent sodium current in PV interneurons in pharmacological and in silico experiments.

      Strengths:

      (1) Heterozygous KCNT1 gain of function variant was used which more accurately models the human disorder.

      (2) The manuscript is clearly written, and the flow is easy to follow. The authors explicitly state the similarities and differences between the current findings and the previously published results in the homozygous KCNT1 gain of function variant.

      (3) This study uses a variety of approaches including patch clamp recording, in silico modeling, and pharmacology that together make the claims stronger.

      (4) Pharmacological experiments are fraught with off-target effects and thus it bolsters the authors' claims when multiple channel blockers (TTX and VU170) are used to reconstruct the sodium-activated potassium current. Having said that, it would be helpful to see the two drug manipulations used in the same experiment. Notably, does the more selective blocker VU170 mimic the results of TTX for NFS GABAergic cells in Figure 2? And does it unmask a genotype difference for FS GABAergic cells like the one seen in PV interneurons in Figure 5C3.

      To illustrate the two drug manipulations in the same experiment, we recorded from WT SST and PV neurons (5 neurons/group) and blocked KNa currents first using TTX and then VU170, with a washout between the two drugs, in the same neurons. Below, we have plotted the points at each voltage step for each SST and PV neuron, and for each drug treatment, on the same graph to show how they vary directly. At each voltage step, lines connect the points representing the TTX-sensitive and VU170-sensitive currents for each neuron to show the individual effects (left-most graphs). Summary data are shown across all voltages (middle graphs) and across negative voltages (right-most graphs).

      Author response image 1.

      We have not used VU170 on FS and NFS populations of GABAergic neurons. However, for reasons that are explained more extensively below in response to Weaknesses #3, we would not predict KNa currents recorded from SST- and PV-GABAergic neurons to mimic those of NFS- and FS-GABAergic neurons, respectively.

      Weaknesses:

      (1) This study relies on recordings in dissociated cortical neurons. Although specific WT interneurons showed intrinsic membrane properties like those reported for acute brain slices, it is unclear whether the same will be true for those cells expressing KCNT1 variants. This reviewer highly recommends confirming some of the key findings using an ex vivo slice preparation. This is especially important given the discrepant result of reduced excitability of PV cells reported by Gertler et al., 2022 (cited here in the manuscript but not discussed in this context) in acute hippocampal slices for a different KCNT1 gain of function variant.

      We thank the reviewer for this suggestion. To test whether SST-expressing YH-HET neurons show impairments similar to those observed in culture, we crossed the FVB-Tg(GadGFP)45704Swn/J transgenic mouse line (Jackson Labs #003718), also known as the GIN line, to the Kcnt1-YH line. Mice from the GIN line express eGFP in a subpopulation of SST-expressing neurons in the hippocampus and cortex. We performed slice recordings of cortical layer 2/3 GFP-expressing neurons from P21-30 WT and YH-HET GIN mice. Although input resistance was not significantly decreased, YH-HET neurons had a higher rheobase and fired fewer APs across increasing current steps than WT neurons, supporting the main findings from the SST-expressing neurons in culture. These data have been added to the manuscript in a new figure (Fig. 9).

      Regarding the previously published results on the effect of KCNT1 GOF on PV neuron excitability by Gertler et al., we have written a new paragraph in the Discussion section (last paragraph of the section, “Neuron-type-dependent KCNT1 GOF effects”) that discusses the differences between the findings by Gertler et al. and the current study. 

      To further investigate the effects of heterozygous YH variant expression on SST- vs. PV-expressing neuron excitability in ex vivo slice recordings, we are now crossing a Cre-inducible tdTomato reporter line (Ai9) to the Kcnt1-YH line. After obtaining Ai9Tg/Tg; Kcnt1m/+ mice, we will cross these to Sst-Cre and Pvalb-Cre lines so that we can record from labeled SST and PV, WT and YH-HET neurons in slices. We plan to submit the results from these recordings as an eLife Research Advances article linked to this article.

      (2) It is unclear how different pieces of results fit together to form a story about the disease pathophysiology.

      We have added a paragraph to the Discussion to speculate on how these various GABAergic subtype-specific effects downstream of the YH variant may contribute to overall network/brain pathology and seizure propensity in heterozygous mice.

      For example, hyperexcitability of PV cells would suggest more inhibition, which would counter seizure propensity. However, spontaneous inhibitory postsynaptic currents show no change in pyramidal neurons. Moreover, how do the authors reconcile the reductions in synaptic inputs onto interneurons in Figure 3B with the increases in Figure 8? This should be discussed.

      Generally, network and synaptic alterations downstream of the heterozygous variant were quite minimal compared with those of the homozygous variant. Although the frequency of synaptic inputs onto inhibitory neurons was reduced, the changes were relatively small. Thus, we concluded that the neuronal effects downstream of the heterozygous YH variant fell below the threshold needed to produce the broader effects on synaptic activity and connectivity observed in the homozygous YH model. The discrepancies among our GABAergic, FS/NFS, and VIP/SST/PV data are discussed in more detail in response to Weakness #3.

      (3) Similarly, the results in this work are not entirely internally consistent. For example, given the good correspondence between FS and NFS GABAergic cells with PV and SST expression, why are FS GABAergic cells hyperexcitable in Figure 1? If anything, there is a tendency to show reduced excitability like the NFS GABAergic cells.

      In our neuron cultures, 76-80% of NeuN-expressing neurons are GFP+ (from the CaMKII-eGFP virus used to mark glutamatergic neurons), and of the remaining ~20-24%, the majority are GABAergic (verified using the Dlx5/6-mRuby virus to mark GABAergic neurons and by using electrophysiology to assess AP parameters and analyze evoked responses). In our original experiments, recordings were sampled from this larger GABAergic population (Fig. 3), or this population was sorted almost equally into FS and NFS neurons (Figs. 1 and 2).

      In later experiments, we isolated and cultured neurons from VIP-Cre, SST-Cre, and PV-Cre mouse lines and marked these neuron types in vitro with a Cre-inducible mCherry virus. In the VIP-Cre cultures, ~6% of the GFP- population, or 1.2% of the NeuN+ population, was mCherry+. In the SST-Cre cultures, ~20.5% of the GFP- population, or 4.7% of the NeuN+ population, was mCherry+. In the PV-Cre cultures, less than 1% of the NeuN+ population was mCherry+, which is not surprising given the relatively late onset of PV expression compared with that of VIP and SST. Thus, we estimate that we are marking and recording from less than 30% of the total GABAergic population in these in vitro experiments, rather than the 80-90% that these three populations together account for in vivo.

      Furthermore, using our original criteria for sorting GABAergic neurons into FS and NFS subtypes, all recorded VIP neurons were of the NFS type and all PV neurons were of the FS type, whereas SST neurons were of both the FS (38%) and NFS (62%) types, consistent with the substantial fraction of SST neurons shown to be fast-spiking in slice experiments (Kvitsiani et al., 2013; Urban-Ciecko & Barth, 2016). Therefore, the FS group consists of both PV and SST neurons, and the NFS group consists of both VIP and SST neurons and likely also contains immature PV neurons that have not yet developed a fast-spiking phenotype. Taken together, this suggests that the data from these two sets of experiments (FS/NFS vs. VIP/SST/PV) are not directly comparable.

      Also, why do the WT I-V curves look so different between Figures 2 and 5? This reviewer suggests at least a brief explanation in the discussion.

      As to the differences in appearance between the WT I-V curves in Figures 2 and 5, those plots are from different neuron types (Fig. 2: glutamatergic, FS GABAergic, and NFS GABAergic vs. Fig. 5: VIP-, SST-, and PV-expressing), and the KNa currents were isolated using different methods (Fig. 2: TTX subtraction vs. Fig. 5: VU170 subtraction). TTX blocks an inward Na+ current, which is apparent across subthreshold voltages in Fig. 2C1-3, whereas VU170 does not block this current, so it is not apparent in Fig. 5C1-3. Also, the bottom three panels in Fig. 2C1-3 show the KNa current from -80 to 0 mV, whereas those in Fig. 5C1-3 show it from -80 to -30 mV to better illustrate the voltage range over which the KNa current increases, so their appearance is not directly comparable.
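      The subtraction approach mentioned above can be sketched as follows: at each voltage step, the drug-sensitive current is the current recorded before drug application minus the current recorded after it. This is a minimal illustration only; the function names, the list-based data layout, the -80 to -30 mV window default, and the example numbers are hypothetical and are not the authors' analysis code or data.

```python
def drug_sensitive_current(voltages_mv, i_control_pa, i_drug_pa):
    """Return (voltage, drug-sensitive current) pairs, computed as control minus drug.

    All three lists are assumed to be aligned by voltage step (hypothetical layout).
    """
    if not (len(voltages_mv) == len(i_control_pa) == len(i_drug_pa)):
        raise ValueError("voltage and current lists must have equal length")
    return [(v, ic - idrug) for v, ic, idrug in zip(voltages_mv, i_control_pa, i_drug_pa)]


def restrict_window(iv_pairs, v_min_mv=-80, v_max_mv=-30):
    """Keep only steps with v_min_mv <= V <= v_max_mv (e.g., to focus on negative voltages)."""
    return [(v, i) for v, i in iv_pairs if v_min_mv <= v <= v_max_mv]


# Illustrative, made-up values (pA) at five voltage steps (mV).
volts = [-80, -60, -40, -20, 0]
control = [10, 20, 60, 200, 500]
after_drug = [8, 15, 40, 150, 420]

iv = drug_sensitive_current(volts, control, after_drug)
print(iv)                   # [(-80, 2), (-60, 5), (-40, 20), (-20, 50), (0, 80)]
print(restrict_window(iv))  # [(-80, 2), (-60, 5), (-40, 20)]
```

      Note that the difference captures everything the drug removes: with TTX it includes the blocked inward Na+ current as well as the KNa current that depends on Na+ influx, which is why the two subtraction methods need not produce identical-looking I-V curves.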

      (4) Given the authors' claim that the KCNT1 activation curve is a major contributor to the observed excitability differences in specific GABA cell subtypes, it would be helpful to directly measure the activation curve in the variants experimentally as was done for WT KCNT1 in Figure 6A and use the derived kinetics in the compartmental model.

      We apologize for the confusion. Although the WT KCNT1 activation curves are distinct among GABAergic subtypes, and we believe that these varying kinetics contribute to the neuron-type-specific phenotypes of KCNT1 GOF, we did not intend to suggest that the heterozygous Y777H variant itself causes neuron-type-specific alterations to the activation curves of the GABAergic subtypes. To clarify this point, we show below the high similarity of the activation curves between WT and YH-HET KCNT1 in each of the GABAergic subtypes.

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary:

      The present manuscript by Shore et al. entitled "Reduced GABAergic Neuron Excitability, Altered Synaptic Connectivity, and Seizures in a KCNT1 Gain-of-Function Mouse Model of Childhood Epilepsy" describes in vitro and in silico results obtained in cortical neurons from mice carrying the Y777H gain-of-function (GOF) variant in the KCNT1 gene, which encodes a subunit of the Na+-activated K+ (KNa) channel. This variant corresponds to the human Y796H variant found in a family with autosomal dominant nocturnal frontal lobe epilepsy. The occurrence of GOF variants in potassium channel-encoding genes is well known, and among potential pathophysiological mechanisms, impaired inhibition has been documented as responsible for KCNT1-related DEEs. Therefore, building on a previous study by the same group performed in homozygous KI animals, and considering that the large majority of pathogenic KCNT1 variants in humans occur in the heterozygous state, the Authors have investigated the effects of heterozygous Kcnt1-Y777H expression on KNa currents and neuronal physiology among cortical glutamatergic neurons and the 3 main classes of GABAergic neurons, namely those expressing vasoactive intestinal polypeptide (VIP), somatostatin (SST), and parvalbumin (PV). To do so, they crossed Kcnt1-Y777H mice with VIP-, SST-, and PV-Cre mouse lines and recorded from GABAergic neurons identified by their expression of mCherry (and absence of the GFP used to mark excitatory neurons).

      The results obtained revealed heterogeneous effects of the variant on KNa and action potential firing rates in distinct neuronal subpopulations, ranging from no change (glutamatergic and VIP GABAergic) to decreased excitability (SST GABAergic) to increased excitability (PV GABAergic). In particular, modelling and in vitro data revealed that an increase in persistent Na current occurring in PV neurons was sufficient to overcome the effects of KCNT1 GOF and cause an overall increase in AP generation.

      Strengths:

      The paper is very well written, the results clearly presented and interpreted, and the discussion focuses on the most relevant points.

      The recordings performed in distinct neuronal subpopulations are a clear strength of the paper. The finding that the same variant can cause opposite effects and trigger specific homeostatic mechanisms in distinct neuronal populations is very relevant for the field, as it narrows the existing gap between experimental models and clinical evidence.

      Weaknesses:

      My main concern is in the epileptic phenotype of the heterozygous mice investigated. In fact, in their previous paper the Authors state that "...Kcnt1-Y777H heterozygous mice did not exhibit any detectable epileptiform activity" (first sentence on page 4). However, in the present manuscript, they indicate twice in the discussion section that these mice exhibit "infrequent seizures". This relevant difference needs to be clarified to correctly attribute to the novel pathophysiological mechanism a role in seizure occurrence. Were such infrequent seizures clearly identified on the EEG, or were behavioral seizures? Could the authors quantify this "infrequent" value? This is crucial also to place in the proper perspective the Discussion statement regarding "... the increased INaP contribution to ... network hyperexcitability and seizures".

      We apologize for the confusion. Indeed, in the Results section from our previous paper, we failed to observe seizures in 14 heterozygous mice, whereas 23/25 homozygous mice showed seizures by video-EEG. However, in the fifth paragraph of the Discussion section from that paper, we further stated that “during the preparation and review of [that] article, we observed seizures in two Kcnt1-Y777H heterozygous mice, one during a widefield Ca2+ imaging experiment and the other during a video-EEG experiment”. Thus, we concluded that “heterozygous expression can result in seizures in a rodent model, but apparently at a much lower frequency than that observed with homozygous expression”. To emphasize these findings, we’ve added a sentence to the Introduction in this manuscript about the occurrence of infrequent seizures in Kcnt1-Y777H heterozygous mice, along with a reference to the Discussion of our previous paper.

      Of the two observed seizures, one seizure was captured in the Weston Lab at the University of Vermont from a Kcnt1-Y777H heterozygous mouse expressing a calcium indicator (after it was bred to the Snap25-GCaMP6s line) during a Ca2+ widefield imaging experiment, and it was accompanied by a time-locked video of the seizure event. The other seizure was recorded as a control during a drug study using video-EEG. This Kcnt1-Y777H heterozygous mouse had multiple tonic seizures, as evidenced by EEG traces and the accompanying video, which were recorded and analyzed in the Frankel Lab at Columbia University. The seizures from heterozygous mice have not been officially quantified, as they have only been rarely observed across multiple different experiments using heterozygous mice at multiple institutions, making quantification quite difficult.

      Lastly, regarding attributing the role of the identified pathological mechanisms to seizure occurrence mentioned by the reviewer, we have added a paragraph to the Discussion to speculate on how the various GABAergic subtype-specific effects downstream of the YH variant may contribute to the general lack of network/brain pathology and seizure generation in heterozygous mice.  

      Also, some statistical analysis seems to be missing. For example, I could not find any for the data shown in Fig. 6. Thus, the following statement: "the model PV neurons responded to KCNT1 GOF with decreased AP firing and an increased rheobase" requires proper statistical evaluation.

      We thank the reviewer for this suggestion. We were initially hesitant to apply a formal statistical analysis to the modeling data because it differs in important ways from the experimental data. However, we have now provided statistical analyses of these data, with some caveats. Because we applied each KCNT1 GOF level (40, 35, and 30 mM) to the same set of neurons for each data set, we performed repeated measures ANOVA analyses to assess differences due to GOF in each subtype. We note that some changes are statistically significant, but may not be physiologically relevant. For example, there are changes in input resistance and rheobase in VIP neurons only at the higher GOF level (30 mM), but the magnitude of each change is quite small relative to those in SST neurons (Rin: 1.7 MΩ in VIP vs. 23.2 MΩ in SST, rheo: 1.7 pA in VIP vs. 52.5 pA in SST), and likely as a consequence, there are no downstream effects on the AP firing rate at either GOF level in VIP neurons. It is important to examine the magnitude of the effects and interpret them in the context of the changes in other neuron types and in the experimental data, thus, we’ve provided two new figures to better illustrate the relative changes in each neuron type (Supplemental Figures 6-2 and 7-2). We have also added these statistical results to Figures 6E2, 6F2, 6G2, and 7E, and Supplemental Fig. 6-1, and we have described them in the Results section. A summary of the statistics has also been added in Supplemental Table 6.
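      Because each GOF level was applied to the same set of model neurons, the design described above is a within-subject one. The sketch below shows how a one-way repeated-measures ANOVA F statistic can be computed for such a design. It is an assumption-laden illustration, not the authors' analysis: the response does not state which software was used or whether a sphericity correction (e.g., Greenhouse-Geisser) was applied, so this version assumes sphericity and a single within-subject factor, and the function name and example values are hypothetical.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA (sphericity assumed, no correction).

    data: list of rows, one per neuron (subject); each row holds that neuron's
    value under each of k conditions (e.g., hypothetical GOF levels).
    Returns (F, df_conditions, df_error).
    """
    n = len(data)
    k = len(data[0]) if data else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects, >= 2 conditions, and equal-length rows")

    grand_mean = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition total variability into condition, subject, and residual (error) parts.
    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand_mean) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand_mean) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    if ss_error <= 0:
        raise ValueError("zero residual variance; F is undefined")
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Made-up values: 3 neurons x 3 conditions.
example = [[1, 3, 5], [2, 3, 4], [3, 3, 3]]
print(repeated_measures_anova(example))  # (3.0, 2, 4)
```

      Removing the between-subject sum of squares from the error term is what distinguishes this from an ordinary one-way ANOVA; a p-value could then be obtained from the F distribution with (df_conditions, df_error) degrees of freedom, for example with `scipy.stats.f.sf`.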

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In addition to addressing the weaknesses highlighted in the public review, this reviewer recommends using a KCNT1 agonist such as loxapine to see if activating the potassium channel mimics the KCNT1 GOF in SST and PV cells.

      Although we appreciate this suggestion, we’re not sure whether treating GABAergic subtypes with loxapine would provide much clarity in the absence of many supporting experiments. First, the amount of channel activation and any changes in kinetics caused by loxapine would need to be measured and compared to the YH-HET GOF effects in order to interpret the results. In addition, the aforementioned caveat about off-target effects of small molecules would also have to be considered, as loxapine inhibits many other channels at nanomolar concentrations.

      More importantly, we hypothesize that several of the GABAergic subtype-specific effects of KCNT1 GOF result from homeostatic or adaptive mechanisms due to long-term increases in KNa currents. For instance, PV-expressing YH-HET neurons had a lower rheobase, increased AP amplitude, and increased AP firing frequency, effects that we believe are due, not to increased KNa currents themselves, but to a compensatory increase in a persistent Na+ current. For the SST neurons, we hypothesize that the increased capacitance and soma size, together with the increased electrical coupling, exacerbate the hypoexcitability phenotype downstream of the YH variant. Thus, we would not necessarily expect that opening KCNT1 channels by acute loxapine treatment would mimic many of these effects.

      Indeed, in a previous study using a different KCNT1 GOF mouse model, loxapine treatment mimicked KCNT1 GOF effects in some neuron types (reduced AP firing frequency in loxapine-treated WT PV neurons mimicked that observed in heterozygous KCNT1 GOF PV neurons) but not in others (reduced AP firing frequency in loxapine-treated WT pyramidal neurons did not mimic the unaltered AP firing frequency observed in heterozygous and homozygous KCNT1 GOF pyramidal neurons) (Gertler et al., 2022).

      Related to this suggestion by the reviewer, we are currently performing studies using a KCNT1 blocker in WT and Kcnt1-KO neurons to better understand the role of KCNT1 among cortical neuronal subtypes; these results will be published in a future manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Though I realize that primary cultures allow for efficient identification of neuronal subclasses, it would have been useful to show that similar changes also occur in neurons with conserved in vivo connectivity, such as those recorded from brain slices.

      We thank the reviewer for this suggestion. We have added an additional figure (Fig. 9) showing that the hypoexcitability phenotype observed in SST neurons in culture recordings is conserved in SST neurons in slice recordings from GIN mice, which express GFP predominantly in SST-expressing neurons.

      In addition, further experiments in PV neurons from Kcnt1-Y777H homozygous mice would provide evidence for a gene-dosage role in the changes found in heterozygotes.

      For this manuscript, we chose to focus our efforts on understanding the effects of heterozygous Kcnt1 variant expression in various neuronal subtypes with the goal of better modeling GOF variant effects in human disease. However, we’re very interested in investigating the effects of homozygous expression of the YH variant on each of the GABAergic subtypes to compare with those found in this study, but this requires more rounds of breeding to get homozygous mice with GABAergic subtype-specific expression of cre recombinase. We look forward to reporting the results from these experiments in a future manuscript.

      Also, when addressing the issue regarding the different effects of the same GOF variant on the excitability of distinct neuronal populations in the Discussion or Introduction sections, the authors may want to cite the recent work on KCNQ2 and KCNQ3 by the Tzingounis group (https://pubmed.ncbi.nlm.nih.gov/37607817/).

      We thank the reviewer for bringing this manuscript to our attention. We have added this citation to a new paragraph in the Discussion section regarding neuron-type specific effects of ion channel variants (the last paragraph focusing on the effects in PV neurons).

      Budelli, G., Hage, T. A., Wei, A., Rojas, P., Jong, Y. J., O'Malley, K., & Salkoff, L. (2009). Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci, 12(6), 745-750. https://doi.org/10.1038/nn.2313

      Gertler, T. S., Cherian, S., DeKeyser, J. M., Kearney, J. A., & George, A. L., Jr. (2022). K(Na)1.1 gain-of-function preferentially dampens excitability of murine parvalbumin-positive interneurons. Neurobiol Dis, 168, 105713. https://doi.org/10.1016/j.nbd.2022.105713

      Grubb, M. S., & Burrone, J. (2010). Activity-dependent relocation of the axon initial segment fine-tunes neuronal excitability. Nature, 465(7301), 1070-1074. https://doi.org/10.1038/nature09160

      Hage, T. A., & Salkoff, L. (2012). Sodium-activated potassium channels are functionally coupled to persistent sodium currents. J Neurosci, 32(8), 2714-2721. https://doi.org/10.1523/JNEUROSCI.5088-11.2012

      Kuba, H., Oichi, Y., & Ohmori, H. (2010). Presynaptic activity regulates Na(+) channel distribution at the axon initial segment. Nature, 465(7301), 1075-1078. https://doi.org/10.1038/nature09087

      Kvitsiani, D., Ranade, S., Hangya, B., Taniguchi, H., Huang, J. Z., & Kepecs, A. (2013). Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature, 498(7454), 363-366. https://doi.org/10.1038/nature12176

      Shore, A. N., Colombo, S., Tobin, W. F., Petri, S., Cullen, E. R., Dominguez, S., Bostick, C. D., Beaumont, M. A., Williams, D., Khodagholy, D., Yang, M., Lutz, C. M., Peng, Y., Gelinas, J. N., Goldstein, D. B., Boland, M. J., Frankel, W. N., & Weston, M. C. (2020). Reduced GABAergic neuron excitability, altered synaptic connectivity, and seizures in a KCNT1 gain-of-function mouse model of childhood epilepsy. Cell Rep.

      Urban-Ciecko, J., & Barth, A. L. (2016). Somatostatin-expressing neurons in cortical networks. Nat Rev Neurosci, 17(7), 401-409. https://doi.org/10.1038/nrn.2016.53

    1. Author response:

      We thank the reviewers for their constructive comments that will help us clarify and strengthen the paper. We will be happy to address all the comments and adjust the text accordingly. Regarding the suggestion in the assessment to include a "more thorough comparison with human behavior", we believe this comment reflects one of the reviewer's comments to compare with order effects (primacy and recency); we did not see any other comments that would reflect this (our existing simulations do make contact with other human behavior regarding error distributions, including probability of recall, precision, sensitivity to reinforcement history, and dopamine manipulation effects on human WM). We thank the reviewers for this comment, and we will conduct the appropriate simulations and analyses to compare with sequential effects in working memory.

    1. Author response:

      Reviewer #1 (Recommendations For The Authors): 

      This paper represents a huge amount of work on a condition whose patients' health and well-being have not always been prioritized, and only relatively recently has the immune dysregulation seen in patients with Down Syndrome (DS) been garnering major research interest. 

      This paper provides an unparalleled examination of immune disorder in patients with DS. In a truly herculean effort, the authors provided a cumulative examination of over 440 patients with DS, confirmed the alterations in immune cell subsets (n=292, 96 controls) and multi-organ autoimmunity seen in these patients as they age, and identified autoantibody production that could contribute to conditions co-occurring in patients with DS. They also sought to determine whether the early immunosenescence seen in DS was due to the inflammatory profile by comparing age-associated markers in DS patients and euploid controls separately. They found that several markers are regulated with age regardless of group, while comparing the effect of age versus DS status on cytokine levels identified inflammatory markers that are elevated in DS patients across the lifespan and either do not increase with age or increase with age only in the DS cohort. This is very interesting in the context of DS in particular, and of immunity during aging in general.

      The second part of the manuscript presents the results from a clinical trial with the JAK inhibitor tofacitinib in DS patients. While the number of DS patients treated with tofacitinib was small, the results were often quite striking. Treatment was well-tolerated and the improvement of dermatological conditions was clear. The less responsive patients AA4 and AA2 provide a very clear illustration that these patients are sensitive to immune triggers during treatment. Additionally, the demonstration that patients' IFN scores and cytokine levels decreased without clear immunosuppression with tofacitinib treatment is encouraging, since treatment with this drug would need to be continuous. I would be curious to see if the patients added past the cutoff for interim analysis follow a similar trajectory. I would not ask the authors to add any data; the paper is well-written and logically constructed. 

      I only have a small comment: I really did not like how Figure 2 a, d, and g tethered the coloring to the magnitude of fold change to show the effect of DS particularly for 2a and 2g. Given that these fold changes are quite modest, the coloring is very light and hard to distinguish. The clear takeaway is that the effect on T cells is greatest, but there must be a better way to illustrate this. Perhaps displaying this graph on a non-white background could help with contrast. 

      We are grateful for the Reviewer’s very positive assessment of the manuscript and constructive feedback. We want to assure the Reviewer that similar analyses will be completed in the future for the entire cohort recruited into the trial to determine if similar trajectories and results are observed with the larger sample size. Additionally, following the Reviewer’s guidance, we will explore alternative ways to present the data in Figure 2 for greater clarity in a revised version of the manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      • Although the focus of the patients in the first part of the paper is on autoimmune/inflammatory conditions, it will be useful to also list the non-autoimmune infectious manifestations for reference with prevalence data. For example, otitis media, or lung infections (mentioned within the paper), or mucosal candidiasis. Same for other manifestations such as cardiac or malignant conditions. Given the impressive number of patients, it will be useful to the readers to have prevalence data for these as well, even in brief statements within the results. 

      We appreciate this inquiry by the Reviewer and will present additional data on the co-occurring conditions mentioned by the Reviewer in a revised version of the manuscript.

      • Have the authors looked at DN T cells and whether they may be enriched in DS patients, given their enrichment in some autoimmune conditions? 

      Thanks for this inquiry. We did examine DN T cells (double negative T cells), which we referred to in our Figure 2 and Figure 2 – figure supplement 1 as non-CD4+ CD8+ T cells. Although this T cell subset is mildly elevated (in terms of frequency among T cells) in individuals with Down syndrome, the result did not reach statistical significance after multiple hypothesis correction. This negative result is shown in the heatmap in Figure 2 – figure supplement 1d.

      • It would be useful to move the segment of the discussion that discusses the interim predefined analysis of the phase 2 trial to the corresponding segment of the results. As this reviewer was reading the paper, it was unclear why the interim analysis was done, whether it was predefined and it was not until the discussion that it became apparent. I believe it will help the readers to have a brief mention that this interim analysis was predefined and set to occur at the first 10 DS enrollees. Also, it would be helpful to state what is the total number of DS patients planned for enrollment in the Phase 2 trial which is continuing recruitment. 

      We appreciate this comment and will modify the text following the Reviewer’s guidance in the revised manuscript. The trial will be considered complete once a total of 40 participants undergo 16 weeks of treatment with good medication compliance (fewer than 15% missed doses).

      • Although the authors present data on TPO autoantibodies before and after tofacitinib, it remains unclear whether the other non-TPO autoantibodies were altered during treatment or whether this was a TPO autoantibody-specific phenomenon. Was there an alteration in mature B cells or plasmablast populations after tofacitinib? If these data are available, they would further enhance the manuscript. If they are not available, it would be useful for the authors to discuss those in the discussion of the manuscript. 

      We are grateful for this comment, which strongly aligns with our future research interests and plans for the analysis of the full cohort once the trial is completed. In the interim analysis, we analyzed only autoantibodies related to autoimmune thyroid disease and celiac disease, as shown in the manuscript. However, we plan to complete a more comprehensive analysis of the effects of JAK inhibition on autoantibody production once the full sample set is available at the end of the trial. Likewise, the clinical trial protocol includes collection and processing of blood samples for immune mapping using mass cytometry, which will enable us to answer the Reviewer's question about potential changes in B cell or plasmablast populations. Following the Reviewer’s guidance, we will discuss these planned analyses in the Discussion of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Cellular immune phenotyping data in Figure 2 presents a large number of patients with DS versus euploid controls (292 and 96 respectively). Given the relatively large cohort there would seem to be an opportunity to determine whether age or sex alters the immune phenotype shown, for example, TEMRAs, etc. Was the data analyzed in this way? 

      We welcome this comment, which clearly aligns with our research interests and planned additional analyses of these datasets generated by the Human Trisome Project. We can share with the Reviewer that although sex as a biological variable has minimal impacts on the strong immune dysregulation observed in Down syndrome, there are clear age-dependent effects, with some immune changes occurring early during childhood versus others taking place later in adult life. A manuscript describing a complete analysis of age-dependent effects on the multi-omics datasets in the Human Trisome Project is currently under preparation.

      (2) The authors should strongly consider incorporating/discussing the findings from Gansa et al, Journal of Clinical Immunology May 2024 - where they reviewed the immune phenotype of 1299 patients with Down syndrome. 

      Thanks for this suggestion; we will cite and discuss this recent paper in the revised manuscript.

      (3) It is difficult to differentiate patients Hs2 and Ps1 in Figure 5d. 

      Thanks for this observation; we will modify the labels for greater clarity in the revised manuscript.

      (4) Given their finding of no correlation between cytokine levels/immune phenotype and autoimmunity, some additional discussion of the relevance of hypercytokinemia in the pathogenesis of autoimmunity would seem relevant (given that this was the basis for the clinical trial). The authors mention that cytokine levels may not be appropriate measures of disease in the patients. 

      We welcome this opportunity to expand the discussion of the relevance of hypercytokinemia in the pathogenesis of autoimmunity and will do so in the revised manuscript.

      (5) Data availability statement: appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The authors should perform experiments to answer this question: does Cav3 transcription increase in the G369i-KI, or is there instead some post-transcriptional modulation that permits surface expression of functional Cav3-containing channels in the absence of typical HVA Ca conductances? Also, the authors should determine whether G369i-KI can mediate Ca2+ release from intracellular stores and whether release from stores is upregulated as Cav3-containing channel expression (or function) is increased. 

      We performed transcriptomic (Drop-seq) analysis to test whether a Cav3 subtype is upregulated in cones of G369i KI mice. These experiments show that, consistent with previous studies (PMID 35803735, 26000488), Cacna1h appears to be the primary Cav3 subtype expressed in mouse cones. However, as shown in new Supp. Fig. S3, there was no significant difference in the levels of Cacna1h transcripts between WT and G369i KI cones. Therefore, we propose that some post-transcriptional modification, or an alteration in a pathway that regulates channel availability, may enable Cav3 channels to contribute to the whole-cell Ca2+ current in cones lacking functional Cav1.4 channels.

      We also performed Ca2+ imaging experiments in WT vs G369i KI cone terminals to assess whether the diminutive Cav3 current in G369i KI cone terminals may be compensated by upregulation of a Ca2+ signal such as from intracellular stores. Arguing against this possibility, depolarization-evoked Ca2+ signals in G369i KI cones were dramatically reduced compared to WT cones (new Fig.9). 

      Reviewer #2 (Recommendations For The Authors): 

      Major points:

      (1) It is stated in too many places that cone features in the Cav1.4 knock-in are "intact", preserved, or spared, but this representation is not accurate. There are two instances in this study that qualify as intact when comparing KI to WT: 1) the photopic a-waves in the Cav1.4 knock-in (also demonstrated in Maddox et al 2020) and 2) latency to the platform (current MS, Figure 7f). However, in the numerous instances listed below, the authors compared the Cav1.4 knock-in to the Cav1.4 knock-out, and then referred to the KI as exhibiting intact responses. The reference point for intactness needs to be wildtype, as appropriately done for Figures 2 and 3, and when comparing the KI to the KO the phrasing should be altered; for example: "the KI was spared from the extensive degeneration witnessed in the KO....". 

      In most cases, we clearly note that there are key differences between the WT and G369i KI cone synapses, which highlight the importance of Cav1.4-specific Ca2+ signals for certain aspects of the cone synapse. We disagree with the reviewer's point that we did not often use WT as a reference, since most of our experiments involved comparisons of only WT and G369i KI (Figs. 3-6) or of WT, G369i KI, and Cav1.4 KO (Figs.1,7; in these cases, comparisons specifically between WT and G369i KI mice were included). We used "intact" as a descriptor for G369i KI cone synapses because these synapses are present, albeit abnormal, in the G369i KI retina, whereas cone synapses are completely absent in the Cav1.4 KO retina. To avoid confusion, we modified our use of "intact" and "preserved" where appropriate.

      A. Abstract, line 34 to 35: ".......preserved in KI but not in KO.". 

      Abstract was rewritten and this line was removed.

      B. Line 36: "....synaptogenesis remains intact". The MS documents many differences in the morphology of KI and WT cones (immunofluorescence and electron microscopy data), which is counter to an intact phenotype. 

      The sentence was: “In CSNB2, we propose that Cav3 channels maintain cone synaptic output provided that the Ca2+-independent role of Cav1.4 in cone synaptogenesis remains intact.”

      Here the meaning of “intact” refers to the Ca2+ -independent role of Cav1.4, not synapses. Thus, we have left the sentence unchanged.

      C. This strikes the right balance, lines 67 to 68: "....although greatly impaired.....". 

      D. Line 149, "Cone signaling to a postsynaptic partner is intact in G369i KI mice". This description is inaccurate. Here there is only WT and KI, and the text reads as follows in line 162: "terminals (Figure 6b). The ON and OFF components of EPSCs in G369i KI HCs were measurable, although lower in amplitude than in WT (Figure 6a,b)." Neither "measurable" nor "lower in amplitude" meet the definition of "intact", and actual numerical values are lacking in the text. 

      We have added results showing that there are no light responses in the Cav1.4 KO horizontal cells and have modified the sentence to: “Cone synaptic responses are present in horizontal cells of G369i KI but not Cav1.4 KO mice”. 

      We have modified the discussion of these results as follows (lines 210-213): "Consistent with the lack of mature ribbons and abnormal cone pedicles (Fig.1), HC light responses were negligible in Cav1.4 KO mice (Fig.8a,b). In contrast, the ON and OFF responses were present in G369i KI HCs although significantly lower in amplitude than in WT HCs (Fig. 8a,b)."

      E. Please add a legend to Figure 6a to indicate the intensities. The shape of the KI responses differs from the control, which is worthy of discussion: i) there is no clear cessation of HC EPSCs in the KI during the light ON period (when release stops, Im fluctuations should be minimal), and ii) the "peaked" appearances of the initial 500 ms of the ON and OFF periods are very similar in shape for the KI (hard to interpret in the same fashion as a control response). How were the ON and OFF amplitudes analyzed? Furthermore, the OFF current is not summarized in Figure 6d, but shouldn't this be when Cav3 channels open and trigger release (i.e., the OFF response EPSC)? Lastly, Figure 6b,d shows a ~70% reduction in ON current in the KI, and the KI example in 6b shows an 80% reduction in OFF current compared to WT. Yet the only place asterisks are used to indicate significant differences is the DNQX data within each genotype in Fig 6d. These data cannot be described as showing "intact" KI responses, and the absence of numerical and statistical values needs to be addressed. 

      New Fig.8a depicting the horizontal cell light responses has been modified to include the legend indicating light intensities. The ON and OFF amplitudes were analyzed as the peak current amplitudes. This information has been added to the legend.

      The reviewer is correct in that the OFF response represents the EPSC whereas the ON response represents the decrease in the EPSC with light. To avoid confusion, we changed the y axis label for the averaged data to read ON or OFF “response” rather than “current” in new Fig.8b.

      As the reviewer suggests, the more transient nature of the KI response during the light ON period could result from aberrant continuation of vesicular release during the light-induced hyperpolarization of cones in the KI mice, in contrast to the prolonged suppression of release by light which is evident in the WT responses. We speculated on this difference as follows (lines 237-241):

      "In addition to its smaller amplitude, the transient nature of the ON response in G369i KI HCs suggested inadequate cessation of cone glutamate release by light (Fig.8b). Slow deactivation of Cav3 channels and/or their activation at negative voltages (ref. 20) could give rise to Ca2+ signals that support release following light-induced hyperpolarization of G369i KI cones."

      We added asterisks to new Fig.8b,d indicating statistical differences, and a description of the statistical tests to the legend.

      F. line 168 the section titled "Light responses of bipolar cells and visual behavior is spared in G369i KI but not Cav1.4 KO mice". 

      Changed to: “Light responses of bipolar cells and visual behavior are present in G369i KI but not Cav1.4 KO mice”

      Last sentence of the ERG results, lines 189-190: "These results suggest that cone-to-CBC signaling is intact in G369i KI mice." "Spared" and "intact" are not accurate descriptions. The ERG data presented here show massive differences between WT and the KI, except for the a-waves. 

      This sentence was removed.

      As for Figure 6, the results text related to Figure 7a-d does not present actual values for ERG responses, and there is no indication of significant differences there or in the figure panels. For instance, in Figure 7b, b-waves in the KI are comparable to those in the KO, except at the two highest-intensity flashes, where KI responses are ~20% of the WT amplitude. Presenting KI and KO data on a scale expanded 6- to 10-fold relative to WT can be misleading: a quick read of these figure panels might lead one to conclude incorrectly that the KI is intact while the KO is impaired when compared to WT. The Methods section needs more details on the ERG analysis (e.g., whether oscillatory potentials were filtered out when measuring the b-wave, the allowable range of time-to-peak for b-wave amplitude, etc.). 

      The vertical scaling of the ERG results in new Fig.10c,d has been changed to clearly reflect the diminished responses of the KO and KI vs the WT. Further details regarding the ERG analysis were added to the Methods section.

      G. Can you point to other studies that have used the "visible platform swim test" used in Figure 7e, f, and specify further how mice were dark/light adapted prior to the recordings? 

      As referenced in the Methods, original line 674, the methods we used for the swim test were described in our previous study (PMID 29875267). Other studies that have used this assay include PMIDs: 28262416, 26402607.

      (2) The Maddox et al 2020 study does not conclusively address whether rods have a residual T-type Ca2+ current in the Cav1.4 KO or KI. The study showed that membrane currents measured from rods in the KI and KO retina were distinct from WT, supporting their claim that L-type Ca2+ current is absent in the KI and KO. However, the recordings had shortcomings that challenge the analysis of Ca2+ currents: i) they were collected at room temperature (22-24 °C), ii) at an unknown distance from the terminal (uncertain voltage clamp), iii) with a very slow voltage ramp rate that is not suitable for probing T-type currents (Figure 1d Maddox 2020, 140 mV over 1 sec: 7 msec/1 mV), and iv) at a signal-to-noise ratio that does not allow resolution of a membrane current under 1 pA (average WT rod Ca2+ current was -3.5 pA, and line noise ~1 pA peak-to-peak in Maddox 2020). Suggestion: say that T-type currents were not probed in Maddox et al 2020, but that Davison et al 2022 did not find a PCR signal for Cav3.2 in rods. 

      We disagree that the recordings in the Maddox 2020 study were insufficient to uncover a T-type current. The voltage ramps in that study were not much slower than those of the Davison et al. 2022 study (which used 0.19 mV/ms). Moreover, in new Supp. Fig.S1, we show that, like the slower voltage ramp (0.15 mV/ms) used in the prior study of G369i KI rods, the voltage ramps we used in the present study (0.5 mV/ms), which clearly evoke currents with T-type properties in G369i KI cones (Fig.2a,b, Fig.3a,b), do not evoke currents in WT or G369i KI rods.

      Minor comments. 

      (1) Suggestion: add an overview panel to Figure 1 that shows the rod terminals in the KI. The problem is that cropping out the ribbon and active zone signals from rods, to highlight cones, can give the impression that the cones are partially spared in the KI, and the rods are not spared at all. (yet you nicely clarify this in Figure 4 and in the legend and text, etc.). 

      We chose to modify the legend with this information as in Fig.4 rather than modify the figure.

      (2) Mouse wt cone Ca2+ currents look like L-type currents, as do your monkey and squirrel cone recordings, and also much like those of mouse rods (see Figure S5, Hagiwara et al., 2018 or Grabner and Moser 2021). Your pharm data from mice and squirrels further supports your conclusion, and certainly took much effort. Davison et al 2022 J Neurosci showed PCR results that support their claim that a Cav3 current exists in wt cones. Questions: 1) have you tried PCR? 2) Can you offer more details on what Cav3 KO you tried and what antibodies failed to confirm the KO? As the authors know, one complication is that the deletion of one Cav can be compensated for by the expression of a new Cav. There are 3 types of Cav3s and removal of one type may be compensated for by another Cav3. 

      We have included drop-seq data (new Supp.Fig.S3) implicating Cav3.2 as the main Cav3 subtype in cones and have modified our discussion of these results accordingly. These experiments did not reveal any changes in Cav3 subtype expression in G369i KI vs WT cones.

      (3) Lines 95/96- onward, spend more time telling the story. When working out the biophysical and pharmacological behavior of the Ca2+ currents, you might want to initially refer to the membrane current as a membrane current, and then state how your voltage protocols, intra- and extra-cell solutions, and drugs helped you verify 1) L-type and 2) T-type Ca2+ currents. 

      We have modified the text with more detail.

      (4) If data is in hand, add a ramp I-V to Figure S2, which shows the response of the ground squirrel cone. The steps in S2a are excellent for making your point that a transient current is missing, and the bipolar is a great control to illustrate ML218 works. However, a comparison of a squirrel cone ramp to a bipolar ramp response could complete the figure. 

      See Response to #5 below.

      (5) Consider moving Supplementary Figures S2 and S3 to the main text; these are highly relevant to the story, novel, and well-executed. 

      Fig.S2 and S3 were added as new Figs.4,5. The new Fig.4 includes voltage ramps in ground squirrel cones (panel a) to compare with the bipolar data (panel f).

      (6) The nice electron microscopy reconstructions are not elaborated on in any detail, and there is no mention of ribbon size. Is the resolution sufficient to estimate ribbon size, the number of synaptic vesicles around the ribbon and in the adjacent cytosol? The images indicate major changes in the morphology of the terminals. Is the glial envelope similar in WT and KI? 

      Since ribbons were quantified extensively in the confocal analyses in Fig.6, we felt it unnecessary to add this to the EM analysis which focused mainly on aspects of 3D structure (i.e., arrangement of ribbons, postsynaptic wiring, cone pedicle morphology). We added further discussion of the change in morphology of the G369i KI cone pedicle (lines 200-203): “Compared to WT, ribbons in G369i KI pedicles appeared disorganized and were often parallel rather than perpendicular to the presynaptic membrane (Fig.7a-c). Consistent with our confocal analyses (Fig.1), G369i KI cone pedicles extended telodendria in multiple directions rather than just apically (Fig. 7a).”

      While we chose not to characterize the glial envelope in WT cones, we did add an analysis of synaptic vesicles around ribbons to Table 2.

      (7) Discussion line 250: "we found no evidence for a functional contribution of Cav3 in our recordings of cones in WT mice (Figures. 2,3), ground squirrels, or macaque (Supplementary Figures S2 and S3)." I would not use "functional" in this context because, in Davison et al 2022, functional was defined as a separate response component driven by Cav3. For instance, they examined the influence of their T-type current on exocytosis (by membrane capacitance) and on other features such as spiking Ca2+ transients. Suggestion: substitute "functional" with "detectable", and say "we found no detectable Cav currents". Or, if you had T-type staining but not T-type Ca2+ currents, then say "no functional current even though there is staining...". 

      We have modified the text as follows (lines 336-338): "However, in contrast to recordings of WT mouse cone pedicles in a previous study (ref. 21), we found no evidence for Cav3-mediated currents in somatic recordings of cones in WT mice (Figs.2,3)."

      We propose an alternative interpretation of the results in the Davison et al study concerning the conclusion that Cav3 channels contribute to Ca2+ spikes and exocytosis. That study used 100 µM Ni2+ to block a "T-type" contribution to spike activity in cones. In their Figs.4,5, the spikes are suppressed by 100 µM Ni2+ and by 10 µM nifedipine, a Cav1 antagonist, but spared by the T-type selective drug Z944. This is problematic for several reasons. First, as shown by the authors (their Fig.2A1,A2) and others (PMID: 15541900), 100 µM Ni2+ inhibits Cav1-type currents in photoreceptors. Second, Z944 potentiates Cav1 current in their mouse cones (their Fig.2C1,C2). Thus, both reagents are suboptimal for dissecting the contribution of either Cav subtype to spiking activity. With respect to Cav3 channels and exocytosis, these authors interpreted a reduction in exocytosis upon holding at -39 mV compared with -69 mV as indicating the loss of a T-type-driven component of release. However, Cav1 channel inactivation (PMID: 12473074) could lead to the observed reduction in exocytosis at -30 mV.

      (8) Additional literature related to your Intro and Discussion. Regarding CSNB2, related mutations of active zone proteins, and what happens to Ca2+ currents when ribbons are deleted, you might want to consider the following studies that measure Ca2+ currents from rods: conditional KO of RIM1/2 (Grabner et al 2015 JN), KO of ELKS1/2 (Hagiwara et al, 2018 JCB), and KO of Ribeye (Grabner and Moser eLife 2021). In these studies, the Cav currents were absent in rods of the ELKS1/2 DKO, strongly reduced (80%) in the RIM1/2DKO, but altered in more subtle ways (activation-inactivation) without significantly changing steady-state Ca2+ current in the Ribeye KO. This does not seem to support some of the arguments you have made in the Introduction and Discussion regarding ribbon size and Ca2+ currents, yet the suggested literature is related to the topic at hand. 

      A description of these synaptic proteins as potential mediators of the effect of Cav1.4 on ribbon morphogenesis was added to the Discussion, lines 325-327.

      (9) Line 129: "Along with the major constituents of the ribbon, CtBP2, and RIBEYE". For clarity: Ribeye has two domains, one that is identical to CtBP2 (B-domain) and a unique Ribeye domain (A-domain) that is only expressed at ribbon synapses. In addition, Piccolino is also embedded in the ribbon (Brandstaetter lab, Wichmann/Moser labs). In other words, Ribeye and Piccolino are the major constituents of the ribbon. 

      To avoid confusion, we simply mention CtBP2 and RIBEYE in the context of the corresponding antibodies that were used to label ribbons.

      (10) Abstract: consider rephrasing "Ca2+-independent role of Cav1.4" as "Ca2+-permeation-independent role of Cav1.4" or similar. 

      Sentence changed to: “In CSNB2, we propose that Cav3 channels maintain cone synaptic output provided that the nonconducting role of Cav1.4 in cone synaptogenesis remains intact.”

      Reviewer #3 (Recommendations For The Authors): 

      Cav1.4 voltage-gated calcium channels play an important role in neurotransmission at mammalian photoreceptor synapses. Mutations in the CACNA1F gene lead to congenital stationary night blindness, which particularly affects the rod pathway. Mouse Cav1.4 knockout and Cav1.4 knockin models suggest that Cav1.4 is also important for the cone pathway. Deletion of Cav1.4 in the knockout models leads to signaling malfunctions and to abundant morphological rearrangements of the synapse, suggesting that the channel has a role not only in Ca2+ influx but also in the morphological organization of the photoreceptor synapse. Of note, additional Cav channels have also been detected previously in cone synapses by different groups, including L-type Cav1.3 (Wu et al., 2007; pmid; Kersten et al., 2020; pmid) and T-type Cav3.2 (Davison et al., 2021; pmid 35803735). 

      In order to study a conductivity-independent role of Cav1.4 in the morphological organization of photoreceptor synapses, the authors generated the knockin (KI) mouse Cav1.4 G369i in a previous study (Maddox et al., eLife 2020; pmid 32940604). The Cav1.4 G369i KI channel no longer works as a Ca2+-conducting channel due to the insertion of a glycine in the pore-forming unit (Maddox et al., eLife 2020; pmid 32940604). In this previous study (Maddox et al., eLife 2020; pmid 32940604), the authors analyzed Cav1.4 G369i in rod photoreceptor synapses. In the present study, the authors analyzed cone synapses in this KI mouse. 

      For this purpose, the authors performed a comprehensive set of experimental methods, including immunohistochemistry with antibodies (also with quantitative analyses), electrophysiological measurements of presynaptic Ca2+ currents from cone photoreceptors in the presence/absence of inhibitors of L-type and T-type calcium channels, electron microscopy (FIB-SEM), ERG recordings, and visual behavior tests of the Cav1.4 G369i KI in comparison to Cav1.4 knockout and wild-type control mice. 

      The authors found that the non-conducting Cav channel is properly localized in cone synapses and demonstrated that there are no gross morphological alterations (e.g., sprouting of postsynaptic components that are typically observed in the Cav1.4 knockout). These findings demonstrate that cone synaptogenesis relies on the presence of Cav1.4 protein but not on its Ca2+ conductivity. This result, obtained at cone synapses in the present study, is similar to the previously reported results observed for rod synapses (Maddox et al., eLife 2020, pmid 32940604). No further mechanistic insights or molecular mechanisms were provided that demonstrated how the presence of the Cav channels could orchestrate the building of the cone synapse. 

      We respectfully disagree regarding the mechanistic advance of our study. As indicated by Reviewer 2, a major advance of our study is in providing a mechanism that can explain the longstanding conundrum that congenital stationary night blindness type 2 mutations that would be expected to severely compromise Cav1.4 function do not produce complete blindness. Our study provides an important contrast to the Maddox et al 2020 study in showing that rods and cones respond differentially to loss of Cav1.4 function, which is also relevant to the visual phenotypes of CSNB2. How the presence of Cav1.4 orchestrates cone synaptogenesis is an important topic that is outside the scope of our present study.

      In the present study, the authors also propose a homeostatic switch from L-type to (newly occurring) T-type calcium channels in the Cav1.4 G369i KI mouse as a consequence of the deficient calcium channel conductivity in the Cav1.4 G369i KI mouse. In cones of the Cav1.4 G369i KI, the high-voltage activated, L-type Ca2+ entry was abolished, in agreement with their previous paper (Maddox et al., eLife 2020, pmid 32940604). The authors found a low-voltage activated Ca2+ current instead, which they assigned to T-type Ca2+ currents based on pharmacological inhibitor experiments. T-type Ca2+ currents/channels were previously identified by independent groups using independent techniques (electrophysiology, RT-PCR, single-cell sequencing) in cones of wild-type mice (Davison et al., 2021, pmid 35803735; Macosko et al., 2015, pmid 26000488; Williams et al., 2022, pmid 35650675). In the present manuscript (Figures 3a/b), the authors also observed a low-voltage activated, T-type-like current in cones of wild-type mice that is isradipine-resistant and affected by the T-type inhibitor ML218. This finding appears compatible with a T-type-like current in wild-type cones and is consistent with the published data mentioned above, although the authors interpret these data differently in the Discussion. 

      Due to the noise inherent in whole-cell voltage clamp measurements and some crossover effects in the pharmacology, we cannot completely exclude the presence of a T-type current in WT mouse cones. However, our results very clearly support a conclusion opposite to that stated by the reviewer. Namely, if WT mouse cones have T-type Ca2+ currents, then they are far smaller than those in Cav1.4 G369i KI and KO cones. In particular, while we identified message for Cav3.2 in WT mouse cones, we were unable to identify a functional T-type current by either voltage clamp measurements or pharmacology. See below for a detailed rebuttal.

      This proposal of a homeostatic switch is not convincingly supported in this reviewer's opinion (for further details, please see below). Furthermore, no data on possible molecular mechanisms were provided that would support such a proposal of a homeostatic switch of calcium channels. No mechanistic/molecular insights were provided for the proposed homeostatic switch from L-type to T-type channels that the authors propose to occur between wild-type and Cav1.4 G369i as a consequence of conduction-deficient Cav1.4 G369i channels. Is this based, e.g., on posttranslational modifications that switch on T-type channels, on regulation at the transcriptional level inducing expression of T-type calcium channels, or on other mechanisms? The authors remain descriptive with their central hypotheses. No molecular mechanisms/signaling pathways were provided that would support the idea of such a homeostatic switch. 

      Homeostatic plasticity refers to the maintenance of neuronal function in response to some perturbation in neuronal activity and can result from changes in the expression of ion channel genes (PMID: 36377048, 32747440, 19778903) or regulatory pathways that modulate ion channels (PMID: 15051886, 32492405). We present multiple lines of evidence showing that Cav3 currents appear in cones upon genetically induced Cav1.4 loss of function and can support cone synaptic responses and visual behavior if cone synapse structure is maintained. Our new transcriptomic studies show no difference between levels of Cav3 channel transcripts in WT and G369i KI cones, suggesting that the appearance of the Cav3 currents in G369i KI cones does not result from an increase in Cav3 gene expression. We are currently investigating our transcriptomic dataset to determine if Cav3 regulatory pathways are upregulated in G369i KI cones and will present this in a follow-up study.

      The authors show residual photopic signaling in the non-conducting Cav1.4 G369i KI mouse, as judged by recordings of postsynaptic currents, ERG recordings, and visual behavior tests, though in a reduced manner. The residual cone-based signaling could be based on the unaffected T-type Ca2+ channel conductivity in cone synapses. Given that the L-type current through Cav1.4 is gone in the Cav1.4 G369i KI, as previously shown (Maddox et al., 2020, pmid 32940604), the T-type calcium current will remain. However, as discussed above, this does not necessarily support the idea of a homeostatic switch. 

      A major point which we highlighted with new results is that despite the expression of Cav3 transcripts in WT mouse cones, Cav3 channels do not contribute to the cone Ca2+ current. This is at odds with the Davison et al study (PMID: 35803735, see our response to Reviewer 2, pt 7 for caveats of this study), but our results convincingly show that the Cav3 current appears only when Cav1.4 is genetically inactivated. Pharmacological or electrophysiological methods that should reveal the presence of Cav3 currents do not change the properties of the Ca2+ current in cones of WT mice, ground squirrel, or macaque:

      • Figs.2-4: Voltage steps to -40 mV (Fig 2e) that activate a sizeable T-current in G369i KI mouse cones produce a negligible transient at pulse onset in WT mouse cones. Similarly, transient currents that are obvious in G369i KI mouse cones during the final step to -30 mV are absent in WT cones.  When we block Cav1.4 with isradipine either in cones of WT mice or ground squirrel, the current that remains does not resemble a Cav3 current but rather a scaled down version of the L-type current. ML218, which readily blocks Cav3 channels in HEK293T cells and in G369i KI cones, has only minor effects in cones of WT mice and ground squirrel; these effects of ML218 can be attributed to non-specific actions on Cav1.4 (new Supp.Fig.S2). New Fig.4 (moved from the supplementary data to the main article) clearly shows that the ML218-sensitive current in ground squirrel cones exhibits properties of Cav1.4 not Cav3 channels. 

      • Figs.2,5: Holding voltages that inactivate Cav3 channels have no effect on the Ca2+ current in cones of WT mice or macaque (recordings of macaque cones were moved from the supplement to the main article as new Fig.5).

      In Figure 4 the authors measured an increase in the size of the active zone (as judged by the size of the bassoon cluster) and of the synaptic ribbons in the Cav1.4 G369i. A mechanistic explanation for this phenomenon was not provided and the underlying molecular mechanisms were not unraveled. 

      The FIB-SEM data uncover some ultrastructural alteration/misalignments of the synaptic ribbons and misalignments of the regular arrangement of the postsynaptic dendrites in the G369i KI mice. Also concerning this observation, the study remains descriptive and does not reveal the underlying mechanisms as it would be expected for eLife. 

      We respectfully disagree on the descriptive nature of our study and the need for a full characterization of the molecular mechanism underlying the cone synaptic defects in the G369i KI mouse.   

      An important study in the field (Zanetti et al., Sci. Rep. 2021; pmid 33526839) should be also cited that used a gain-of-function mutation of Cav1.4 to analyze its functional and structural role in the cone pathway. 

      We have added citation of this paper to the Discussion (lines 354-356).

      In conclusion, the study has been expertly performed but remains descriptive without deciphering the underlying molecular mechanisms of the observed phenomena, including the proposed homeostatic switch of synaptic calcium channels. Furthermore, a relevant part of the data in the present paper (presence of T-type calcium channels in cone photoreceptors) has already been identified/presented by previous studies of different groups (Macosko et al., 2015; pmid 26000488; Davison et al., 2021; pmid 35803735; Williams et al., 2022; pmid 35650675). The degree of novelty of the present paper thus appears limited. I think that the study might be better suited in a more specialized journal than eLife. 

      We thank the reviewer for acknowledging the rigor of our study but disagree with their evaluation regarding the novelty of our work as outlined in our responses above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      My comments are largely limited to suggestions to make the manuscript easier to read and digest.

      In the abstract they say RNA sequencing highlights changes in innate...

      Could they be more specific? Innate immune system up or down? They do not indicate actual findings in the abstract.

      We thank the reviewer for the comment and we have revised the abstract accordingly.  

      Their use of non‐intuitive abbreviations (e.g., mNGA, vmHT) is often confusing. Perhaps they can add a table in the Methods listing all the abbreviations so that the reader can follow the data better.

      As suggested, we have now included a list of the abbreviations used in the paper.

      There are misspellings in the manuscript.

      We have gone through the manuscript and corrected the misspellings.

      Has the SPR RNAi line been validated?

      The SPR RNAi line that we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. Importantly, the effectiveness of SPR knockdown is evident in female flies, as they exhibit dramatically reduced egg laying and lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We have revised the manuscript and added these statements to the Results section concerning SPR RNAi.

      In the figures showing the Climbing Index vs time, can they abbreviate seconds as sec vs s? At least I think it is seconds. At first, I thought it was Time or Times, and was confused about what they were indicating on those types of graphs (Figures 1D‐F).

      We have revised the figure as suggested by the reviewer.

      In Figure 3F, they have a significance indicated in an unclear manner. It looks like they are comparing neuropil to the cortex, but I think they really mean to compare the cortex of sham to cortex of D31?

      The reviewer was correct. We have revised figure 3F to make this clear.     

      In Figure 4B, what is the y‐axis? Percentage of what? Is that percentage of total flies?

      The reviewer was correct. We have revised the figure to make this clear. 

      In a figure like SF3 B, what is the y‐axis? "Norm. Accum. CI" Can they explain the abbreviation?

      We have revised the Y-axis label to be “Normalized accumulative CI”.  We have also made this clear in the legend.   

      In the methods, what does this mean: "Regions devoid of Hoechst and phalloidin signal in non‐physiologically appropriate areas were considered vacuoles"? What are non‐physiologically appropriate areas? To me, that would mean outside of the brain. I would have thought the areas should be physiologically appropriate (aka neuropil and cortex)? This is confusing.

      We have revised the Methods section to be more specific. The Drosophila brain contains structures, such as the esophagus, that are devoid of both Hoechst and phalloidin staining; these were excluded from our vacuole quantification.

      Reviewer #2 (Recommendations For The Authors):

      Since I use mammalian systems, my comment about the confirmation of siRNA should be removed if this is not possible in the Drosophila system.

      We have revised the figures to include total N values when appropriate. Including individual n values for each experimental assay and condition will inevitably crowd the figure legends, so specific values are available upon request. 

      Regarding RNAi knockdown of sex peptide receptors (SPRs), we agree that confirming the knockdown by IHC or qRT-PCR would further strengthen our findings. It should be noted, however, that the RNAi line we used has been extensively validated by Yapici et al., 2007 and in several subsequent publications. Importantly, the effectiveness of SPR knockdown is evident in female flies: they exhibit dramatically reduced egg laying and lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We have revised the manuscript to include these statements in the Results concerning the SPR RNAi knockdown.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figures 1 and 2, the authors found that females injured on D17 have a lower climbing index in the acute phase, which was not due to neurodegeneration, as there were no significant changes in brain vacuolation or other markers. However, in Figure 3, the authors found that female flies have a lower climbing index, more brain vacuolation, and neurodegeneration in the late phase. It's not very convincing that the lower climbing index in the late phase is due to neurodegeneration. Is it possible that females suffered more severe acute effects, at least after D17 injury?

      We thank the reviewer for this point. Female flies injured on D17 displayed acute climbing deficits at 90 minutes post-injury. Since we did not observe significant structural changes in the brain at this time, we believe that this short-term functional deficit is not due to acute neuronal death. It is important to note that males did not display any acute climbing deficits when injured on D17, which suggests that the females suffered more severe acute effects than males. However, these injured female flies recovered fully at 24 hours post-injury and displayed no climbing deficits. At two weeks post-injury, we observed climbing deficits and increased vacuole formation as a direct result of the injuries on D17 (see Supplemental Figure 3). When we assessed sensorimotor behavior and brain vacuolation on D45, we found that the injured females had significantly lower climbing indices and more brain vacuolation than non-injured females of the same age. In this case, the concurrent observation of decreased climbing ability and increased brain vacuolation suggests chronic neurodegeneration in aged, injured females. This should not be confused with the acute neuronal death observed by other groups using injury models of greater severity. Overall, our data are consistent with the current view that in many neurodegenerative diseases, functional deficits often precede observable brain degeneration, which may take years to manifest.

      (2) The authors determined late‐life brain deficits and neurodegeneration purely based on climbing index and vacuole formation. These phenotypes are not really specific to TBI‐related neurodegeneration, and the significance and mechanisms of vacuole formation are not clear. Indeed, in Figures 3A and B, male flies, especially D31inj, tend to have much larger variation than any other group. What could be the reasons? The authors should perform additional analyses of TBI‐related neurodegeneration in flies that have been shown before, such as retinal degeneration and loss, neuronal degeneration and loss, neuromuscular junction abnormalities, etc. (Genetics. 2015 Oct; 201(2): 377‐402).

      We thank the reviewer for the thorough evaluation of our manuscript. The reviewer raised a very important question: whether the neurodegeneration observed in our model is specific to TBI. As the reviewer rightly pointed out, the neurodegenerative phenotypes are unlikely to be specific to TBI-related neurodegeneration. Throughout the manuscript, we have tried to convey the notion that the mild physical impacts to the head represent one form of environmental insults, which in combination with other risk factors such as aging can lead to the emergence of neurodegenerative conditions. It should be noted that the negative geotaxis assay and vacuolation quantification are two well-established approaches to assess sensorimotor deficits and frank brain degeneration in fly brains. 

      It is important to emphasize that the head-specific impacts delivered to the flies in our study are much milder than those used in previous studies. As shown in Figure 1, this very mild form of head trauma (referred to as vmHT) did not cause any death, nor did it affect the lifespan of the injured flies. Our supplemental data also show very minimal structural neuronal damage and no acute or chronic apoptosis induced by vmHT exposure. Consistently, we did not observe any exoskeletal or eye damage immediately following injuries, nor did we observe retinal degeneration or pseudopupil loss in these flies at the chronic stage. We have incorporated these important points in the revised manuscript.

      (3) In Figure 4, it would be important to also test fly speed and directional movement in the acute phase to determine whether the females have reduced performance in the acute phase.

      We thank the reviewer for this suggestion. Please note that our modified NGA already improves the spatiotemporal resolution over the classic NGA. The data presented in Fig. 3 show that there are no acute deficits in young cohorts. Therefore, we do not believe that a detailed analysis of the direction and speed of these flies is essential.

      Unfortunately, the current setup for the AI-based analysis requires manual corrections of tracking errors, which are time-consuming and tedious.  We are building a newly designed AI-based NGA (NGA.ai) that will allow automatic tracking and quantification with minimal manual interventions. Once it is completed, we will perform some of the analyses that the reviewer suggested.  

      (4) In Figure 8, the authors performed an RNA‐seq analysis and identified some dysregulated genes. However, it is really surprising to see so few DEGs even in wild‐type males and mated females, and to see that none of the DEGs overlap among groups or are related to SP signaling. This raises questions about the validity of the RNA-seq analysis. It is critical to independently verify the RNA-sequencing results and to add more molecular evidence to support the conclusion.

      We agree that future studies are needed to independently validate our RNA sequencing results. We believe that the small number of DEGs is likely due to two unique features that distinguish our study from previous fly TBI studies: (1) the very mild nature of our injury paradigm and (2) the chronic examination timepoint, long after the head injury and SP exposure. As pointed out in the manuscript, our study aimed to understand how early-life exposure to repetitive traumatic head insults could lead to the late-life onset of neurodegenerative conditions. We hope to further validate our results in the next phase of experiments using single-cell RNA sequencing and RT-qPCR.

      (5) The current results raise a series of interesting questions: what implications might female fly mating and its associated Sex Peptide signaling have for mammals or humans? Would female mammals mated with wild‐type or sex hormone‐null males show different post‐injury behavior or neuropathological changes? What are the mechanisms underlying the sexual dimorphism?

      As the reviewer pointed out, it would be very interesting to explore the possible roles of sex peptide signaling in other animals and humans. As far as we know, there is no mammalian ortholog of the insect sex peptide, so it would be difficult to study SP or an SP-like molecule in mammalian models. However, we believe that the prolonged post-mating changes associated with reproduction in female fruit flies contribute to their elevated vulnerability to neurodegeneration. In this regard, drastic reproduction-associated changes in the biology of female mammals could potentially also lead to vulnerability to neurodegeneration. We agree that this demands further study, which may be done with future collaborators using rodent or large animal models. We have discussed this point in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank you very much for reviewing our manuscript and express our sincere appreciation for the valuable and thoughtful comments, which led us to significantly improve the manuscript on Fshr-ZsGreen reporter mice. We have taken your comments seriously and made a major revision of the manuscript; here is a summary of the revision:

      (1) New data on Fshr expression have been added to the revised manuscript:

      a. Fshr expression in the testis and adipose tissues (WAT and BAT) of B6 mice;

      b. Fshr expression in the testis of B6 by RNA-smFISH;

      c. Comparison of Fshr expression in the testis and ovary between Fshr-ZsGreen and B6 mice by ddRT-PCR, showing that Fshr expression is not disrupted by insertion of the P2A-ZsGreen vector;

      d. Reduction of Fshr expression in osteocytes within the femoral sections from DMP1-CreERT2:Fshrfl/fl mice;

      e. Fshr expression in an established Leydig cell line (TM3) by immunofluorescence and ddRT-PCR, which also shows Fshr localized in the nuclei of TM3 cells;

      f. Fshr expression at the scRNA-seq level from 5 public single-cell portals (Supplementary Data 3), supporting our finding of a widespread expression pattern of Fshr, particularly in Leydig cells.

      (2) Re-organization of Figure 2 with a new legend.

      (3) A new paragraph has been added to the Discussion section of the revised MS to explain the function of the P2A peptide in generating GFP reporter mice and why Fshr expression is not disrupted by the P2A-ZsGreen insertion in the Fshr-ZsGreen reporter.

      (4) Deletion of Figure 1-D-c, as it is not necessary.

      (5) Replacement of Figure 8-A (left panel) with an image taken at a reduced exposure time.

      (6) Amended parts of the revised MS are labeled in red.

      A point-by-point response to the Reviewers’ comments:

      Reviewer 1:

      One of the shocking observations in this manuscript is the expression of FSHR in Leydig cells. Other observations are in osteoblasts and endothelial cells, as well as epithelial cells in different organs. The expression of ZsGreen in these tissues seems high, and one might start to question whether other mechanisms are at play here.

      First, the turnover of fluorescent proteins is slow (longer than 48 h), which means that they accumulate at a different rate than the endogenous FSHR. This means that ZsGreen will accumulate over time, while the FSHR receptor might be degraded almost immediately. This correlates with mRNA expression (by the authors) but not with the results of other single-cell sequencing studies (see below).

      The expression of ZsGreen in Leydig cells seems much higher than in Sertoli cells, this is "disturbing" to put it mildly. This is visible in both the ZsGreen expression and the FISH assay (Figure 2 B-D).

      Thank you for this valuable comment. We have added new data to the revised MS (Fig. 2E, F, and G) showing the presence of Fshr in Leydig cells of B6 mice, detected by immunofluorescence staining, RNA-smFISH, and ddRT-PCR, as well as in TM3 cells (a Leydig cell line isolated from a male mouse). These data demonstrate that insertion of the P2A-ZsGreen vector at the locus between exon 10 and the stop codon does not disrupt normal Fshr expression. We use ZsGreen as an indicator of active Fshr promoter status, rather than as a method to measure Fshr expression, which is done by ddRT-PCR. These data are shown in Figure 2G of the revised MS.

      In addition, we provide scRNA-seq-based evidence of Fshr expression in human Leydig cells from two single-cell portals (DISCO and BioGPS), shown in Supplementary Data 3 of the revised MS. We also cite a recent report on scRNA-seq analysis of Fshr expression in Hu sheep as Reference 65 (PMID: 37541020) [1], which clearly showed Fshr expression in Leydig cells at the single-cell level in Hu sheep.

      We believe that the lack of Fshr expression in some single-cell databases may be due to degradation of the Fshr transcript during the preparation of single-cell populations. In our laboratory, we spent more than 6 months optimizing methods and reagents to preserve mRNA integrity (an RNA integrity number above 8) for RNA-seq.

      The expression in WAT and BAT is also questionable as the expression of ZsGreen is high everywhere. That makes it difficult to believe that the images are truly informative. For example, the stainings of aorta show the ZsGreen expression where elastin and collagen fibres are - these are not "cells" and therefore are not expressing ZsGreen.

      FISH expression (for FSHR) in WT mice is missing.

      Also, the tissue sections were stained with the IgG only (neg control) but in practice both the KI and the WT tissues should be stained with the primary and secondary antibodies. The only control that I could think of to truly get a sense of this would be a tagged receptor (N-terminal) that could then be analysed by immunohistochemistry.

      Reply to comments 2 and 3: Thank you for these comments. New data on Fshr expression in WAT and BAT of B6 mice by immunofluorescence staining, and in the testis of B6 mice by immunofluorescence staining and RNA-smFISH, have been added to the revised MS (Fig. 2D and E, and Fig. 4G), showing patterns similar to those of Fshr-ZsGreen mice. Furthermore, we provide additional evidence (Supplementary Data 3) on Fshr expression obtained from 4 public single-cell portals, showing FSHR expression at the single-cell level in a wide range of human, mouse, and rat organs and tissues (including different fractions of adipose cells). Please also see the Fshr expression pattern in adipose tissues by immunostaining in previous reports (Fig. 3a of PMID: 28538730 and Fig. 2 of PMID: 25754247) [2, 3], which is similar to our findings. These data should address your concerns about Fshr expression in WAT and BAT and other organs/tissues.

      Regarding “For example, the stainings of aorta show the ZsGreen expression where elastin and collagen fibres are - these are not "cells" and therefore are not expressing ZsGreen”: we believe that you are referring to the image of the aorta in Supplementary Data 2. However, please see the images of the aorta in Figure 5-C, which show, at 1000× magnification, that the layer of ‘elastin and collagen fibres’ stains positively for EMCN and α-SMA, colocalized with Fshr and with DAPI-stained nuclei. This indicates that endothelial cells and cellular membranes are present in this layer, not just ‘elastin and collagen’.

      The authors also claim:

      To functionally prove the presence of FSHR in osteoblasts/osteocytes, we also deleted FSHR in osteocytes using an inducible model. The conditional knockout of FSHR triggered a much more profound increase in bone mass and decrease in fat mass than blockade by FSHR antibodies (unpublished data).

      This would be a good control for all their images. I think it is necessary to make the large claim of extragonadal expression, as well as intragonadal such as Leydig cells.

      Thank you for this very encouraging comment. As you suggested, we have added a result showing reduced Fshr expression in osteocytes from tamoxifen-treated DMP1-CreERT2+:Fshrfl/fl mice to the revised MS (Figure 3D), demonstrating the presence of Fshr in osteocytes and the specificity of the Fshr antibody. Furthermore, we incorporated your advice on making the ‘large claim of extragonadal and intragonadal expression of Fshr’ into the revised MS (in red).

      Claiming that the under-developed Leydig cells in FSHR KO animals are due to a direct effect of the FSHR, and not via a cross-talk between Sertoli and Leydig cells, is too much of a claim. It might be speculated to some degree but as written at the moment it suggests this is "proven".

      Thank you for pointing out this incorrect claim, and we apologize for it. We have deleted this claim in the revised MS.

      We also do not know if this FSHR expressed is a spliced form that would also result in the expression of ZsGreen but in a non-functional FSHR, or whether the FSHR is immediately degraded after expression. The insertion of the ZsGreen might have disturbed the epigenetics, transcription, or biosynthesis of the mRNA regulation.

      Thanks for this comment. In the revised MS, we added a new section explaining the function of the P2A peptide in generating a GFP reporter by sgRNA-guided, site-specific knock-in of the P2A-ZsGreen vector through CRISPR/Cas9. We also provide a new result comparing Fshr expression in the testes and ovaries of Fshr-ZsGreen and B6 mice, showing equivalent Fshr expression between the two strains (Figure 2G), which indicates that insertion of the P2A vector does not disrupt Fshr expression.

      The authors should go through single-cell data of WT mice to show the existence of the FSHR transcript(s). For example, here: https://www.nature.com/articles/sdata2018192

      Thank you so much for the valuable comment. We took your critical advice and checked Fshr expression in 4 single-cell portals, including DISCO, GTEx, BioGPS, and the Human single cell portal, and present the collected data as Supplementary Data 3 in the revised MS; these data strongly support our findings of wider Fshr expression. In particular, Fshr expression in Leydig cells is shown by scRNA-seq studies of human cells from DISCO and BioGPS, as well as by a recent study in Hu sheep (PMID: 37541020) [1], which we cite in the revised MS.

      Reviewer 2:

      Is the FSHR expression pattern affected in the knock-in mice? (No side-by-side comparison between WT and ZsGreen mice using in situ hybridization and ddRT-PCR, at least in the gonads, is provided.)

      Thanks for the comment. In the revised MS, we provide a set of new data on Fshr expression in the testis, ovary, WAT, and BAT of B6 mice by immunofluorescence staining and RNA-smFISH, showing similar expression patterns. Additionally, we performed ddRT-PCR to compare Fshr expression in the testes and ovaries between Fshr-ZsGreen and B6 mice, demonstrating equivalent Fshr expression in the two strains. Interestingly, we also observed significantly higher Fshr expression in the testis than in the ovary (more than 30-fold).

      Is the splicing pattern of the FSHR affected in the knockin compared to wt mice, at least in the gonads?

      Thanks for the question. Please see our reply to Reviewer 1 regarding the function of the P2A peptide used to generate GFP reporters. Although we did not directly assess the splicing pattern, the comparison of Fshr expression in Figure 2F of the revised MS indirectly indicates no change in the splicing pattern. In future work, we will assess the splicing pattern of Fshr, which has been neglected in the field.

      Are there any additional off-target insertions of ZsGreen in these mice? Are similar results observed in separate founder mice?

      Thanks for the questions. As described in detail in the Methods section of the MS, the Fshr-ZsGreen reporter was produced by site-specific long ssDNA recombination of the P2A-ZsGreen targeting vector into the locus between exon 10 and the stop codon by CRISPR/Cas9, guided by a site-specific single guide RNA (sgRNA). We showed the results of Southern blot, DNA sequencing, and site-specific PCR, which confirm the site-specific insertion of P2A-ZsGreen (Figure 1). Because the recombination is site-specific, only one founder line is required for the study, and there are no additional off-target insertions.

      How long is the ZsGreen half-life? Could a very long half-life be a major reason for the extremely large expression pattern observed?

      Thanks for the question. The half-life of ZsGreen, also called ZsGreen1, is at least 26 h in mammalian cells, or slightly longer owing to its tetrameric structure, in contrast with the monomeric configuration of other well-known fluorescent proteins (PMID: 17510373) [4]. The rationale for using this fluorescent protein is that ZsGreen is exceptionally bright (up to 4× brighter than EGFP) and is ideally suited for whole-cell labelling and promoter-reporter studies, given the high turnover and rapid degradation of the Fshr transcript. In this study, we used ZsGreen as an indicator of an active endogenous Fshr promoter, rather than as a means of measuring promoter activity. Therefore, regardless of whether it accumulates, ZsGreen driven by the Fshr promoter indicates the presence of an active Fshr promoter in the defined cells. Instead, we used ddRT-PCR to quantify Fshr expression in this study. In addition, we provide single-cell sequencing-based evidence from 4 public single-cell portals to support our findings of widespread Fshr expression (see Supplementary Data 3 in the revised MS).

      References:

      (1) Su J, Song Y, Yang Y, et al. Study on the changes of LHR, FSHR and AR with the development of testis cells in Hu sheep. Anim Reprod Sci. Sep 2023;256:107306. doi:10.1016/j.anireprosci.2023.107306

      (2) Liu P, Ji Y, Yuen T, et al. Blocking FSH induces thermogenic adipose tissue and reduces body fat. Nature. Jun 1 2017;546(7656):107-112. doi:10.1038/nature22342

      (3) Liu XM, Chan HC, Ding GL, et al. FSH regulates fat accumulation and redistribution in aging through the Galphai/Ca(2+)/CREB pathway. Aging Cell. Jun 2015;14(3):409-20. doi:10.1111/acel.12331

      (4) Bell P, Vandenberghe LH, Wu D, Johnston J, Limberis M, Wilson JM. A comparative analysis of novel fluorescent proteins as reporters for gene transfer studies. J Histochem Cytochem. Sep 2007;55(9):931-9. doi:10.1369/jhc.7A7180.2007

    1. Author response:

      eLife assessment 

      This important study identifies a novel gastrointestinal enhancer of Ctnnb1. The authors present convincing evidence to support their claim that the dosage of Wnt/β-catenin signaling controlled by this enhancer is critical to intestinal epithelial homeostasis and the progression of colorectal cancers. The study will be of interest to biomedical researchers interested in Wnt signaling, tissue-specific enhancers, intestinal homeostasis, and colon cancer.

      We greatly appreciate the editors’ and reviewers’ extensive and constructive comments and suggestions. We will do our utmost to revise the manuscript accordingly.

      Public Reviews: 

      Reviewer #1 (Public Review)

      Summary: 

      Ctnnb1 encodes β-catenin, an essential component of the canonical Wnt signaling pathway. In this study, the authors identify an upstream enhancer of Ctnnb1 responsible for the specific expression level of β-catenin in the gastrointestinal tract. Deletion of this enhancer in mice and analyses of its association with human colorectal tumors support that it controls the dosage of Wnt signaling critical to homeostasis in intestinal epithelia and colorectal cancers.

      Strengths: 

      This study has provided convincing evidence to demonstrate the functions of a gastrointestinal enhancer of Ctnnb1 using combined approaches of bioinformatics, genomics, in vitro cell culture models, mouse genetics, and human genetics. The results support the idea that the dosage of Wnt/β-catenin signaling plays an important role in the pathophysiological functions of intestinal epithelia. The experimental designs are solid and the data presented are of high quality. This study significantly contributes to the research fields of Wnt signaling, tissue-specific enhancers, and intestinal homeostasis. 

      Weaknesses: 

      One weakness of this manuscript is an insufficient discussion on the Ctnnb1 enhancers for different tissues. For example, do specific DNA motifs and transcriptional factors contribute to the tissue-specificity of the neocortical and gastrointestinal enhancers? It is also worth discussing the potential molecular mechanisms controlling the gastrointestinal expression of Ctnnb1 in different species since the identified human and mouse enhancers don't seem to share significant similarities in primary sequences. 

      We agree with the reviewer that the manuscript lacks sufficient discussion of how enhancers control cell-type-specific expression of target genes, which is one of the most important questions in the field of transcriptional regulation. Equally important are the common and species-specific features of this regulation. In general, the composition, location, order, and trans-factor affinity of motifs within enhancers are four key elements. We will elaborate on this point in the revision.

      Reviewer #2 (Public Review): 

      Wnt signaling is the name given to a cell-communication mechanism that cells employ to inform on each other's position and identity during development. In cells that receive the Wnt signal from the extracellular environment, intracellular changes are triggered that cause the stabilization and nuclear translocation of β-catenin, a protein that can turn on groups of genes referred to as Wnt targets. Typically these are genes involved in cell proliferation. Genetic mutations that affect Wnt signaling components can therefore affect tissue expansion. Loss of function of APC is a drastic example: APC is part of the β-catenin destruction complex, and in its absence, β-catenin protein is not degraded and constitutively turns on proliferation genes, causing cancers in the colon and rectum. And here lies the importance of the finding: β-catenin has for long been considered to be regulated almost exclusively by tuning its protein turnover. In this article, a new aspect is revealed: Ctnnb1, the gene encoding for β-catenin, possesses tissue-specific regulation with transcriptional enhancers in its vicinity that drive its upregulation in intestinal stem cells. The observation that there is more active β-catenin in colorectal tumors not only because the broken APC cannot degrade it, but also because transcription of the Ctnnb1 gene occurs at higher rates, is novel and potentially game-changing. As genomic regulatory regions can be targeted, one could now envision that mutational approaches aimed at dampening Ctnnb1 transcription could be a viable additional strategy to treat Wnt-driven tumors. 

      We thank the reviewer for acknowledging the potential significance of the manuscript. We also recognize that targeting genomic regulatory regions to dampen Ctnnb1 transcription could be a promising strategy for treating Wnt-driven tumors, including many colorectal carcinomas. However, we would like to point out that there are significant technical challenges associated with AAV delivery to the GI epithelium, including the hostile environment, immune responses, and low delivery efficiency.

      Reviewer #3 (Public Review): 

      The authors of this paper identify an enhancer upstream of the Ctnnb1 gene that selectively enhances expression in intestinal cells. This enhancer sequence drives expression of a reporter gene in the intestine and knockout of this enhancer attenuates Ctnnb1 expression in the intestine while protecting mice from intestinal cancers. The human counterpart of this enhancer sequence is functional and involved in tumorigenesis. Overall, this is an excellent example of how to fully characterize a cell-specific enhancer. The strength of the study is the thorough nature of the analysis and the relevance of the data to the development of intestinal tumors in both mice and humans. A minor weakness is that the loss of this enhancer does not completely compromise the expression of the Ctnnb1 gene in the intestine, suggesting that other elements are likely involved. Adding some discussion on that point would be helpful.

      We are quite encouraged by the reviewer’s positive comments. We agree with the reviewer that other cis-regulatory elements may be involved in the transcription of Ctnnb1 within the GI epithelium. It is also possible that the basal transcription of Ctnnb1 within the GI epithelium is relatively high, and that enhancers can only boost transcription within a certain range. We will discuss these possibilities in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript presents a machine-learning method to predict protein hotspot residues. The validation is incomplete, and the results of other current methods, such as FTMap, are misinterpreted in the comparison.

      We believe that the validation is complete. The two most common techniques for testing and validating machine-learning methods are to split the dataset into either (1) a training set and a test set with a fixed ratio (e.g., 70% for training and 30% for testing) or (2) multiple subsets/folds, i.e., cross-validation. We did not train the model on one training set and evaluate its performance on a separate test set, as Reviewer 2 assumed. Instead, we employed cross-validation, as it reduces the variability in performance estimates compared to a single training/test split and uses the entire dataset for both training and testing, making efficient use of the limited data. Each fold was used once as the test set, with the remaining folds as the training set; this process was repeated for each fold, and the model's performance was measured using the F1 score. We listed the mean validation F1 score in Table 1.
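      The k-fold procedure described above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' pipeline: the stand-in classifier (`fit_threshold`, a one-feature midpoint threshold), the toy data, and the use of stratified folds are all assumptions, since the response does not specify the model or how the folds were constructed.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[float], int]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R); returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds, dealing each class out round-robin (assumed stratification)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_threshold(x_train: Sequence[float], y_train: Sequence[int]) -> Classifier:
    """Hypothetical stand-in model: cut halfway between the two class means of one feature."""
    pos = [x for x, t in zip(x_train, y_train) if t == 1]
    neg = [x for x, t in zip(x_train, y_train) if t == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= cut else 0


def cross_validate(
    x: Sequence[float],
    y: Sequence[int],
    fit: Callable[[Sequence[float], Sequence[int]], Classifier],
    k: int = 5,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Each fold is the test set exactly once; the other k-1 folds train the model."""
    scores: List[float] = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [model(x[i]) for i in test_idx]
        scores.append(f1_score([y[i] for i in test_idx], y_pred))
    return mean(scores), scores


if __name__ == "__main__":
    # Toy, well-separated data: 10 "non-hot spot" (0) and 10 "hot spot" (1) feature values.
    x = [i * 0.1 for i in range(10)] + [10 + i * 0.1 for i in range(10)]
    y = [0] * 10 + [1] * 10
    mean_f1, per_fold = cross_validate(x, y, fit_threshold, k=5)
    print(f"mean validation F1 = {mean_f1:.2f}")
```

      Stratification is one reasonable choice because hot spot residues are typically a minority class, and a test fold with no positives would make the F1 score uninformative; whether the original folds were stratified is not stated.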

      We have clarified our comparison with FTMAP  - see reply to point 1 of reviewer 1 below. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper describes a program developed to identify PPI-hot spots using the free protein structure and compares it to FTMap and SPOTONE, two webservers that they consider as competitive approaches to the problem. On the positive side, I appreciate the effort in providing a new webserver that can be tested by the community but have two major concerns as follows.

      (1) The comparison to the FTMap program is wrong. The authors misinterpret the article they refer to, i.e., Zerbe et al. "Relationship between hot spot residues and ligand binding hot spots in protein-protein interfaces" J. Chem. Inf. Model. 52, 2236-2244, (2012). FTMap identifies hot spots that bind small molecular ligands. The Zerbe et al. article shows that such hot spots tend to interact with hot spot residues on the partner protein in a protein-protein complex (emphasis on "partner"). Thus, the hot spots identified by FTMap are not the hot spots defined by the authors. In fact, because the Zerbe paper considers the partner protein in a complex, the results cannot be compared to the results of Chen et al. This difference is missed by the authors, and hence the comparison of the FTMap is invalid. I did not investigate the comparison to SPOTONE, and hence have no opinion.

      Brenke et al. (Bioinformatics 2009 25: 621-627), who developed FTMAP, defined hot spots as regions of the binding surface that “contribute a disproportionate amount to the binding free energy”. Kozakov et al. (Proc. Natl. Acad. Sci. 2011:108, 13528-1353) used unbound protein structures as input to FTMap to predict binding hot spots for protein-protein interactions (PPIs), which are defined as regions (so-called consensus sites) on a protein surface that bind multiple probe clusters − the main hot spot is the largest consensus site binding the largest number of probe clusters. 

      Zerbe et al. (J. Chem. Inf. Model. 2012:52, 2236) noted that a consensus “site is expected to be important in any interaction that involves that region of the target independent of any partner protein.” They showed that hot spot residues found by Ala scanning not only overlapped with the probe ligands but also formed consensus sites, as shown in Figure 4. They stated that “A residue can also be identified as a hot spot by alanine scanning if it contributes to creating such a favorable binding environment by being among the residues forming a consensus site on the protein to which it belongs.”

      To clarify the comparison with FTmap in the revised version, we have added the following sentence in the Abstract on p. 3:

      “We explored the possibility of detecting PPI-hot spots using (i) FTMap in the PPI mode, which identifies hot spots on protein-protein interfaces from the free protein structure, and (ii) the interface residues predicted by AlphaFold-Multimer.”

      We have added the following sentences in the Introduction section on p. 4:

      “We explored the possibility of detecting PPI-hot spots using the FTMap server in the PPI mode, which identifies hot spots on protein-protein interfaces from free protein structures.45 These hot spots are identified by consensus sites − regions that bind multiple probe clusters.42,45,59 Such regions are deemed to be important for any interaction involving that region of the target, independent of partner protein.42 PPI-hot spots were identified as residues in van der Waals (vdW) contact with probe ligands within the largest consensus site containing the most probe clusters.”

      and in the Results section on p. 5:

      “Given the free protein structure, PPI-HotspotID and SPOTONE53 predict PPI-hot spots based on a probability threshold (> 0.5). FTMap, in the PPI mode, detects PPI-hot spots as consensus sites/regions on the protein surface that bind multiple probe clusters.59 Residues in vdW contact with probe molecules within the largest consensus site were compared with PPI-HotspotID/SPOTONE predictions.”

      (2) Chen et al. use a number of usual features in a variety of simple machine-learning methods to identify hot spot residues. This approach has been used in the literature for more than a decade. Although the authors say that they were able to find only FTMap and SPOTONE as servers, there are dozens of papers that describe such a methodology. Some examples are given here: (Higa and Tozzi, 2009; Keskin, et al., 2005; Lise, et al., 2011; Tuncbag, et al., 2009; Xia, et al., 2010). There are certainly more papers. Thus, while I consider the web server as a potentially useful contribution, the paper does not provide a fundamentally novel approach.

      Our paper introduces several novel elements in our approach: 

      (1) Most PPI-hot spot prediction methods employ PPI-hot spots whose mutations weaken binding by > 2 kcal/mol, i.e., ΔΔGbind > 2 kcal/mol (J. Chem. Inf. Model. 2022, 62, 1052). In contrast, our method incorporates not only PPI-hot spots with such binding free energy changes, but also those whose mutations have been curated in UniProtKB to significantly impair/disrupt PPIs. Because our method employs the largest collection of experimentally determined PPI-hot spots, it could uncover elusive PPI-hot spots outside binding interfaces, as well as potential PPI-hot spots for other protein partners (see point 3 below). 

      (2) Whereas most machine-learning methods for PPI-hot spot prediction focus on features derived from (i) primary sequences or (ii) protein-protein complexes, we introduce novel features such as per-residue free energy contributions derived from unbound protein structures. We further revealed the importance of one of our novel features, namely, the gas-phase energy of the target protein relative to its unfolded state and provided the physical basis for its importance. For example, PPI-hot spots can enhance favorable enthalpic contributions to the binding free energy through hydrogen bonds or van der Waals contacts across the protein’s interface. This makes them energetically unstable in the absence of the protein’s binding partner and solvent; hence providing a rationale for the importance of the gas-phase energy of the target protein relative to its unfolded state.

      (3) As a result of these novel elements, our approach, PPI-HotspotID, could identify many true positives that were not detected by FTMap or SPOTONE (see Results and Figure 1). Previous methods generally predict residues that make multiple contacts across the protein-protein interface as PPI-hot spots. In contrast, PPI-HotspotID can detect not only PPI-hot spots that make multiple contacts across the protein-protein interface, but also those lacking direct contact with the partner protein (see Discussion).

      (4) Unlike most machine-learning methods, which require feature customization, data preprocessing, and model optimization, our use of AutoGluon’s AutoTabular module automates data preprocessing, model selection, hyperparameter optimization, and model evaluation. This automation reduces the need for manual intervention.

      We have revised and added the following sentences on p. 9 in the Discussion section to highlight the novelty of our approach: 

      “Here, we have introduced two novel elements that have helped to identify PPI-hot spots using the unbound structure. First, we have constructed a dataset comprising 414 experimentally known PPI-hot spots and 504 nonhot spots, and carefully checked that PPI-hot spots have no mutations resulting in ΔΔGbind < 0.5 kcal/mol, whereas nonhot spots have no mutations resulting in ΔΔGbind ≥ 0.5 kcal/mol or impact binding in immunoprecipitation or GST pull-down assays (see Methods). In contrast, SPOTONE53 employed nonhot spots defined as residues that upon alanine mutation resulted in ΔΔGbind < 2.0 kcal/mol. Notably, previous PPI-hot spot prediction methods did not employ PPI-hot spots whose mutations have been curated to significantly impair/disrupt PPIs in UniProtKB (see Introduction). Second, we have introduced novel features derived from unbound protein structures such as the gas-phase energy of the target protein relative to its unfolded state.”
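
      The labeling rules described above can be sketched as a small filter. This is a minimal illustration under stated assumptions, not the authors' curation pipeline: the field names (`ddg_bind`, `uniprot_disrupts`, `assay_impact`) are hypothetical, and the > 2 kcal/mol cutoff for candidate hot spots is taken from point (1) above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HOT_CUTOFF = 2.0   # kcal/mol: candidate PPI-hot spot if any mutation has ΔΔGbind above this
CONSISTENCY = 0.5  # kcal/mol: consistency cutoff quoted in the Discussion


@dataclass
class ResidueEvidence:
    # Hypothetical record of the experimental evidence for one residue.
    ddg_bind: List[float] = field(default_factory=list)  # ΔΔGbind per mutation (kcal/mol)
    uniprot_disrupts: bool = False  # curated in UniProtKB as impairing/disrupting the PPI
    assay_impact: bool = False      # affects binding in immunoprecipitation/GST pull-down


def label_residue(ev: ResidueEvidence) -> Optional[str]:
    """Return 'hot', 'nonhot', or None if the evidence is insufficient or inconsistent."""
    is_hot_candidate = ev.uniprot_disrupts or any(d > HOT_CUTOFF for d in ev.ddg_bind)
    if is_hot_candidate:
        # A PPI-hot spot must have no mutation with ΔΔGbind < 0.5 kcal/mol.
        return None if any(d < CONSISTENCY for d in ev.ddg_bind) else "hot"
    # A nonhot spot needs measured data, no mutation with ΔΔGbind >= 0.5 kcal/mol,
    # and no effect on binding in immunoprecipitation or GST pull-down assays.
    if ev.ddg_bind and all(d < CONSISTENCY for d in ev.ddg_bind) and not ev.assay_impact:
        return "nonhot"
    return None  # e.g. a single mutation with 0.5 <= ΔΔGbind <= 2.0 and no curation: excluded
```

      Residues that satisfy neither rule are simply left out of the dataset in this sketch, which is one way to keep the hot/nonhot classes well separated.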

      Strengths:

      A new web server was developed for detecting protein-protein interaction hot spots.

      Weaknesses:

      The comparison to FTMap results is wrong. The method is not novel.

      See reply to points 1 and 2 above.

      Reviewer #2 (Public Review):

      Summary:

      The paper presents PPI-hotspot, a method to predict PPI-hotspots. Overall, it could be useful but serious concerns about the validation and benchmarking of the methodology make it difficult to predict its reliability.

      Strengths:

      Develops an extended benchmark of hot-spots.

      Weaknesses:

      (1) Novelty seems to be just in the extended training set. Features and approaches have been used before.

      The novelty of our approach extends beyond the expanded training set, as summarized in our reply to Reviewer #1, point 2 above. To our knowledge, previous studies did not leverage the gas-phase energy of the target protein relative to its unfolded state for detecting PPI-hot spots from unbound structures, nor did they automate the training and validation process. In contrast, we used AutoGluon’s AutoTabular module to automate the training of (i) individual “base” models, including LightGBM, CatBoost, XGBoost, random forests, extremely randomized trees, neural networks, and K-nearest neighbours, then (ii) multiple “stacker” models. The predictions of multiple “stacker” models were fed as inputs to additional higher-layer stacker models in an iterative process called multi-layer stacking. The output layer used ensemble selection to aggregate the predictions of the stacker models. To improve stacking performance, AutoGluon used all the data for both training and validation through repeated k-fold bagging of all models at all layers of the stack, where k is determined by best precision. This comprehensive approach sets our methodology apart from previous studies, including SPOTONE (see Methods). 

      (2) As far as I can tell the training and testing sets are the same. If I am correct, it is a fatal flaw.

      The two most common techniques for testing and validating machine-learning methods are to split the dataset into either (1) a training set and a test set with a fixed ratio (e.g., 70% for training and 30% for testing) or (2) multiple subsets/folds, i.e., cross-validation. We did not employ a training set to train the model and a separate test set to evaluate its performance. Instead, we employed cross-validation, in which the model was trained and evaluated multiple times. Each fold was used once as a test set while the remaining folds served as the training set; this process was repeated for each fold. For each test set, we assessed the model's performance using the F1 score. We listed the mean validation F1 score in Table 1 of the original manuscript. Cross-validation helps reduce the variability in performance estimates compared to a single training/test split. It also uses the entire dataset for training and testing, making efficient use of the limited data. We have clarified this on p. 14 in the revised version:

      “AutoGluon was chosen for model training and validation due to its robustness and user-friendly interface, allowing for the simultaneous and automated exploration of various machine-learning approaches and their combinations. Instead of using a single training set to train the model and a separate test set to evaluate its performance, we employed cross-validation, as it utilizes the entire dataset for both training and testing, making efficient use of the limited data on PPI-hot spots and PPI-nonhot spots. AutoGluonTabular automatically chose a random partitioning of our dataset into multiple subsets/folds for training and validation. Notably, the training and validation data share insignificant homology, as the average pairwise sequence identity in our dataset is 26%. Each fold was used once as a test set, while the remaining folds served as the training set. For each test set, the model's performance was measured using the F1 score.”
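
      The fold-wise procedure quoted above can be illustrated with a minimal k-fold cross-validation loop that reports the mean F1 score. This is a sketch only: AutoGluon's internal fold assignment, bagging, and stacking are not reproduced, and a nearest-centroid classifier stands in for the actual models as a hypothetical placeholder.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = PPI-hot spot, 0 = nonhot spot)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Placeholder classifier (not AutoGluon): assign each test row to the closer class centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)


def kfold_f1(X, y, k=5, seed=0):
    """Each fold serves once as the test set; the remaining folds form the training set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # random partition into k folds
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_pred = nearest_centroid_fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(f1_score(y[test_idx], y_pred))
    return float(np.mean(scores)), scores


# Usage on synthetic, well-separated data (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(4, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
mean_f1, fold_scores = kfold_f1(X, y, k=5)
print(f"mean validation F1 over {len(fold_scores)} folds: {mean_f1:.2f}")
```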

      (3) Comparisons should state that: SPOTONE is a sequence (only) based ML method that uses similar features but is trained on a smaller dataset. FTmap I think predicts binding sites, I don't understand how it can be compared with hot spots. Suggesting superiority by comparing with these methods is an overreach.

      In the Introduction on page 3, we had already stated that:

      “SPOTONE53 predicts PPI-hot spots from the protein sequence using residue-specific features such as atom type, amino acid (aa) properties, secondary structure propensity, and mass-associated values to train an ensemble of extremely randomized trees. The PPI-hot spot prediction methods have mostly been trained, validated, and tested on data from the Alanine Scanning Energetics database (ASEdb)55 and/or the Structural Kinetic and Energetic database of Mutant Protein Interactions (SKEMPI) 2.0 database.56”

      On p. 4, we have clarified how we used FTMap to detect hot spots (see our reply to Reviewer #1, point 1):

      “We explored the possibility of detecting PPI-hot spots using the FTMap server in the PPI mode, which identifies hot spots on protein-protein interfaces from free protein structures.45 These hot spots are identified by consensus sites − regions that bind multiple probe clusters.42,45,59 Such regions are deemed to be important for any interaction involving that region of the target, independent of partner protein.42 PPI-hot spots were identified as residues in van der Waals (vdW) contact with probe ligands within the largest consensus site containing the most probe clusters.”

      (4) Training in the same dataset as SPOTONE, and then comparing results in targets without structure could be valuable.

      We think that the dataset used by SPOTONE is not as “clean” as ours since SPOTONE employed nonhot spots defined as aa residues that upon alanine mutation resulted in ΔΔGbind < 2.0 kcal/mol. In contrast, we define nonhot spots as residues whose mutations resulted in ΔΔGbind changes < 0.5 kcal/mol. Moreover, we carefully checked that the nonhot spots have no mutations resulting in ΔΔGbind changes ≥ 0.5 kcal/mol or impact binding in immunoprecipitation or GST pull-down assays (see Methods). We cannot compare results in targets without structure because we require the free protein structure to compute the per-residue free energy contributions. 

      (5) The paper presents as validation of the prediction and experimental validation of hotspots in human eEF2. Several predictions were made but only one was confirmed, what was the overall success rate of this exercise?

      We did not test all predicted PPI-hot spots but only the PPI-hot spot with the highest probability of 0.67 (F794) and 7 other predicted PPI-hot spots that were > 12 Å from F794 as well as 4 predicted PPI-nonhot spots. Among the 13 predictions tested, F794 and the 4 predicted nonhot spots were confirmed to be correct. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Remove the comparison to FTMap, and find a more appropriate reference method, even if it requires installing programs rather than using the available web servers.

      We have clarified the comparison to FTMap in the revised manuscript; see our reply above.

    1. Author response:

      eLife assessment

      This useful study examines the neural activity in the motor cortex as a monkey reaches to intercept moving targets, focusing on how tuned single neurons contribute to an interesting overall population geometry. The presented results and analyses are solid, though the investigation of this novel task could be strengthened by clarifying the assumptions behind the single neuron analyses, and further analyses of the neural population activity and its relation to different features of behaviour.

      Thanks for recognizing the content of our research, and please stay tuned for our follow-up studies on neural dynamics during interception.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity. The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We’re sorry for not providing sufficient grounds for introducing the subtypes. We defined PD shift, gain, and addition as the pertinent subtypes based on the classical cosine tuning model (Georgopoulos et al., 1982) and on gain modulation studies (e.g., Pesaran et al., 2010; Bremner and Andersen, 2012). Here, we applied this subtype analysis as a criterion to identify modulation in the neuronal population rather than to sort neurons into distinct cell types. We will update the Methods in the revised version of the manuscript.
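
      For readers less familiar with the cosine tuning model, the three subtypes can be sketched as perturbations of its parameters: a PD shift rotates the preferred direction, a gain change scales the tuning depth, and an addition offsets the baseline. The linear dependence on target speed and the parameter names below are illustrative assumptions, not the fitted model used in the study.

```python
import numpy as np


def cosine_tuning(theta, b0, gain, pd):
    """Classical cosine tuning: firing rate as a function of reach direction theta (radians)."""
    return b0 + gain * np.cos(theta - pd)


def modulated_tuning(theta, speed, b0, gain, pd, shift=0.0, gain_mod=0.0, add=0.0):
    """Hypothetical speed-dependent modulation of a cosine-tuned neuron.

    shift    -> PD shift: preferred direction rotates by shift * speed (radians)
    gain_mod -> gain: tuning depth scales by (1 + gain_mod * speed)
    add      -> addition: baseline is offset by add * speed
    """
    return cosine_tuning(theta,
                         b0 + add * speed,
                         gain * (1.0 + gain_mod * speed),
                         pd + shift * speed)


# Compare a static-target curve (speed 0) with one modulated curve per subtype.
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
static = modulated_tuning(theta, 0.0, b0=10, gain=5, pd=0.0)
gain_type = modulated_tuning(theta, 1.0, b0=10, gain=5, pd=0.0, gain_mod=1.0)
add_type = modulated_tuning(theta, 1.0, b0=10, gain=5, pd=0.0, add=3.0)
shift_type = modulated_tuning(theta, 1.0, b0=10, gain=5, pd=0.0, shift=np.pi / 2)
```

      A single neuron with mixed modulation would simply have more than one of `shift`, `gain_mod`, and `add` non-zero, which is consistent with the spectrum, rather than discrete clusters, described below.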

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      Mixed selectivity, or more precisely mixed modulation, is indeed a significant feature of the neuronal population in the present study. The subtype analysis was intended to serve as a criterion for identifying potential modulation mechanisms. However, the results appear to form a spectrum rather than clusters. The analysis still offers some insight into the distribution of modulation, and we will refine the description in the next version. In the current version, we examined single-unit tuning and population neural states with sliding windows, focusing on the period around movement onset (MO), when a ring-like structure emerges. We will clarify the choice of windows and assess how the results depend on them in the next version. Considering the role of rotating tuning curves in neural dynamics during interception is a great suggestion.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to capture better the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We will introduce the relevant research in the next version of manuscript.

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      Great suggestion! However, it’s hard to implement as the implanted arrays have been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      Nice suggestion! The goodness of fit of the simple model (motor direction only) is much lower than that of the complex model (including target speed). We will update the results in the next version.
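
      One way to make such a comparison concrete is to fit nested linear models, cosine tuning to reach direction with and without speed-dependent terms, and compare adjusted R², which penalizes the extra parameters of the larger model. The sketch below uses synthetic data and a design matrix chosen for illustration; it is not the authors' fitting procedure.

```python
import numpy as np


def design_matrix(theta, speed=None):
    """Cosine-tuning regressors for reach direction theta (radians).

    If speed is given, speed-dependent offset and tuning terms are added (hypothetical form).
    """
    cols = [np.ones_like(theta), np.cos(theta), np.sin(theta)]
    if speed is not None:
        cols += [speed, speed * np.cos(theta), speed * np.sin(theta)]
    return np.column_stack(cols)


def adjusted_r2(X, y):
    """Ordinary least-squares fit; returns R² adjusted for the number of parameters."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    n, k = X.shape  # k counts the intercept column
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Synthetic neuron whose tuning depth depends on signed target speed (illustrative only).
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 400)
speed = rng.choice([-1.0, 0.0, 1.0], size=400)
rate = 10 + 5 * (1 + 0.5 * speed) * np.cos(theta) + rng.normal(0, 1, 400)

r2_direction = adjusted_r2(design_matrix(theta), rate)
r2_full = adjusted_r2(design_matrix(theta, speed), rate)
print(f"direction only: {r2_direction:.3f}, direction + speed: {r2_full:.3f}")
```

      Because the larger model always fits at least as well in raw R², the adjusted (or cross-validated) value is the fairer basis for saying that speed adds explanatory power.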

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      A good point. We will try unsupervised methods.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. We will test the decoder in other epochs.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of an SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In this study, we discretized reach direction, as in previous studies (Li et al., 2018, 2022), and considered discrete decoding sufficient to show the interaction of sensory and motor variables. In future studies, we will try continuous decoding of hand kinematics.
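
      As a hedged illustration of the difference between the two approaches, the sketch below bins reach angles into 8 sectors (as for a discrete SVM decoder) and, separately, fits a continuous linear decoder that maps firing rates to the cosine and sine of the reach angle and recovers the angle with `arctan2`. The sector boundaries, the linear decoder, and the synthetic cosine-tuned population are assumptions for illustration only.

```python
import numpy as np


def discretize_direction(theta, n_bins=8):
    """Assign angles (radians) to n_bins equal sectors centred on 0, 2*pi/n_bins, ..."""
    width = 2 * np.pi / n_bins
    return np.floor(((theta + width / 2) % (2 * np.pi)) / width).astype(int) % n_bins


def _with_bias(rates):
    """Prepend a column of ones so the decoder has an intercept."""
    return np.column_stack([np.ones(len(rates)), rates])


def fit_direction_decoder(rates, theta):
    """Least-squares map from rates (trials x neurons) to (cos, sin) of the reach angle."""
    targets = np.column_stack([np.cos(theta), np.sin(theta)])
    weights, *_ = np.linalg.lstsq(_with_bias(rates), targets, rcond=None)
    return weights


def decode_direction(weights, rates):
    """Continuous angle estimate in [0, 2*pi)."""
    cos_hat, sin_hat = (_with_bias(rates) @ weights).T
    return np.arctan2(sin_hat, cos_hat) % (2 * np.pi)


# Synthetic cosine-tuned population (illustrative assumption, not recorded data).
rng = np.random.default_rng(0)
pds = rng.uniform(0, 2 * np.pi, 30)
theta = rng.uniform(0, 2 * np.pi, 300)
rates = 10 + 5 * np.cos(theta[:, None] - pds[None, :]) + rng.normal(0, 0.5, (300, 30))

weights = fit_direction_decoder(rates[:200], theta[:200])   # train on first 200 trials
decoded = decode_direction(weights, rates[200:])            # test on held-out trials
angular_error = np.abs(np.angle(np.exp(1j * (decoded - theta[200:]))))
print("median angular error (rad):", np.median(angular_error))
print("8-bin labels of first trials:", discretize_direction(theta[:5]))
```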

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We will perform decoding analysis on RNN units to verify if there is interaction of sensory and motor variables as in real data, as well as the canonical correlation or Procrustes analysis.
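
      For the Procrustes comparison mentioned above, a minimal sketch is shown below, assuming that neural and RNN activity have already been reduced to matrices of the same shape (e.g., conditions × time points stacked as rows, with the same number of principal components as columns). It centers and normalizes both matrices, finds the best orthogonal alignment via SVD, and returns the residual disparity (0 means identical up to rotation, reflection, translation, and uniform scale).

```python
import numpy as np


def procrustes_disparity(A, B):
    """Orthogonal Procrustes comparison of two (samples x dims) matrices of equal shape.

    Both matrices are centred and scaled to unit Frobenius norm; the rotation/reflection R
    minimising ||A @ R - B|| is obtained from the SVD of A.T @ B.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    disparity = float(np.sum((A @ R - B) ** 2))
    return disparity, R


# Hypothetical example: 60 condition-time samples in a 3-PC space.
rng = np.random.default_rng(2)
neural = rng.normal(size=(60, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
rnn_same_geometry = 2.0 * neural @ Q + 1.0     # rotated, scaled and shifted copy
d_same, _ = procrustes_disparity(neural, rnn_same_geometry)
d_random, _ = procrustes_disparity(neural, rng.normal(size=(60, 3)))
print(f"same geometry: {d_same:.2e}, unrelated: {d_random:.2f}")
```

      A low disparity alone would not separate task-imposed geometry from genuine dynamical similarity, so comparing against a shuffled or alternative-model baseline, as the reviewer suggests, would still be needed.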

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for recognizing our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      The present study shows sensory modulation of motor tuning in single units and of the neural state during the motor execution period. It’s a pity that the findings were constrained to certain time windows. We are still working on this topic and hope to address related questions in our follow-up studies.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation through a dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      Great idea! We are working on this and are close to completing the puzzle.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")--this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Nice suggestion. Target-speed modulation mainly influences PC3, which is consistent with ‘null space’ hypothesis. We will try other methods of dimensionality reduction (e.g. dPCA, Manopt) to determine the potent and null space.
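
      A minimal sketch of the regression-based potent/null decomposition suggested by the reviewer, assuming firing rates (time × neurons) and hand velocity (time × 2) sampled on the same time base: the least-squares readout defines the potent space, its orthogonal complement defines the null space, and the highest-variance direction within the null space is the candidate axis for target-speed information. This is an illustration, not the dPCA or Manopt-based analysis the authors plan.

```python
import numpy as np


def potent_null_bases(rates, hand_vel):
    """Split neural state space into output-potent and output-null subspaces.

    rates: (time x neurons) firing rates; hand_vel: (time x 2) hand velocity.
    The potent space is spanned by the columns of the least-squares readout W
    (hand_vel ~ rates @ W); the null space is its orthogonal complement.
    """
    X = rates - rates.mean(axis=0)
    Y = hand_vel - hand_vel.mean(axis=0)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (neurons x 2) readout
    potent, _ = np.linalg.qr(W)                 # orthonormal potent basis
    n = X.shape[1]
    projector = np.eye(n) - potent @ potent.T   # projects onto the null space
    U, _, _ = np.linalg.svd(projector)
    null = U[:, : n - potent.shape[1]]          # orthonormal null basis (singular value 1)
    return potent, null


def top_null_dimension(rates, null):
    """Highest-variance neural direction within the null space (unit vector in neuron space)."""
    Z = (rates - rates.mean(axis=0)) @ null
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return null @ Vt[0]


# Synthetic example: 12 neurons whose activity linearly drives 2-D hand velocity.
rng = np.random.default_rng(3)
rates = rng.normal(size=(500, 12))
hand_vel = rates @ rng.normal(size=(12, 2)) + 0.1 * rng.normal(size=(500, 2))
potent, null = potent_null_bases(rates, hand_vel)
axis = top_null_dimension(rates, null)
```

      Projecting condition-averaged trajectories onto `potent` and onto `axis` would then show whether the rings align in the potent space and tilt along the null axis.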

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we didn’t clarify the definition of the “none” type, which can be misleading. The 43% of unclassified nodes include inactive ones; when only active (task-related) nodes are included, the proportion of unclassified nodes is much lower. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We tried moving neural states from one ring to another without changing the directional cluster, but this perturbation didn’t have a significant influence on network performance as expected. We will check this result again and try perturbations in the delay period.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non-trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thank you for your positive assessment.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We will highlight the resampling results as an important control in the next version of the manuscript.

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to compare alternative models to the true neural activity, in addition to the model currently included in the paper. In addition, for the model in the paper, it would be interesting to study further how the details of the network setup (e.g., initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. We regret that the current version does not dissect why this representation forms or which factors influence it. We previously tried several combinations of inputs: a network that received only the motor intention and GO inputs showed rings but no tilt related to target speed; a network that received only the target location and GO inputs showed ring-like structures but no clear directional clusters. We will re-examine these results and try alternative models in the next version. In future studies, we will examine how details of the network setup influence the results.

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for the great suggestions. With the perturbations, we intended to explore the contribution of interactions between certain subpopulations. We tried ablation experiments, but the results were not significant. This is probably because most units showed mixed selectivity: there were not enough units with only a single type of modulation for bootstrapping, or units could be sampled repeatedly when randomly sampling from a single subpopulation (because units with mixed selectivity belong to several subpopulations). We will consider these suggestions carefully in the revised version.

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al., Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al., Nature Communications 2019 found that perceptual information could not be decoded from PMd).

      How and where sensory information modulates M1 are very interesting and open questions. We will discuss this topic further in the next version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Semenova et al. have studied a large cross-sectional cohort of people living with HIV on suppressive ART, N=115, and performed high dimensional flow cytometry to then search for associations between immunological and clinical parameters and intact/total HIV DNA levels.

      A number of interesting data science/ML approaches were explored on the data and the project seems a serious undertaking. However, like many other studies that have looked for these kinds of associations, there was not a very strong signal. Of course, the goal of unsupervised learning is to find new hypotheses that aren't obvious to human eyes, but I felt in that context that (1) the results were slightly oversold, (2) there were some questions about methodology, mostly in terms of reservoir levels, and (3) the results were not sufficiently translated back into meaning in terms of clinical outcomes.

      We appreciate the reviewer’s perspective. In our revised version of the manuscript, we have attempted to address these concerns by more adequately explaining the limitations of the study and by more thoroughly discussing the context of the findings. We are not able to associate the findings with specific clinical outcomes for individual study participants, but we speculate about the overall biological meaning of these associations across the cohort. We do not disagree with the reviewer, but we find the associations statistically significant; they potentially reflect real biological associations and form the basis for future hypothesis-testing research.

      Strengths:

      The study is evidently a large and impressive undertaking and combines many cutting-edge statistical techniques with a comprehensive experimental cohort of people living with HIV, notably inclusive of populations underrepresented in HIV science. A number of intriguing hypotheses are put forward that could be explored further. Sharing the data could create a useful repository for more specific analyses.

      We thank the reviewer for this assessment.

      Weaknesses:

      Despite the detailed experiments and methods, there was not a very strong signal for the variable(s) predicting HIV reservoir size. The Spearman coefficients are ~0.3 (somewhat weak, and acknowledged as such), and predictive models reach 70-80% prediction accuracy, though sometimes categorical variables are challenging to interpret.

      We agree with the reviewer that individual parameters are only weakly correlated with the HIV reservoir, likely reflecting the complex and multi-factorial nature of reservoir/immune cell interactions. Nevertheless, these associations are statistically significant and form the basis for functional testing of their role in viral persistence.

      There are some questions about methodology, as well as some conclusions that are not completely supported by results, or at minimum not sufficiently contextualized in terms of clinical significance. On associations: the false discovery rate correction was set at 5%, but the data appear underdetermined, with fewer observations than variables (144 variables > 115 participants), and it isn't always clear if/when variables are related (e.g., inverses of one another, such as %CD4 and %CD8).

      When deriving a list of cell populations whose frequency would be correlated with the reservoir, we focused on well-defined cell types for which functional validation exists in the literature to consider them as distinct cell types.  For many of the populations, gating based on combinations of multiple markers leads to recovery of very few cells, and so we excluded some potential combinations from the analysis.  We are also making our raw data available for others to examine and find associations not considered by our manuscript.

      The modeling of reservoir size was unusual; typically, intact and defective HIV DNA are analyzed on a log10 scale (both for decays and predicting rebound). Also, sometimes in this analysis levels are normalized (presumably to max/min?; e.g., Figure S5), and given the large within-host variation of level we see in other works, it is not trivial to predict any downstream impact of normalization across the population vs. within a person.

      We have repeated the analysis using log10-transformed data, and the new figures are shown in Figures 1 and S2-S5.

      Also, the qualitative characterization of low/high reservoir is not standard and will naturally split by early/later ART if done as above/below median. Given the continuous nature of these data, it seems throughout that predicting above/below median is a little hard to translate into clinical meaning.

      Our ML models included time before ART as a variable in the analysis, and this was not found to be a significant driver of the reservoir size associations, except for the percentage of intact proviruses (see Figure 2C). Furthermore, we analyzed whether any of the reservoir correlated immune variables were associated with time on ART and found that, although some immune variables are associated with time on therapy, this was not the case for most of them (Table S4). We agree that it is challenging to translate above or below median into clinical meaning for this cohort, but we emphasize that this study is primarily a hypothesis generating approach requiring additional validation for the associations observed.  We attempted to predict reservoir size as a continuous variable using the data and this approach was not successful (Figure S13). We believe that a significantly larger cohort will likely be required to generate a ML model that can accurately predict the reservoir as a continuous variable.  We have added additional discussion of this to the manuscript.

      Lastly, the work is comprehensive and appears solid, but the code was not shared to see how calculations were performed.

      We now provide a link to the code used to perform the analyses in the manuscript, https://github.com/lesiasemenova/ML_HIV_reservoir.

      Reviewer #2 (Public Review):

      Summary:

      Semenova et al. performed a cross-sectional analysis of host immunophenotypes (using flow cytometry) and the peripheral CD4+ T cell HIV reservoir size (using the Intact Proviral DNA Assay, IPDA) from 115 people with HIV (PWH) on ART. The study mostly highlights the machine learning methods applied to these host and viral reservoir datasets but fails to interpret these complex analyses into (clinically, biologically) interpretable findings. For these reasons, the direct translational take-home message from this work is lost amidst a large list of findings (shown as clusters of associated markers), and sentences such as "this study highlights the utility of machine learning approaches to identify otherwise imperceptible global patterns" lead to overinterpretation of their data.

      We have addressed the reviewer’s concern by modifications to the manuscript that enhance the interpretation of the findings in a clinical and biological context.

      Strengths:

      Measurement of host immunophenotyping measures (multiparameter flow cytometry) and peripheral HIV reservoir size (IPDA) from 115 PWH on ART.

      Major Weaknesses:

      (1) Overall, there is little to no interpretability of their machine learning analyses; findings appear as a "laundry list" of parameters with no interpretation of the estimated effect size and directionality of the observed associations. For example, Figure 2 could give an interpretation such as: for each X increase in an immunophenotyping parameter, we saw a Y increase/decrease in an HIV reservoir measure.

      We have added additional text to the manuscript in which we attempt to provide more immunological and clinical interpretation of the associations. We also have emphasized that these associations are still speculative and will require additional validation. Nevertheless, our data should provide a rich source of new hypotheses regarding immune system/reservoir interaction that could be tested in future work.

      (2) The correlations all appear to be relatively weak, with most Spearman R in the 0.30 range or so.

      We agree with the reviewer that the associations are mostly weak, consistent with previous studies in this area. This is likely an inherent feature of the underlying biology – the reservoir is likely associated with the immune system in complex ways and involves stochastic processes that will limit the predictability of reservoir size using any single immune parameter. We have added additional text to the manuscript to make this point clearer.

      (3) The Discussion needs further work to help guide the reader. The sentence: "The correlative results from this present study corroborate many of these studies, and provide additional insights" is broad. The authors should spend some time here to clearly describe the prior literature (e.g., describe the strength and direction of the association observed in prior work linking PD-1 and HIV reservoir size, as well as specify which type of HIV reservoir measures were analyzed in these earlier studies, etc.) and how the current findings add to or are in contrast to those prior findings.

      We have added additional text to the manuscript to help guide the readers through the possible biological significance of the findings and the context with respect to prior literature.

      (4) The most interesting finding is buried on page 12 in the Discussion: "Uniquely, however, CD127 expression on CD4 T cells was significantly inversely associated with intact reservoir frequency." The authors should highlight this in the abstract and title and move it up in the Discussion. The paper describes a very high-dimensional analysis and the key takeaways are not clear; the more the authors can point the reader to the take-home points, the better their findings can translate to future follow-up mechanistic and/or validation studies.

      We appreciate the reviewer’s comment.  We have increased the emphasis on this finding in the revised version of the manuscript.

      (5) The authors should avoid overinterpretation of these results. For example in the Discussion on page 13 "The existence of two distinct clusters of PWH with different immune features and reservoir characteristics could have important implications for HIV cure strategies - these two groups may respond differently to a given approach, and cluster membership may need to be considered to optimize a given strategy." It is highly unlikely that future studies will be performing the breadth of parameters resulting here and then use these directly for optimizing therapy.

      Our analyses indicate that membership of study participants in cluster 1 or cluster 2 can be fairly accurately determined by a small number of individual parameters (KLRG1, etc.; Figure 4F), and measuring the cells of PWH with the degree of breadth used in this paper would not be necessary to classify PWH into these clusters. As such, we feel that it is not unrealistic to speculate that this finding could turn out to be clinically useful, if it becomes clear that the clusters are biologically meaningful.

      (6) There are only TWO limitations listed here: cross-sectional study design and the use of peripheral blood samples. (The subsequent paragraph notes an additional weakness which is misclassification of intact sequences by IPDA). This is a very limited discussion and highlights the need to more critically evaluate their study for potential weaknesses.

      We have expanded on the list of limitations discussed in the manuscript. In particular, we now address the size of the cohort, the composition with respect to different genders and demographics, lack of information for the timing of ART and the lack of information regarding intracellular transcriptional pathways.

      (7) A major clinical predictor of HIV reservoir size and decay is the timing of ART initiation. The authors should include these (as well as other clinical covariate data - see #12 below) in their analyses and/or describe as limitations of their study.

      All of the participants that make up our cohort were treated during chronic infection, and the precise timing of ART initiation is unclear in most of these cases.  We have added additional information to explain this in the manuscript and include this in the list of limitations.

      Reviewer #3 (Public Review):

      Summary:

      This valuable study by Semenova and colleagues describes a large cross-sectional cohort of 115 individuals on ART. Participants contributed a single blood sample which underwent IPDA and 25-color flow cytometry with various markers (pre- and post-stimulation). The authors then used clustering, decision tree analyses, and machine learning to look for correlations between these immunophenotypic markers and several measures of HIV reservoir volume. They identified two distinct clusters that can be somewhat differentiated based on total HIV DNA level, intact HIV DNA level, and multiple T cell markers of activation and exhaustion.

      The conclusions of the paper are supported by the data, but the relationships between independent and dependent variables in the models are correlative, with no mechanistic work to determine causality. It is unclear in most cases whether confounding variables could explain these correlations. If there is causality, then the data are not sufficient to infer directionality (i.e., does the immune environment impact the HIV reservoir or vice versa or both?). In addition, even with sophisticated and appropriate machine learning approaches, the models are not terribly predictive or highly correlated. For these reasons, the study is very much hypothesis-generating and will not impact cure strategies or HIV reservoir measurement strategies in the short term.

      We appreciate the reviewer’s comments regarding the value of our study.  We fully acknowledge that the causal nature and directionality of these associations are not yet clear and agree that the study is primarily hypothesis generating in nature.  Nevertheless, we feel that the hypotheses generated will be valuable to the field.  We have added additional text to the manuscript to emphasize the hypothesis generating nature of this paper.

      Strengths:

      The study cohort is large and diverse in terms of key input variables such as age, gender, and duration of ART. Selection of immune assays is appropriate. The authors used a wide array of bioinformatic approaches to examine correlations in the data. The paper was generally well-written and appropriately referenced.

      Weaknesses:

      (1) The major limitation of this work is that it is highly exploratory and not hypothesis-driven. While some interesting correlations are identified, these are clearly hypothesis-generating based on the observational study design.

      We agree that the major goal of this study was hypothesis generating and that our work is exploratory in nature. Performing experiments with mechanism testing goals in human participants with HIV is challenging.  Additionally, before such mechanistic studies can be undertaken, one must have hypotheses to test. As such we feel our study will be useful for the field in helping to identify hypotheses that could potentially be tested.

      (2) The study's cross-sectional nature limits the ability to make mechanistic inferences about reservoir persistence. For instance, it would be very interesting to know whether the reservoir cluster is a feature of an individual throughout ART, or whether this outcome is dynamic over time.

      We agree with the reviewer’s comment. Longitudinal studies are challenging to carry out with a study cohort of this size, and addressing questions such as the one raised by the reviewer would be of great interest. We believe our study nevertheless has value in identifying hypotheses that could be tested in a longitudinal study.

      (3) A fundamental issue is that I am concerned that binarizing the 3 reservoir metrics in a 50/50 fashion is for statistical convenience. First, by converting a continuous outcome into a simple binary outcome, the authors lose significant amounts of quantitative information. Second, the low and high reservoir outcomes are not actually demonstrated to be clinically meaningful: I presume that both contain many (?all) data points above levels where rebound would be expected soon after interruption of ART. Reservoir levels would also have no apparent outcome on the selection of cure approaches. Overall, dividing at the median seems biologically arbitrary to me.

      The reviewer raises a valid point that the clinical significance of above or below median reservoir metrics is unclear, and that the size of the reservoir has potentially little relation to rebound and cure approaches.  In the manuscript, we attempted to generate models that can predict reservoir size as a continuous variable in Figure S13 and find that this approach performs poorly, while a binarized approach was more successful. As such we have included both approaches in the manuscript.  It is possible that future studies with larger sample sizes and more detailed measurements will perform better for continuous variable prediction.  While this is a fairly large study (n=115) by the standards of HIV reservoir analyses, it is a small study by the standards of the machine learning field, and accurate predictive ML models for reservoir size as a continuous variable will likely require a much larger set of samples/participants.  Nevertheless, we feel our work has value as a template for ML approaches that may be informative for understanding HIV/immune interactions and generates novel hypotheses that could be validated by subsequent studies.

      (4) The two reservoir clusters are of potential interest as high total and intact with low % intact are discriminated somewhat by immune activation and exhaustion. This was the most interesting finding to me, but it is difficult to know whether this clustering is due to age, time on ART, other co-morbidity, ART adherence, or other possible unmeasured confounding variables.

      We agree that this finding is one of the more interesting outcomes of the study. We examined a number of these variables for association with cluster membership, and these data are reported in Figure S8A-D.  Age, years of ART and CD4 Nadir were all clearly different between the clusters.   The striking feature of this clustering, however, is the clear separation between the two groups of participants, as opposed to a continuous gradient of phenotypes.  This could reflect a bifurcation of outcomes for people with HIV, dynamic changes in the reservoir immune interactions over time, or different levels of untreated infection.  It is certainly possible that some other unmeasured confounding variables contribute to this outcome and we have attempted to make this limitation clearer.

      (5) At the individual level, there is substantial overlap between clusters according to total, intact, and % intact between the clusters. Therefore, the claim in the discussion that these 2 cluster phenotypes may require different therapeutic approaches seems rather speculative. That said, the discussion is very thoughtful about how these 2 clusters may develop with consideration of the initial insult of untreated infection and / or differences in immune recovery.

      We agree with the reviewer that this claim is speculative, and we have attempted to moderate the language of the text in the revised version.

      (6) The authors state that the machine learning algorithms allow for reasonable prediction of reservoir volume. It is subjective, but to me, 70% accuracy is very low. This is not a disappointing finding per se. The authors did their best with the available data. It is informative that the machine learning algorithms cannot reliably discriminate reservoir volume despite substantial amounts of input data. This implies that either key explanatory variables were not included in the models (such as viral genotype, host immune phenotype, and comorbidities) or that the outcome for testing the models is not meaningful (which may be possible with an arbitrary 50/50 split in the data relative to median HIV DNA volumes: see above).

      We acknowledge that the predictive power of the models generated from these data is modest, and we have clarified this point in the revised manuscript. As the reviewer indicates, this may result from the influence of unmeasured variables and possible stochastic processes. The data may thus demonstrate a limit to the predictability of reservoir size which may be inherent to the underlying biology. As we mention above, this study size (n=115) is fairly small for the application of ML methods, and an increased sample size will likely improve the accuracy of the models. At this stage, the models we describe are not yet useful as predictive clinical tools, but they are nonetheless useful as tools to describe the structure of the data and identify reservoir-associated immune cell types.

      (7) The decision tree is innovative and a useful addition, but does not provide enough discriminatory information to imply causality, mechanism, or directionality in terms of whether the immune phenotype is impacting the reservoir or vice versa or both. Tree accuracy of 80% is marginal for a decision tool.

      The reviewer is correct about these points.  In the revised manuscript, we have attempted to make it clear that we are not yet advocating using this approach as a decision tool, but simply a way to visualize the data and understand the structure of the dataset.  As we discuss above, the models will likely need to be trained on a larger dataset and achieve higher accuracy before use as a decision tool.

      (8) Figure 2: this is not a weakness of the analysis but I have a question about interpretation. If total HIV DNA is more predictive of immune phenotype than intact HIV DNA, does this potentially implicate a prior high burden of viral replication (high viral load &/or more prolonged time off ART) rather than ongoing reservoir stimulation as a contributor to immune phenotype? A similar thought could be applied to the fact that clustering could only be detected when applied to total HIV DNA-associated features. Many investigators do not consider defective HIV DNA to be "part of the reservoir" so it is interesting to speculate why these defective viruses appear to have more correlation with immunophenotype than intact viruses.

      We agree with the reviewer that this observation could reflect prior viral burden and we have added additional text to make this clearer.  Even so, we cannot rule out a model in which defective viral DNA is engaged in ongoing stimulation of the immune system during ART, leading to the stronger association between total DNA and the immune cell phenotypes. We hypothesize that the defective proviruses could potentially be triggering innate immune pattern recognition receptors via viral RNA or DNA, and a higher burden of the total reservoir leads to a stronger apparent association with the immune phenotype.  We have included text in the discussion about this hypothesis.

      (9) Overall, the authors need to do an even more careful job of emphasizing that these are all just correlations. For instance, HIV DNA cannot be proven to have a causal effect on the immunophenotype of the host with this study design. Similarly, immunophenotype may be affecting HIV DNA, or the correlations between the two variables could be entirely due to a separate confounding variable.

      We have revised the text of the manuscript to emphasize this point, and we acknowledge that any causal relationships are, at this point, simply speculation. 

      (10) In general, in the intro, when the authors refer to the immune system, they do not consistently differentiate whether they are referring to the anti-HIV immune response, the reservoir itself, or both. More specifically, the sentence in the introduction listing various causes of immune activation should have citations. (To my knowledge, there is no study to date that definitively links proviral expression from reservoir cells in vivo to immune activation, as it is next to impossible to remove the confounding possible imprint of previous HIV replication.) Similarly, it is worth mentioning that the depletion of intact proviruses is quite slow, such that proviral expression can only be stimulating the immune system at a low level. Similarly, the statement "Viral protein expression during therapy likely maintains antigen-specific cells of the adaptive immune system" seems hard to dissociate from the persistence of immune cells that were reactive to viremia.

      We updated the text of the manuscript to address these points and have added additional citations as per the reviewer’s suggestion.

      (11) Given the many limitations of the study design and the inability of the models to discriminate reservoir volume and phenotype, the limitations section of the discussion seems rather brief.

      We have now expanded the limitations section of the discussion and added additional considerations. We now include a discussion of the study cohort size, composition and the detail provided by the assays.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A few specific comments:

      "This pattern is likely indicative of a more profound association of total HIV DNA with host immunophenotype relative to intact HIV DNA."

      Most studies I have seen (e.g., single-cell work from the Lichterfeld/Yu group) show that intact proviruses are generally more activated/detectable/susceptible to immune selection, so I have a hard time believing that defective proviruses are actually more affected by immunophenotype.

      We hypothesize that this association actually occurs in the opposite direction: that defective proviruses have a greater impact on the immune phenotype, due to their greater number and potential ability to engage innate or adaptive immune receptors. We have clarified this point in the manuscript.

      "The existence of two distinct clusters of PWH with different immune features and reservoir characteristics could have important implications for HIV cure strategies - these two groups may respond differently to a given approach, and cluster membership may need to be considered to optimize a given strategy."

      I find this a bit of a reach, given that the definition of 2 categories depended on the total size.

      We have modified the language of this section to reduce the level of speculation.

      "This study is cross-sectional in nature and is primarily observational, so caution should be used interpreting findings associated with time on therapy".

      I found this an interesting statement because ultimately time on ART shows up throughout the analysis as a significant predictor, do you mean something about how time on ART could indicate other confounding variables like ART regimen or something?

      We have rephrased this comment to avoid confusion. We were simply trying to make the point that we should avoid speculating about longitudinal dynamics from cross-sectional data.

      "As expected, the plots showed no significant correlation for intact HIV DNA versus years of ART (Figure 1B), while total reservoir size was positively correlated with the time of ART (Figure 1A, Spearman r = 0.31)."

      Is this expected? Studies with longitudinal data almost uniformly show decay of intact HIV DNA, at least for the first 10 or so years of ART, and stability (or slight decay) of defective/total HIV DNA. Also, it is probably better to write "time on ART" to avoid confusion with the duration of infection before ART.

      We have updated the language of this section to address this comment.  We have avoided comparing our data with respect to time on ART to longitudinal studies for reasons given above.

      On dimensionality reduction: as PaCMAP is a relatively new technique (compared with tSNE and UMAP, which are more standard but certainly have their weaknesses), it seems important to contextualize it. I think it would still be useful to show a PCA and assess the % variance explained by each additional dimension to gauge the effective dimensionality. A plot of % variance by number of components would show whether there is a cutoff somewhere, and whether PaCMAP is really picking this up, i.e., whether 2 dimensions/2 clusters is ideal. Figure 4B ultimately shows a lot of low/high across those clusters, and since low/high is defined categorically, it is hard to know which of those dots are very close to the other category.

      We have added this analysis to the manuscript (Figure S9). The PCA plot indicates that members of the two clusters also separate on PCA, although this separation is not as clear as in the PaCMAP plot.
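      As a rough illustration of the check the reviewer proposed, the sketch below computes the fraction of variance explained by each principal component of a participants x immune-feature matrix. It is a minimal sketch assuming z-scored features and a NumPy SVD; the preprocessing actually used for Figure S9 is not described here, and the function name explained_variance_ratio is hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component.

    X: (n_samples, n_features) array, e.g. participants x immune-cell frequencies.
    Assumption: features are z-scored first so that differently scaled
    measurements contribute comparably.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                # leave constant features unscaled
    Z = Xc / sd
    # Squared singular values are proportional to the component variances
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()           # sorted in decreasing order
```

      Plotting np.cumsum of the returned ratios against the number of components gives the % variance curve the reviewer describes; an elbow in that curve is one informal indicator of effective dimensionality.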

      Minor comments on writing etc:

      Intro

      -Needs some references on immune activation sequelae paragraph.

      We have added some additional references to this section.

      -"promote the entry of recently infected cells into the reservoir" -- that is only one possible mechanistic explanation, it's not unreasonable but it seems important to keep options open until we have more precise data that can illuminate the mechanism of the overabundance.

      We have modified the text to discuss additional hypotheses.

      -You might also reference Pankau et al. (PLoS Pathogens) for viral seeding near the time of ART.

      We have added this reference.

      -"Viral protein expression during therapy likely maintains antigen-specific cells of the adaptive immune system" - this was unclear to me, do you mean HIV-specific cells that act against HIV during ART? I think most studies show immunity against HIV (CD8 and CD4) wanes over time during ART.

      The Goonetilleke lab has recently generated data indicating that antiviral T cell responses are remarkably stable over time on ART, but we agree with the reviewer that the idea that ongoing antigen expression in the reservoir maintains these cells is speculative.  We have modified the text to make this point clearer.

      -Overall I think the introduction lacked a little bit of definitional precision: i.e. is the reservoir intact vs replication competent vs all HIV DNA and whether we are talking about PWH on long-term ART and how long we should be imagining? The first years of ART are certainly different than later, in terms of dynamics. The ultimate implications are likely specific for some of these categorizations.

      -"persistent sequelae of the massive disruptions to T cell homeostasis and lymphoid structures that occur during untreated HIV infection" needs a lot more context/referencing. For instance, Peter Hunt showed a decrease in activation after ART a long time ago.

      -Heather Best et al. show that T cell clonality stays perturbed after ART.

      We have updated the text of the introduction and added references to address the reviewer’s comments.

      Results

      -It would be important to mention the race of participants and any information about expected clades of acquired viruses, this gets mentioned eventually with reference to the Table but the breakdown would be helpful right away.

      We have added this information to the results section.

      -"performed Spearman correlations", may be calculated or tested?

      We have corrected the language for this sentence.

      Comments on figures:

      -Figure 1 data on linear scale (re discussion above) -- hard to even tell if there is a decay (to match with all we know from various long-term ART studies).

      -Figure 4 data is shown on ln (log_e) scale, which is hard to interpret for most people.

      -Figures 4 C,D, and E should have box plots to visually assess the significance.

      -Figure 4B: the legend says purple/pink, but I think the colors are different in the plot; this could be due to transparency.

      -Figure 5: it is now not clear whether the scale is log_e.

      -Figure 6: "HIV reservoir characteristics" could be made more explicit. For instance, in the 6B title, do you mean Total HIV DNA per million CD4+ T cells?

      We have made these modifications.

      Reviewer #2 (Recommendations For The Authors):

      Minor Weaknesses:

      (1) The Introduction is too long and much of the text is not directly related to the study's research question and design.

      We have streamlined the introduction in the revised manuscript.

      (2) While no differences were seen by age or race, according to the authors, this is unlikely to be useful since the numbers are so small in some of these subcategories. Results from sensitivity analyses (e.g., excluding these individuals) may be more informative/useful.

      We agree that the low numbers of participants in some subgroups make it challenging to know for sure whether there are any differences based on these variables. We have added text to clarify this. We have also added age, race and gender to the LOCO analysis and to the variable inflation importance analysis (Table S5).

      (3) For Figure 4, based on what was described in the Results section of the manuscript, the authors should clarify that the figures show results for TOTAL HIV DNA only (not intact DNA): "Dimension reduction machine learning approaches identified two robust clusters of PWH when using total HIV DNA reservoir-associated immune cell frequencies (Figure 4A), but not for intact or percentage intact HIV DNA (Figure 4B and 4C)".

      We have added this information.

      (4) The statement on page 5, first paragraph, "Interestingly, when we examined a plot of percent intact proviruses versus time on therapy (Figure 1C), we observed a biphasic decay pattern," is not new (Peluso JCI Insight 2020, Gandhi JID 2023, McMyn JCI 2023). Prior studies have clearly demonstrated this biphasic pattern and should be cited here, and the sentence should be reworded with something like "consistent with prior work", etc.

      We have added citations to these studies and rephrased this comment.

      (5) The Cohort and sample collection sections are somewhat thin. Further details on the cohort details should include at the very minimum some description of the timing of ART initiation (is this mostly a chronic-treated cohort?) and important covariate data such as nadir CD4+ T cell count, pre-ART viral load, duration of ART suppression, etc.

      The cohort was treated during chronic infection, and we have clarified this in the manuscript.  Information regarding CD4 nadir and years on ART are included in Table 1.  Unfortunately, pre-ART viral load was not available for most members of this cohort, so we did not use it for analyses. The partial pre-ART viral load data is included with the dataset we are making publicly available.

      Reviewer #3 (Recommendations For The Authors):

      Minor points:

      (1) What is meant by CD4 nadir? Is this during primary infection or the time before ART initiation?

      We have clarified this description in the manuscript.  This term refers to the lowest CD4 count recorded during untreated infection.

      (2) The authors claim that determinants of reservoir size are starting to emerge but other than the timing of ART, I am not sure what studies they are referring to.

      We have updated the language of this section. We intended to refer to studies looking at correlates of reservoir size, and feel that this is a more appropriate term than "determinants".

      (3) The discussion does not tie in the model-generated hypotheses with the known mechanisms that sustain the reservoir: clonal proliferation balanced by death and subset differentiation. It would be interesting to tie in the proposed reservoir clusters with these known mechanisms.

      We have added additional text to the manuscript to address these mechanisms.

      (4) Figure 1: Total should be listed as total HIV DNA.

      We have updated this in the manuscript.

      (5) Figure 1C: Worth mentioning the paper by Reeves et al which raises the possibility that the flattening of intact HIV DNA at 9 years may be spurious due to small levels of misclassification of defective as intact.

      We have added this reference.

      (6) "Total reservoir frequency" should be "total HIV DNA concentration"

      We respectfully feel that “frequency” is a more accurate term than “concentration”, since we are expressing the reservoir as a fraction of the CD4 T cells, while “concentration” suggests a denominator of volume.

      (7) Figure S2-5: label y-axis total HIV DNA.

      We have updated this figure.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Rtf1 HMD domain facilitates global histone H2B monoubiquitination and regulates morphogenesis and virulence in the meningitis-causing pathogen Cryptococcus neoformans" by Jiang et al., the authors employ a combination of molecular genetics and biochemical approaches, along with phenotypic evaluations and animal models, to identify the conserved subunit of the Paf1 complex (Paf1C), Rtf1, and functionally characterize its critical roles in mediating H2B monoubiquitination (H2Bub1) and the consequent regulation of gene expression, fungal development, and virulence traits in C. deneoformans or C. neoformans. Specifically, the authors found that the histone modification domain (HMD) of Rtf1 is sufficient to promote H2B monoubiquitination (H2Bub1) and the expression of genes related to fungal mating and filamentation, and restores the fungal morphogenesis and pathogenicity defects caused by RTF1 deletion.

      Strengths:

      The manuscript is well-written and presents the findings in a clear manner. The findings are interesting and contribute to a better understanding of Rtf1-mediated epigenetic regulation of fungal morphogenesis and pathogenicity in a major human fungal pathogen, and potentially in other fungal species, as well.

      Weaknesses:

      A major limitation of this study is the absence of genome-wide information on Rtf1-mediated H2B monoubiquitination (H2Bub1), as well as a lack of detail regarding the function of the Plus3 domain. Although overexpression of HMD in the rtf1Δ mutant restored global H2Bub1 levels, it did not rescue certain critical biological functions, such as growth at 39 °C and melanin production (Figure 4C-D). This suggests that the precise positioning of H2Bub1 is essential for Rtf1's function. A comprehensive epigenetic landscape of H2Bub1 in the presence of HMD or full-length Rtf1 would elucidate potential mechanisms and shed light on the function of the Plus3 domain.

      We thank the reviewer (and the other reviewers) for this excellent suggestion. We plan to carry out a CUT&Tag assay to obtain a comprehensive epigenetic landscape of H2Bub1 in the presence of the HMD or full-length Rtf1 under conditions where overexpression of the HMD failed to rescue the phenotypes of the rtf1Δ mutant, such as growth at 39 °C.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine the role of Rtf1 in Cryptococcal biology, and demonstrate that Rtf1 acts independently of the Paf1 complex to exert regulation of Histone H2B monoubiquitylation (H2Bub1). The biological impact of the loss of H2Bub1 was observed in defects in morphogenesis, reduced production of virulence factors, and reduced pathogenic potential in animal models of cryptococcal infection.

      Strengths:

      The molecular data are quite compelling, demonstrating that the Rtf1-dependent functions require only the histone modification domain of Rtf1 and are dependent on nuclear localization. A specific point mutation in a residue conserved with the Rtf1 protein of the model yeast demonstrates the conservation of that residue in H2Bub1 modification. Interestingly, whereas expression of the HMD alone suppressed the virulence defect of the rtf1 deletion mutant, it did not suppress defects in virulence factor production.

      Weaknesses:

      The authors use two different species of Cryptococcus to investigate the biological effect of Rtf1 deletion. The work on morphogenesis utilized C. deneoformans, which is well-known to be a robust mating strain. The virulence work was performed in the C. neoformans H99 background, which is a highly pathogenic isolate. The study would be more complete if each of these processes were assessed in the other strain to understand if these biological effects are conserved across the two species of Cryptococcus. H99 is not as robust in morphogenesis, but reproducible results assessing mating and filamentation in this strain have been performed. Similarly, C. deneoformans does produce capsule and melanin.

      This is a fair point raised by the reviewer, and we are going to test whether these biological effects are conserved across the two species. We will assess the effects of RTF1 deletion on bisexual mating hyphal formation in the C. neoformans H99 background and on capsule and melanin production in the C. deneoformans XL280 background.

      There are some concerns with the conclusions related to capsule induction. The images reported in Figure B are purported to be grown under capsule-inducing conditions, yet the H99 panel is not representative of the induced capsule for this strain. Given the lack of a baseline of induction, it is difficult to determine if any of the strains may be defective in capsule induction. Quantification of a population of cells with replicates will also help to visualize the capsular diversity in each strain population.

      We thank the reviewer for raising this concern. We are going to confirm the conclusions related to capsule induction under multiple capsule-inducing conditions, including Dulbecco’s Modified Eagle’s Medium (DMEM), Littman’s medium, and 10% fetal bovine serum (FBS) agar medium [1].

      The authors demonstrate that for specific mating-related genes, the expression of the HMD recapitulated the wild-type expression pattern. The RNA-seq experiments were performed under mating conditions, suggesting specificity under this condition. The authors raise the point in the discussion that there may be differences in Rtf1 deposition on chromatin in H99, and under conditions of pathogenesis. The data that overexpression of HMD restores H2Bub1 by western is quite compelling, but does not address at which promoters H2Bub1 is modulating expression under pathogenesis conditions, and when full-length Rtf1 is present vs. only the HMD.

      We thank the reviewer for raising these concerns. As mentioned in the response to Reviewer 1, our CUT&Tag assay will provide evidence to address these questions.

      Reviewer #3 (Public Review):

      Summary:

      In this very comprehensive study, the authors examine the effects of deletion and mutation of the Paf1C protein Rtf1 gene on chromatin structure, filamentation, and virulence in Cryptococcus.

      Strengths:

      The experiments are well presented and the interpretation of the data is convincing.

      Weaknesses:

      Yet, one can be frustrated by the lack of experiments that attempt to directly correlate the change in chromatin structure with the expression of a particular gene and the observed phenotype. For example, the authors observed a strong defect in the expression of ZNF2, a known regulator of filamentation, mating, and virulence, in the rtf1 mutant. Can this defect explain the observed phenotypes associated with the RTF1 mutation? Is the observed defect in melanin production associated with altered expression of laccase genes and altered chromatin structure at this locus?

      We completely agree with the reviewer, and as mentioned in our responses to Reviewers 1 and 2, we are going to conduct a CUT&Tag assay to investigate the relationship between Rtf1-mediated H2Bub1 and the expression of particular genes.

      [1] Jang, E.-H., et al. Unraveling Capsule Biosynthesis and Signaling Networks in Cryptococcus neoformans. Microbiology Spectrum, 2022. 10(6): e02866-22.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Building upon their well-known tool for the deconvolution of human transcriptomics data (EPIC), Gabriel et al. implemented a new methodology for the quantification of the cellular composition of samples profiled with Assay for Transposase-Accessible Chromatin sequencing (ATAC-Seq). To build a signature for ATAC-seq deconvolution, they first created a compendium of ATAC-seq data and derived chromatin accessibility marker peaks and reference profiles for 21 cell types, encompassing immune cells, endothelial cells, and fibroblasts. They then coupled this novel signature with the EPIC deconvolution framework based on constrained least-squares regression to derive a dedicated tool called EPIC-ATAC. The method was then assessed using real and pseudo-bulk ATAC-seq data from human peripheral blood mononuclear cells (PBMC) and, finally, applied to ATAC-seq data from breast cancer tumors to show that it accurately quantifies their immune contexture.

      Strengths:

      Overall, the work is of very high quality. The proposed tool is timely; its implementation, characterization, and validation are based on rigorous methodologies and resulted in robust results. The newly-generated, validation data and the code are publicly available and well-documented. Therefore, I believe this work and the associated resources will greatly benefit the scientific community.

      Weaknesses:

      A few aspects can be improved to clarify the value and applicability of EPIC-ATAC and the transparency of the benchmarking analysis.

      (1) Most of the validation results in the main text assess the methods on all cell types together, by showing the correlation, RMSE, and scatterplots of the estimated vs. true cell fractions. This approach is valuable for showing the overall method performance and for detecting systematic biases and noisy estimates. However, it provides very limited insights regarding the capability of the methods to estimate the individual cell types, which is the ultimate aim of deconvolution analysis. This limitation is exacerbated for rare cell types, which could even have a negative correlation with the ground truth fractions, but not weigh much on the overall RMSE and correlation. I would suggest integrating into the main text and figures an in-depth assessment of the individual cell types. In particular, it should be shown and discussed which cell types can be accurately quantified and which ones are less reliable.

      We thank the reviewer for raising this important point. Discussing the accuracy of EPIC-ATAC in predicting individual cell-type proportions would indeed be valuable in the main text. We have updated the text as follows.

      In the first version of our manuscript, we had a section called “T cell subtypes quantification reveals the ATAC-Seq deconvolution limits for closely related cell types” which highlighted that EPIC-ATAC shows low performances when predicting the proportions of cell types that are closely related, e.g., CD4+ T cell or CD8+ T cell subtypes. The section is now named “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type” and has been expanded to discuss the accuracy of EPIC-ATAC predictions within each major cell type.

      To do so, we represented in Figure 5A the performances of EPIC-ATAC in each cell type present in the benchmarking datasets from Figures 3 and 4. Additionally, we have kept in the supplementary figures the details of the correlation values and RMSE values within each cell type and for each tool (Supplementary Figures 9 and 10). The following text has been added in the main text to describe these analyses:

      “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type

      To investigate the impact of cell type abundance on the accuracy of ATAC-Seq deconvolution, we evaluated EPIC-ATAC predictions for each major cell type separately in the different benchmarking datasets (Figure 5A). NK cells, endothelial cells, neutrophils and dendritic cells showed lower correlation values. These values can be explained by the fact that these cell types are low-abundant in our benchmarking datasets (Figure 5A). For the endothelial cells and dendritic cells, the RMSE values associated with these cell types remain low. This suggests that while the predictions of EPIC-ATAC might not be precise enough to compare these cell-type proportions between different samples, the cell-type quantification within each sample is reliable. For the NK cells and the neutrophils, we observed more variability, with higher RMSE values in some datasets, which suggests that the markers and profiles for these cell types might be improved. Supplementary Figures 9 and 10 detail the performance of each tool when considering each cell type separately in the PBMC and the cancer datasets. As for EPIC-ATAC, the predictions from the other deconvolution tools are more reliable for the frequent cell types."
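      The distinction drawn above between correlation and RMSE for low-abundance cell types can be seen with a toy example. The sketch below computes, for each cell type separately, the Pearson correlation and the RMSE between predicted and true proportions across samples. The proportions are invented for illustration and are not taken from the benchmarking datasets; the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def rmse(x, y):
    """Root-mean-square error between predicted and true proportions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def per_cell_type_metrics(predicted, truth):
    """Return {cell_type: (pearson_r, rmse)}, computed across samples per cell type.

    predicted, truth: {cell_type: [proportion in sample 1, sample 2, ...]}
    """
    return {ct: (pearson(predicted[ct], truth[ct]), rmse(predicted[ct], truth[ct]))
            for ct in truth}

# Toy proportions for four samples (illustrative only, not study data)
truth = {"T": [0.50, 0.60, 0.40, 0.70], "NK": [0.02, 0.03, 0.01, 0.02]}
predicted = {"T": [0.52, 0.58, 0.43, 0.68], "NK": [0.03, 0.02, 0.02, 0.02]}
metrics = per_cell_type_metrics(predicted, truth)
```

      In this toy case the rare "NK" population has an RMSE below 0.01 yet essentially zero correlation, because its true proportions vary over a range comparable to the prediction error, while the abundant "T" population is both well correlated and has a larger absolute RMSE.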

      (2) In the benchmarking analysis, EPIC-ATAC is compared to several deconvolution methods, most of which were originally developed for transcriptomics data. This comparison is not completely fair unless their peculiarities and the limitations of tweaking them to work with ATAC-seq data are discussed. For instance, some methods (including the original EPIC) correct for cell-type-specific mRNA bias, which is not present in ATAC-seq data and might, thus, result in systematic errors.

      We thank the reviewer for this comment and have updated the results and methods sections as follows:

      We provide in the Materials and methods section, the paragraph “Benchmarking of the EPIC-ATAC framework against other existing deconvolution tools” which describes how each tool included in the benchmark was used in the ATAC-Seq context. We have added a reference to this section in the main text when introducing the first benchmarking analysis.

      For each tool, the main changes consisted of: (i) replacing the initial RNA-Seq profiles and markers by the EPIC-ATAC reference profiles and markers and (ii) providing as input a bulk ATAC-Seq dataset with matched ATAC-Seq features (the same approach as the one used in EPIC-ATAC was considered, see answer to the next comment). Having reference profiles/markers and an ATAC-Seq bulk query with matched features was the only requirement for the different deconvolution models to run on ATAC-Seq data with the default method parameters, except for quanTIseq. Indeed, this method, like EPIC, corrects its estimations for cell-type-specific mRNA content bias. We have disabled this option for the bulk ATAC-Seq deconvolution.

      We cannot, however, exclude that hyperparameter tuning of each tool could have improved their current performance. Also, for RNA-Seq data deconvolution, some of the methods apply specific feature filtering, e.g., the quanTIseq framework removes a manually curated list of noisy genes as well as aberrant immune genes identified in the TCGA data, and ABIS uses immune-specific housekeeping genes. We hypothesize that additional filtering could be explored for ATAC-Seq deconvolution to improve the performance of the tools.

      We have clarified these points in the results section when introducing the benchmarking, in the methods and in the discussion section.

      (3) On a similar note, it could be made more explicit which adaptations were introduced in EPIC, besides the ad-hoc ATAC-seq signature, to make it applicable to this type of data.

      In the first version of the manuscript, we described the changes made to EPIC to perform bulk ATAC-Seq deconvolution in the Materials and methods section, in the paragraph "Running EPIC-ATAC on bulk ATAC-Seq data". We have moved and completed this paragraph in the results section, before the description of the evaluation of EPIC-ATAC in different datasets. The paragraph is the following:

      “EPIC-ATAC integrates the marker peaks and profiles into EPIC to perform bulk ATAC-Seq data deconvolution

      The cell-type-specific marker peaks and profiles derived from the reference samples were integrated into the EPIC deconvolution tool (Racle et al., 2017; Racle and Gfeller, 2020). We will refer to this ATAC-Seq deconvolution framework as EPIC-ATAC. To ensure the compatibility of any input bulk ATAC-Seq dataset with the EPIC-ATAC marker peaks and reference profiles, we provide an option to lift over hg19 datasets to hg38 (using the liftOver R package), as the reference profiles are based on the hg38 reference genome. Subsequently, the features of the input bulk matrix are matched to our reference profiles' features. To match both sets of features, we determine for each peak of the input bulk matrix the distance to the nearest peak in the reference peaks. Overlapping regions are retained and the feature IDs are matched to their associated nearest peaks. If multiple features are matched to the same reference peak, the counts are summed. Before the estimation of the cell-type proportions, we transform the data following an approach similar to the transcripts per million (TPM) transformation, which has been shown to be appropriate to estimate cell fractions from bulk mixtures in RNA-Seq data (Racle et al., 2017; Sturm et al., 2019). We normalize the ATAC-Seq counts by dividing counts by the peak lengths as well as sample depth and rescaling counts so that the counts of each sample sum to 10^6. In RNA-Seq based deconvolution, EPIC uses an estimation of the amount of mRNA in each reference cell type to derive cell proportions while correcting for cell-type-specific mRNA bias. For the ATAC-Seq based deconvolution, these values were set to 1 to give similar weights to all cell-type quantifications. Indeed, ATAC-Seq measures signal at the DNA level; hence the quantity of DNA within each reference cell type is similar."
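      The feature-matching and TPM-like normalization steps described above can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the EPIC-ATAC implementation: peaks are half-open (chrom, start, end) intervals, a bulk peak is assigned to the reference peak it overlaps most, bulk peaks overlapping no reference peak are dropped, and counts are divided by the length of the reference peak they were assigned to. Because each sample is rescaled to sum to 10^6, an explicit division by sequencing depth would cancel out and is therefore omitted. All function names are hypothetical.

```python
from collections import defaultdict

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def match_to_reference(bulk_peaks, ref_peaks):
    """Sum bulk counts onto the reference peaks they overlap.

    bulk_peaks: list of (chrom, start, end, count) for one sample
    ref_peaks:  list of (chrom, start, end)
    Returns {ref_peak: summed_count}, with 0 for reference peaks that received no counts.
    """
    by_chrom = defaultdict(list)
    for peak in ref_peaks:
        by_chrom[peak[0]].append(peak)
    matched = {peak: 0 for peak in ref_peaks}
    for chrom, start, end, count in bulk_peaks:
        # Naive scan over the reference peaks on the same chromosome
        best, best_len = None, 0
        for ref in by_chrom.get(chrom, []):
            ov = overlap(start, end, ref[1], ref[2])
            if ov > best_len:
                best, best_len = ref, ov
        if best is not None:  # bulk peaks overlapping no reference peak are dropped
            matched[best] += count
    return matched

def tpm_like(matched):
    """Divide counts by reference peak length, then rescale so the sample sums to 1e6."""
    rates = {peak: c / (peak[2] - peak[1]) for peak, c in matched.items()}
    total = sum(rates.values())
    if total == 0:
        return {peak: 0.0 for peak in rates}
    return {peak: r / total * 1e6 for peak, r in rates.items()}
```

      For example, with reference peaks chr1:100-200, chr1:300-400 and chr2:0-50, a bulk peak chr1:190-330 is assigned to chr1:300-400 (30 bp of overlap versus 10 bp), and a bulk peak on chr3 is dropped. The naive per-chromosome scan keeps the logic readable; a sorted interval search would be the natural choice for genome-wide peak sets.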

      (4) Given that EPIC-ATAC will ultimately be applied to real bulk ATAC-seq data, whose characteristics might not be completely recapitulated by pseudo-bulk samples, it would be interesting to see EPIC and EPIC-ATAC compared on a dataset with matched, real bulk RNA-seq and ATAC-seq, respectively. It would nicely complement the analysis of Figure 7 and could be used to dissect the commonalities and peculiarities of these two approaches.

      We thank the reviewer for raising this important point. EPIC-ATAC will be applied to real bulk ATAC-Seq data, and pseudobulk data indeed cannot fully recapitulate bulk signals. Recently, a dataset composed of more than 100 samples with matched bulk RNA-Seq, bulk ATAC-Seq and flow cytometry data was published by Morandini and colleagues in GeroScience in November 2023. We retrieved these data to compare the predictions obtained by EPIC-ATAC on the bulk ATAC-Seq data and the predictions of the original version of EPIC on the bulk RNA-Seq data with the cell-type quantification obtained by flow cytometry. We also assessed whether both modalities could be complementary using a simple approach averaging the predictions obtained from both modalities. The results of these analyses are summarized in Figure 7C and are described in the last paragraph of the main text:

      “We compared the predictions obtained using each modality to the flow cytometry cell-type quantifications. EPIC-ATAC predictions were better correlated with the flow cytometry measures for some cell types (e.g., CD8+, CD4+ T cells, NK cells) while this trend was observed with the EPIC-RNA predictions in other cell types (B cells, neutrophils, monocytes) (Figure 7C). We then tested whether the predictions obtained from both modalities could be combined to improve the accuracy of each cell-type quantification. Averaging the predictions obtained from both modalities shows a moderate improvement (Figure 7C), suggesting that the two modalities can complement each other.”

      Reviewer #2 (Public Review):

      Summary:

      The manuscript expands the current bulk sequencing data deconvolution toolkit to include ATAC-seq. The EPIC-ATAC tool successfully predicts accurate proportions of immune cells in bulk tumour samples and EPIC-ATAC seems to perform well in benchmarking analyses. The authors achieve their aim of developing a new bulk ATAC-seq deconvolution tool.

      Strengths:

      The manuscript describes simple and understandable experiments to demonstrate the accuracy of EPIC-ATAC. They have also been incredibly thorough with their reference dataset collections. The authors have been robust in their benchmarking endeavours and measured EPIC-ATAC against multiple datasets and tools.

      Weaknesses:

      Currently, the tool has a narrow applicability in that it estimates the percentage of immune cells in a bulk ATAC-seq experiment.

      Comments:

      (1) Has any benchmarking been done on the runtime of the tool? Although EPIC-ATAC seems to "win" in benchmarking metrics, sometimes the differences are quite small. If EPIC-ATAC takes forever to run, compared to another tool that is a lot quicker, might some people prefer to sacrifice 0.01 in correlation for a quicker running tool?

      We thank the reviewer for raising this point, which was not addressed in the manuscript. We have added a supplementary figure (Supplementary Figure 8) showing the CPU time used by each tool. All tools ran in less than 20 seconds on average. This figure is mentioned at the end of the benchmarking paragraphs.
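      A minimal way to measure CPU time per tool is sketched below, assuming each deconvolution run can be wrapped as a Python callable (a hypothetical setup; the benchmarked tools are mostly R packages). It uses time.process_time, which counts the CPU time of the current process only.

```python
import time


def cpu_seconds(func, *args, **kwargs):
    """Run func once and return (result, CPU seconds consumed by this process)."""
    start = time.process_time()
    result = func(*args, **kwargs)
    elapsed = time.process_time() - start
    return result, elapsed


def mean_cpu_seconds(func, inputs):
    """Average CPU time of func over a list of inputs (e.g., one entry per dataset)."""
    times = [cpu_seconds(func, x)[1] for x in inputs]
    return sum(times) / len(times)
```

      CPU time spent in child processes (for example, a tool launched through subprocess) is not included by process_time, so such runs would need a different measurement.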

      (2) In Figure 3B the data points look a bit squashed in the bottom-left corner. Could the plot be redrawn with the data points spread out? There also seems to be some inter-patient variability. Could the authors comment on that?

      We have updated Figure 3B to increase the visibility of the dots in the bottom-left corner. To do so, we limited the x axis to the maximum true proportion and the y axis to the maximum predicted proportion.

      We also acknowledge that the accuracy of the predictions varies across samples. In particular, one sample (Sample4, star shape in Figure 3B) exhibits larger discrepancies between EPIC-ATAC predictions and the ground truth. To understand this lower performance, we visualized our marker peaks in the five PBMC samples (Figure below). This visualization suggests that Sample4 might be an outlier: its cellular composition is similar to that of Sample2 and Sample5, yet it shows particularly high ATAC-Seq accessibility at the monocyte and dendritic cell markers. This can explain why EPIC-ATAC overestimates the proportions of these two populations in this sample. We have added this figure as Supplementary Figure 2 and described it in the Results section, in the paragraph “EPIC-ATAC accurately estimates immune cell fractions in PBMC ATAC-Seq samples”.

      (3) Could the authors comment on the possibility of expanding EPIC-ATAC into more than a percentage prediction tool? Perhaps EPIC-ATAC could remove the immune cell signal from the bulk ATAC-seq data to "purify" the uncharacterised cells in silico, or generate pseudo-ATAC-seq tracks of the identified cell types.

      We thank the reviewer for this interesting question. As suggested by the reviewer, one approach to purify bulk genomics data using the cell-type proportions estimated by a deconvolution tool is to subtract from the bulk signal the weighted sum of the reference profiles, with weights equal to the predicted proportions. We applied this approach to the EPIC-ATAC predictions obtained from pseudobulks built from scATAC-Seq data of diverse cancer types from the Human Tumor Atlas Network (HTAN) (see also our answer to the first recommendation of Reviewer 1). This dataset allows us to compare, for a relatively large number of samples (up to 25 samples per cancer type cohort), the purified signal to the true signal derived from the single-cell data. The results are presented in the figure below: the correlations between the predicted and true signals are relatively good in most cancer types (blue boxplots). However, these correlations are lower than those obtained when comparing the signal of the entire pseudobulk (red boxplots) with the true signal. This suggests that this purification approach yields a signal that is less precise and accurate than the signal of the whole cell mixture.

      Author response image 1.

      Boxplots of the correlation values obtained from the comparison of the bulk signal and the ground truth signal from the uncharacterized cells in each sample (red) and from the comparison of the predicted signal and the ground truth signal from the uncharacterized cells in each sample (blue).

      Also, note that in our simple approach, negative values can be obtained. The predicted signal will thus be difficult to interpret and to use in downstream analyses. Methods claiming to perform purification of bulk samples use more complex and dedicated algorithms. For example, Symphony (Burdziak et al., 2019) (cited in our introduction) uses single-cell RNA-Seq data in addition to the bulk chromatin accessibility data to infer cluster-specific accessibility profiles. Considering that EPIC was not designed for purification purposes, we decided not to include this analysis in the updated version of the manuscript.
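      The subtraction approach described above can be sketched in a few lines of Python. This is an illustration, not the code used for the analysis: it assumes the bulk profile and the reference profiles are lists of accessibility values over the same regions, and the function names are hypothetical. The example values show how the residual can become negative.

```python
def purify(bulk, reference, proportions):
    """Subtract proportion-weighted reference profiles from one bulk profile.

    bulk:        list of accessibility values, one per region
    reference:   dict cell_type -> list of reference values (same regions)
    proportions: dict cell_type -> predicted proportion in this sample
    Returns the residual signal attributed to the uncharacterized cells.
    """
    residual = list(bulk)
    for cell_type, p in proportions.items():
        for j, value in enumerate(reference[cell_type]):
            residual[j] -= p * value
    return residual


def has_negative(values):
    """True if the subtraction produced any negative accessibility value."""
    return any(v < 0 for v in values)


# Toy example: the second region ends up negative (2 - 0.5 * 8 = -2).
residual = purify([10, 2], {"T": [10, 8]}, {"T": 0.5})
```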

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The original EPIC had two different signatures for application to blood or tumor RNA-seq. It is not clear instead if EPIC-ATAC applies with the same signature and framework to any tissue and disease context. This aspect should be clarified in the text.

      We thank the reviewer for raising this point, which was not clear in the previous version of the manuscript. As in the original version of EPIC, two sets of reference profiles and markers are available in EPIC-ATAC: the PBMC reference and the TME reference. We used the PBMC reference profiles and markers to deconvolve the PBMC samples and the TME reference profiles and markers to deconvolve the cancer samples. We have clarified this point in the Results section of the main text, in the paragraph “ATAC-Seq data from sorted cell populations reveal cell-type specific marker peaks and reference profiles”, as follows (added text underlined):

      “The resulting marker peaks specific only to the immune cell types were considered for the deconvolution of PBMC samples (PBMC markers). For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008) (Figure 1, box 4, see Materials and Methods, section 2). The latter filtering ensures the relevance of the markers in the TME context since cell-type specific TME markers are expected to be correlated in tumor bulk ATAC-Seq measurements (Qiu et al., 2021). After this last filtering, 716 markers of immune cells, fibroblasts and endothelial cells remained (defined as TME markers). Considering the differences in cell types and in the filtering steps applied to the PBMC and TME markers, we recommend using the TME markers and profiles to deconvolve bulk tumor samples and the PBMC markers and profiles to deconvolve PBMC samples.”

      We also note that when running EPIC-ATAC using the PBMC markers and the TME markers independently to perform the deconvolution of the cancer datasets, we see that overall the use of the TME markers leads to a better performance (Figure below).

      Figure legend: Correlation and RMSE values obtained when running EPIC-ATAC on each cancer dataset (points) using the PBMC (red) and the TME (blue) markers.

      To demonstrate that the TME markers can be applied to different cancer types, we have extended the evaluation of EPIC-ATAC on tumor samples with an additional dataset: the Human Tumor Atlas Network (HTAN) single-cell multiomic (scRNA-Seq and scATAC-Seq) dataset. We processed this dataset and built scATAC-Seq pseudobulks for 7 cancer types, to which EPIC-ATAC was applied. This analysis is summarized in Figure 4 and Supplementary Figure 4 and shows that EPIC-ATAC is applicable to a diverse set of tissues.

      (2) EPIC and EPIC-ATAC have a valuable feature, which is absent from most deconvolution methods: the estimation of unknown content. It would be informative for the users to understand from the benchmarking analysis whether this feature gives an advantage to EPIC-ATAC with respect to the other approaches.

      Indeed, among the tools that we included in our benchmarking analysis, only EPIC-ATAC and quanTIseq enable users to predict the proportions of cells that are not present in the reference profiles, i.e., the uncharacterized cells. For the other tools we thus fixed the estimated proportions of uncharacterized cells to 0. This approach provides a clear and significant advantage to EPIC-ATAC and to quanTIseq. For this reason, we also provide a version of the benchmarking in which we exclude the uncharacterized cells and rescale the true and estimated cell-type proportions to sum to 1. In this second benchmarking approach, EPIC-ATAC still outperforms some of the other deconvolution tools.
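      The second benchmarking variant (excluding uncharacterized cells and rescaling) amounts to the following operation, applied to both true and estimated proportions of a sample. This is a minimal Python sketch; the key name "uncharacterized" is a hypothetical label, not necessarily the one used in the EPIC-ATAC output.

```python
def rescale_without_uncharacterized(props, unknown_key="uncharacterized"):
    """Drop the uncharacterized fraction and rescale the remaining proportions to sum to 1."""
    kept = {k: v for k, v in props.items() if k != unknown_key}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no characterized cell type has a non-zero proportion")
    return {k: v / total for k, v in kept.items()}


# Example: 10% B cells, 30% T cells, 60% uncharacterized -> 25% B, 75% T.
rescaled = rescale_without_uncharacterized({"B": 0.1, "T": 0.3, "uncharacterized": 0.6})
```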

      We have clarified this point in the results section, in the paragraph “EPIC-ATAC accurately predicts fractions of cancer and non-malignant cells in tumor samples”.

      (3) The selection of the most discriminative markers is very well described in the text and beautifully illustrated in Figure 2. However, it is unclear why UMAP plots are used to represent cell-type similarities and dissimilarities. Would a linear dimensionality reduction approach like PCA be already sufficient to show these groups, especially considering the not-so-extreme dimensionality of the underlying data? In addition, a statistic that could be also considered to compare clusters to the cell type labels in the two scenarios is the Adjusted Rand Index (ARI).

      We thank the reviewer for this relevant comment. We initially used UMAP to facilitate the visualization of the different cell-type groups. However, we agree that the first three axes of the principal component analysis (PCA) performed on each set of marker peaks already capture most of the structure in the data, and that UMAP can artificially enhance the separation between groups of cells. We have updated Figure 2B by replacing the UMAP scatter plots with 3D representations of the first three principal components and have added to Supplementary Figure 1B the pairwise scatter plots of these components. In the main figures, we have also added the ARI comparing the cell-type annotation with the clusters obtained by model-based clustering on the first 10 principal components.
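      For readers unfamiliar with the ARI, the Python sketch below computes it from two label vectors with the standard pair-counting formula (Hubert and Arabie's adjustment for chance). It covers only the comparison step; the PCA and model-based clustering are not reproduced, and the function name is ours. scikit-learn's sklearn.metrics.adjusted_rand_score computes the same metric.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to compare
    # Pairs of items grouped together in both labelings (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)
```

      Cluster identifiers do not need to match the cell-type names: only the grouping matters, so a relabeled but identical partition scores 1.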

      (4) In the introduction, it is stated that "the reasonable cost and technical advantages of these protocols foreshadow an increased usage of ATAC-Seq in cancer studies". I would suggest adding a reference to justify this trend. Also, it should be discussed how ATAC-seq deconvolution compares to other types of deconvolution approaches applied to cheaper epigenetic data like methylation one (e.g. epidish, methylcc, tca, minfi).

      We have complemented this sentence with two references to justify the assertion: (i) a review published by Luo, Gribskov and Wang in 2022 showing the increasing number of ATAC-Seq studies in the field of cancer research, and (ii) a protocol paper from Grandi et al. published in 2022 on the state-of-the-art Omni-ATAC protocol for ATAC-sequencing which discusses the broad applicability and the technical advantages of ATAC-sequencing. Also in the preceding sentence, a recent ATAC-Seq protocol that can be applied to FFPE samples has been mentioned, FFPE samples being the most common samples in clinical cancer research.

      We agree with the reviewer that other epigenetic assays such as methylation assays are cost effective. However, ATAC-sequencing provides additional information on the epigenetic landscape of a sample’s genome, and some questions regarding regulatory regions and transcription factor activity cannot be answered with methylation data. Methods that can be applied specifically to ATAC-Seq data are thus needed. Most cell-type deconvolution algorithms developed so far are applicable to RNA-Seq or methylation data. These algorithms often share methodological concepts, e.g., a linear combination of the reference profiles for reference-based methods, which could be used across modalities. However, methylation-based deconvolution tools often take as input a data format specific to methylation data, e.g., two-colour microarray data (RGChannelSet R object) for the minfi deconvolution function (estimateCellCounts), or leverage methylation-specific information to perform the deconvolution. For example, methylCC uses a model based on latent variables representing binarized measures of the methylation status of cell-type specific regions (1 or 0 for clearly methylated or unmethylated regions). Such methods are more difficult to adapt than RNA-Seq-based tools, since RNA-Seq signal, like ATAC-Seq signal, is quantified as read counts.

      Nevertheless, some methods such as EpiDISH or MethylCIBERSORT have proposed new methylation reference profiles and used existing models that are not specific to methylation data to deconvolve the bulk data. In our work, we followed a similar approach: we propose new reference profiles specific to chromatin accessibility data, integrate them into an existing method, EPIC, and test them in other existing tools. Note that methylation reference profiles cannot be used directly for ATAC-Seq data deconvolution, since methylation assays measure methylation status at CpG sites (dinucleotides) whereas ATAC-Seq measures the accessibility of regions spanning hundreds of base pairs.

      An analysis comparing the performance of methylation-based and ATAC-Seq-based deconvolution would be informative. However, such an analysis is beyond the scope of our paper, since none of the datasets used in our benchmarking provide both modalities for the same samples.

      In the manuscript, we have added the references to methylation-based deconvolution tools mentioned in the previous paragraphs and by the reviewer, and have extended the discussion as follows:

      “The comparison of EPIC-ATAC applied on ATAC-Seq data with EPIC applied on RNA-Seq data has shown that both modalities led to similar performances and that they could complement each other. Another modality that has been frequently used in the context of bulk sample deconvolution is methylation. Methylation profiling techniques such as methylation arrays are cost effective (Kaur et al., 2023) and DNA methylation signal is highly cell-type specific (Kaur et al., 2023; Loyfer et al., 2023). Considering that methylation and chromatin accessibility measure different features of the epigenome, additional analyses comparing and/or complementing ATAC-seq based deconvolution with methylation-based deconvolution could be of interest as future datasets profiling both modalities in the same samples become available.”

      (5) In the Results section, some methodological steps could be phrased in a bit more extensive way to let the reader understand the rationale and the actual approach. I recognize there is also a reference to the Methods section, where all methodologies are reported in detail, but some of the sentences are hard to understand due to their synthetic format, e.g.: "markers with potential residual accessibility in human tissues were then filtered out".

      We thank the reviewer for this comment. We have followed this recommendation and expanded the sentences written in a condensed format. Text changes and additions are underlined below:

      “To limit batch effects, the collected samples were homogeneously processed from read alignment to peak calling. For each cell type, we derived a set of stable peaks observed across samples and studies, i.e. for each study, peaks detected in at least half of the samples were considered, and for each cell type, only peaks detected jointly in all studies were kept (see Materials and Methods, section 1).”
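      The two-step definition of stable peaks quoted above can be sketched as follows. This is a simplified Python illustration, assuming peaks have already been mapped to shared identifiers across samples (in practice, peaks from different samples would need overlap-based matching) and reading "at least half" as a threshold of 50% of the samples; the function names are hypothetical.

```python
from collections import Counter


def study_consensus(samples):
    """Peaks detected in at least half of the samples of one study.

    samples: list of sets of peak identifiers, one set per sample.
    """
    counts = Counter(peak for sample in samples for peak in sample)
    threshold = len(samples) / 2
    return {peak for peak, c in counts.items() if c >= threshold}


def stable_peaks(studies):
    """Peaks kept for one cell type: consensus peaks shared by all studies.

    studies: list of studies, each a list of per-sample peak sets.
    """
    consensus = [study_consensus(s) for s in studies]
    return set.intersection(*consensus)


study1 = [{"p1", "p2"}, {"p1", "p3"}, {"p1", "p2"}]  # consensus: p1, p2
study2 = [{"p2", "p4"}, {"p2"}]                      # consensus: p2, p4
```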

      “To filter out markers that could be accessible in human cell types other than those included in our reference profiles, we used the human atlas study (K. Zhang et al., 2021), which identified modules of open chromatin regions accessible in a comprehensive set of human tissues, and we excluded from our marker list the markers overlapping these modules (Figure 1, box 3, see Materials and Methods section 2).”
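      The exclusion step amounts to an interval-overlap filter. In the minimal Python sketch below, regions are assumed to be half-open (chrom, start, end) tuples as in BED files, and the function names are ours. The quadratic scan is only meant for illustration; genome-scale filtering would use a sorted sweep or a tool such as bedtools intersect with the -v option.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclude_overlapping(markers, atlas_regions):
    """Keep only the markers that overlap none of the atlas regions."""
    return [m for m in markers if not any(overlaps(m, r) for r in atlas_regions)]


markers = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
atlas = [("chr1", 150, 160), ("chr2", 200, 250)]
kept = exclude_overlapping(markers, atlas)  # chr2:100-200 only touches chr2:200-250, so it is kept
```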

      “For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008) (Figure 1, box 4, see Materials and Methods, section 2).”

      Also, following the comments and recommendations of Reviewer 1, we have: (i) moved the method section describing the adaptation of EPIC to ATAC-Seq data into the Results section to provide more details there (see answer to the third comment of Reviewer 1), (ii) clarified how the existing tools used in the benchmarking analyses were adapted for ATAC-Seq deconvolution (see answer to the second comment of Reviewer 1), and (iii) detailed how we compared our estimations of the infiltration levels in the samples from Kumegawa et al. with the estimations from the original study (see answer to the seventh recommendation of Reviewer 1).

      (6) In the main text, it is stated that "the list of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from diverse cancer types from The Cancer Genome Atlas". It should be clarified if these are only solid cancers, or if blood cancers were also used.

      We have considered only the solid cancers and have clarified this point in the results section: “This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas”.

      (7) When reporting that "these predictions are consistent with the infiltration level estimations reported in the original publication", it should be mentioned how the infiltration levels were quantified in this publication and how this agreement was quantified. This would be important also to claim in the abstract that "EPIC-ATAC accurately infers the immune contexture of the main breast cancer subtypes".

      We thank the reviewer for this comment, we acknowledge that the agreement between the EPIC-ATAC predictions and the infiltration levels quantified in the original publication should be further described in the paper. We have expanded the text in the results section in the paragraph “EPIC-ATAC accurately infers the immune contexture in a bulk ATAC-Seq breast cancer cohort” to clarify this point. Additionally, we have added a panel in Figure 6 (panel A) which shows a good agreement between EPIC-ATAC predictions and the metric used in the original paper to evaluate the infiltration levels of different cell types.

      The added text is underlined below:

      “We applied EPIC-ATAC to a breast cancer cohort of 42 ATAC-Seq samples from two breast cancer subtypes, i.e., 35 oestrogen receptor (ER)-positive human epidermal growth factor receptor 2 (HER2)-negative (ER+/HER2-) samples and 7 triple negative (TNBC) tumors (Kumegawa et al., 2023). No cell sorting was performed in parallel to the chromatin accessibility sequencing. For this reason, the authors used a set of cell-type-specific cis-regulatory elements (CREs) identified in scATAC-Seq data from similar breast cancer samples (Kumegawa et al., 2022) and estimated the amount of infiltration of each cell type by averaging the ATAC-Seq signal of each set of cell-type-specific CREs in their samples. We used EPIC-ATAC to estimate the proportions of different cell types of the TME. These predictions were then compared to the metric used by Kumegawa and colleagues in their study to infer levels of infiltration. A high correlation between the two metrics was observed for each cell type (Pearson’s correlation coefficient from 0.5 for myeloid cells to 0.94 for T cells, Figure 6A).”
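      The CRE-based infiltration metric described in the quoted paragraph (the mean ATAC-Seq signal over a cell type's CRE set) and its comparison with EPIC-ATAC proportions can be sketched as follows. This Python illustration assumes a normalized signal per region and sample; how Kumegawa and colleagues normalized their signal is not specified here, and the function names are hypothetical.

```python
def cre_score(signal, cre_ids):
    """Mean accessibility of one sample over a set of cell-type-specific CREs.

    signal:  dict region_id -> normalized accessibility in this sample
    cre_ids: iterable of region ids specific to one cell type
    CREs absent from the sample's signal are skipped.
    """
    values = [signal[r] for r in cre_ids if r in signal]
    return sum(values) / len(values) if values else float("nan")


def pearson(x, y):
    """Pearson correlation between CRE scores and EPIC-ATAC proportions across samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5
```

      For each cell type, one would compute cre_score for every sample and correlate the resulting vector with the EPIC-ATAC proportions of that cell type.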

      (8) It should be made explicit if EPIC-ATAC quantifies mDC, pDC, or their sum.

      In our collection of reference ATAC-Seq samples from which the markers and profiles have been derived, mDCs and pDCs were both included in the dendritic cells. EPIC-ATAC thus quantifies the total amount of dendritic cells, i.e., mDCs and pDCs combined. We have added a sentence in the main text to clarify this point:

      To identify robust chromatin accessibility marker peaks of cancer relevant cell types, we collected 564 samples of sorted cell populations from twelve studies including eight immune cell types (B cells […] dendritic cells (DCs) (mDCs and pDCs are grouped in this cell-type category) […] and endothelial (Liu et al., 2020; Xin et al., 2020) cells (Figure 1 box 1, Figure 2A, Supplementary Table 1).

      Reviewer #2 (Recommendations For The Authors):

      The authors should double-check the naming of tools is done correctly e.g. ChIPSeeker has been spelled incorrectly in some instances throughout the manuscript.

      We thank the reviewer for pointing out this mistake and have corrected the mistake in the main text.

    1. Author response:

      We thank the editor and reviewers for the time they spent reviewing our manuscript entitled ‘Overnight fasting facilitates safety learning by changing the neurophysiological response to relief from threat omission’, which was submitted to eLife as an original research paper.

      Since we take the reviewers' comments to heart and recognize the very complex picture formed by our previous and current results, we will take more time to rethink the paper. We will use this time to revisit the interpretation of the results of our previous behavioral study, the preregistration plan, and the findings of our current fMRI (replication) study.

      We aim to address the fundamental issues indicated by the reviewers as soon and as clearly as possible.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript considers a mechanistic extension of MacArthur's consumer-resource model to include chasing down food and potential encounters between the chasers (consumers) that lead to less efficient feeding in the form of negative feedback. After developing the model, a deterministic solution and two forms of stochastic solutions are presented, in agreement with each other. Finally, the model is applied to explain observed coexistence and rank-abundance data.

      We thank the reviewer for the accurate summary of our manuscript.

      Strengths:

      The application of the theory to natural rank-abundance curves is impressive. The comparison with the experiments that reject the competitive exclusion principle is promising. It would be fascinating to see if in, e.g. insects, the specific interference dynamics could be observed and quantified and whether they would agree with the model.

      The results are clearly presented; the methods adequately described; the supplement is rich with details.

      There is much scope to build upon this expansion of the theory of consumer-resource models. This work can open up new avenues of research.

      We appreciate the reviewer for the very positive comments. We have followed many of the suggestions raised by the reviewer, and the manuscript is much improved as a result.

      Following the reviewer’s suggestions, we have now used Shannon entropies to quantify the model comparison with experiments that reject the Competitive Exclusion Principle (CEP). Specifically, for each time point of each experimental or model-simulated community, we calculated the Shannon entropies using the formula:

      $$H(t) = -\sum_{i=1}^{S_C} p_i(t)\log p_i(t),$$

      where p<sub>i</sub>(t) is the probability that a consumer individual belongs to species C<sub>i</sub> at time t. The time series of Shannon entropies from the experimental data and from the SSA results shown in Fig. 2D-E are compared in Appendix-fig. 7C-D. The time averages and standard deviations (δH) of the Shannon entropies for these experimental and SSA model-simulated communities are as follows:

      , ; ,

      , , .

      Meanwhile, we have calculated the time averages and standard deviations (δC<sub>i</sub>) of the species’ relative/absolute abundances for the experimental or SSA model-simulated communities shown in Fig. 2D-E, which are as follows:

      , ; , ; , , , , where the superscript “(R)” represents relative abundances.

      From the results of Shannon entropies shown in Author response image 1 (which are identical to those of Appendix-fig. 7C-D) and the quantitative comparison of the time average and standard deviation between the model and experiments presented above, it is evident that the model results in Fig. 2D-E exhibit good consistency with the experimental data. They share roughly identical time averages and standard deviations in both Shannon entropies and the species' relative/absolute abundances for most of the comparisons. All these analyses are included in the appendices and mentioned in the main text.
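      The entropy summary described above can be sketched as follows: compute H(t) from the consumer abundances at each time point, then take the time average and standard deviation. This Python sketch assumes the natural logarithm and the population (not sample) standard deviation; the response does not state either choice, and the function names are ours.

```python
import math


def shannon_entropy(abundances):
    """H = -sum p_i ln p_i, with p_i the relative abundance of consumer species i."""
    total = sum(abundances)
    probs = [a / total for a in abundances if a > 0]  # absent species contribute 0
    return -sum(p * math.log(p) for p in probs)


def time_mean_and_sd(series):
    """Time average and population standard deviation of a sequence of values."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    return mean, math.sqrt(var)


# Usage: one abundance vector per time point -> entropy series -> (mean, delta H).
time_series = [[5, 5], [6, 4], [7, 3]]
entropies = [shannon_entropy(snapshot) for snapshot in time_series]
mean_H, delta_H = time_mean_and_sd(entropies)
```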

      Author response image 1.

      Shannon Entropies of the experimental data and SSA results in Fig. 2D-E, redrawn from Appendix-fig. 7C-D.

      Weaknesses:

      I am questioning the use of carrying capacity (Eq. 4) instead of using nutrient limitation directly through Monod consumption (e.g. Posfai et al. who the authors cite). I am curious to see how these results hold or are changed when Monod consumption is used.

      We thank the reviewer for raising this question. To explain it more clearly, the equation combining the third equation in Eq. 1 and Eq. 4 of our manuscript is presented below as Eq. R1:

      where x<sub>il</sub> represents the population abundance of the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, and κ<sub>l</sub> stands for the steady-state population abundance of species R<sub>l</sub> (the carrying capacity) in the absence of consumer species. With no consumer species, C<sub>i</sub> = 0 (i = 1,…,S<sub>C</sub>) and hence x<sub>il</sub> = 0, so R<sub>l</sub> = κ<sub>l</sub> at steady state (dR<sub>l</sub>/dt = 0).

      Eq. R1 for the case of abiotic resources is comparable to Eq. (1) in Posfai et al., which we present below as Eq. R2:

      where c<sub>i</sub> represents the concentration of nutrient i, and thus corresponds to our R<sub>l</sub> ; n<sub>σ</sub>(t) is the population of species σ, which corresponds to our C<sub>i</sub> ; s<sub>i</sub> stands for the nutrient supply rate, which corresponds to our ζl ; µi denotes the nutrient loss rate, corresponding to our is the coefficient of the rate of species σ for consuming nutrient i, which corresponds to our in Posfai et al. is the consumption rate of nutrient i by the population of species σ, which corresponds to our x<sub>il</sub>.

      In Posfai et al., is the Monod function: and thus

      In our model, however, since predator interference is not involved in Posfai et al., we need to analyze the functional form x<sub>il</sub>({R<sub>l</sub>},{C<sub>i</sub>}) in the case involving only chasing pairs. Specifically, for the case of abiotic resources, the population dynamics can be described by Eq. 1 combined with Eq. R1:

      where and . For convenience, we consider the case S<sub>R</sub> = 1, for which the Monod form was derived (Monod, J. (1949). Annu. Rev. Microbiol., 3, 371-394.). From , we have

      where , and l =1. If the population abundance of the resource species is much larger than that of all consumer species (i.e., ), then,

      and R<sub>l</sub><sup>(F)</sup> ≈ R<sub>l</sub>. Combining this with Eq. R5 and noting that C<sub>i</sub> = C<sub>i</sub><sup>(F)</sup> + x<sub>il</sub>, we can solve for x<sub>il</sub>:

      with l = 1 since S<sub>R</sub> = 1. Comparing Eq. R6 with Eq. R3, and considering the symbol correspondence explained above, our model reduces to the Monod consumption form in the case S<sub>R</sub> = 1, for which the Monod form was derived.
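      The reduction can be checked numerically under an explicit assumption. Suppose (hypothetically; the rate names a and d are ours, not the manuscript's) that free consumers and resources form chasing pairs at rate a·C<sub>i</sub><sup>(F)</sup>R<sub>l</sub>, and that each pair dissolves either by escape (rate d) or by capture (rate k, playing the role of k<sub>il</sub>). With R<sub>l</sub><sup>(F)</sup> ≈ R<sub>l</sub> and C<sub>i</sub> = C<sub>i</sub><sup>(F)</sup> + x<sub>il</sub>, the steady state is x* = C R/(K + R) with K = (d + k)/a, i.e., the Monod form. The Python sketch below integrates the pair dynamics to steady state and compares the result with that closed form; it illustrates the argument and is not the authors' Eq. R6.

```python
def monod_pair_abundance(C, R, a, d, k):
    """Closed-form steady state x* = C * R / (K + R), with K = (d + k) / a."""
    K = (d + k) / a  # half-saturation constant of the Monod form
    return C * R / (K + R)


def simulate_pair_abundance(C, R, a, d, k, dt=1e-3, steps=20_000):
    """Forward-Euler integration of dx/dt = a (C - x) R - (d + k) x from x = 0.

    C is the total consumer abundance (free + paired), held fixed; R is the
    resource abundance, held fixed and assumed much larger than C.
    """
    x = 0.0
    for _ in range(steps):
        x += dt * (a * (C - x) * R - (d + k) * x)
    return x


# Arbitrary illustrative parameters: K = 60, so x* = 10 * 100 / 160 = 6.25.
x_closed = monod_pair_abundance(C=10.0, R=100.0, a=0.05, d=1.0, k=2.0)
x_sim = simulate_pair_abundance(C=10.0, R=100.0, a=0.05, d=1.0, k=2.0)
```

      The resulting consumption flux k·x* = k C R/(K + R) saturates as R grows, as in Monod kinetics.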

      Following on the previous comment, I am confused by the fact that the nutrient consumption term in Eq. 1 and how growth is modeled (Eq. 4) are not obviously compatible and would be hard to match directly to experimentally accessible quantities such as yield (nutrient to biomass conversion ratio). Ultimately, there is a conservation of mass ("flux balance"), and therefore the dynamics must obey it. I don't quite see how conservation of mass is imposed in this work.

      We thank the reviewer for raising this question. Indeed, the population dynamics of our model must adhere to flux balance, with the most pertinent equation restated here as Eq. R7:

      Below we explain how Eq. R7, and thus Eqs. 1 and 4 of our manuscript, satisfy the flux-balance constraint. The interactions and fluxes between consumer and resource species occur solely through chasing pairs. At the population level, chasing pairs between consumer species C<sub>i</sub> and resource species R<sub>l</sub> are described by the following expression:

      where the superscripts "(F)" and "(P)" denote freely wandering individuals and individuals involved in chasing pairs, respectively, and "(+)" denotes the biomass gained by consumer C<sub>i</sub> from resource R<sub>l</sub>. In our manuscript, x<sub>il</sub> denotes the population abundance (or, equivalently, the concentration in a well-mixed system of given size) of the chasing pair C<sub>i</sub><sup>(P)</sup> ∨ R<sub>l</sub><sup>(P)</sup>, so the flow from resource species R<sub>l</sub> to consumer species C<sub>i</sub> per unit time is k<sub>il</sub>x<sub>il</sub>. Since each chasing pair contains exactly one R<sub>l</sub> individual, the effect of this flow on the population dynamics of species R<sub>l</sub> is −k<sub>il</sub>x<sub>il</sub>. However, a consumer individual of species C<sub>i</sub> may be much heavier than an individual of species R<sub>l</sub>, and part of the energy is dissipated when nutrients are converted into biomass; we therefore introduce a mass conversion ratio w<sub>il</sub>. For example, if an individual of species C<sub>i</sub> weighs ten times as much as an individual of species R<sub>l</sub>, then without energy dissipation w<sub>il</sub> = 1/10 = 0.1; if half of the chemical energy is dissipated as heat during the conversion into biomass, then w<sub>il</sub> = 0.1 × 0.5 = 0.05. Consequently, the effect of this flux on the population dynamics of consumer species C<sub>i</sub> is w<sub>il</sub>k<sub>il</sub>x<sub>il</sub> per unit time, and flux balance is satisfied.

      For the population dynamics of a consumer species C<sub>i</sub>, we need to account for the biomass influx from all resource species, which leads to the term Σ<sub>l</sub> w<sub>il</sub>k<sub>il</sub>x<sub>il</sub> in Eq. R7. Similarly, for the population dynamics of a resource species R<sub>l</sub>, we sum the biomass outflow to all consumer species, which gives the term −Σ<sub>i</sub> k<sub>il</sub>x<sub>il</sub> in Eq. R7.

      Consequently, Eq. R7 and our model satisfy the constraint of flux balance.
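      The bookkeeping above can be made concrete with a short Python sketch: each capture flux k<sub>il</sub>x<sub>il</sub> is removed from resource R<sub>l</sub> and credited, scaled by w<sub>il</sub>, to consumer C<sub>i</sub>. Nested lists indexed [i][l] are an assumption for illustration, and the function names are hypothetical.

```python
def conversion_ratio(mass_ratio, efficiency):
    """w = (resource individual mass / consumer individual mass) * fraction retained as biomass."""
    return mass_ratio * efficiency


def flux_terms(k, x, w):
    """Capture contributions to dC_i/dt and dR_l/dt.

    k, x, w: nested lists indexed [i][l] (consumer i, resource l).
    Returns (consumer_gain[i], resource_loss[l]), i.e. the sums
    sum_l w_il k_il x_il and sum_i k_il x_il (the latter enters dR_l/dt with a minus sign).
    """
    n_c, n_r = len(k), len(k[0])
    consumer_gain = [sum(w[i][l] * k[i][l] * x[i][l] for l in range(n_r)) for i in range(n_c)]
    resource_loss = [sum(k[i][l] * x[i][l] for i in range(n_c)) for l in range(n_r)]
    return consumer_gain, resource_loss


# Example from the text: consumer 10x heavier, half the energy dissipated -> w = 0.05.
w_example = conversion_ratio(0.1, 0.5)
```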

      These models could be better constrained by more data, in principle, thereby potential exists for a more compelling case of the relevance of this interference mechanism to natural systems.

      We thank the reviewer for raising this point. Indeed, our model could benefit from the inclusion of more experimental data. In our manuscript, we primarily set the parameters by estimating their reasonable ranges. Following the reviewer's suggestions, we have now specified the data used to set the parameters. For example, in Fig. 2D, we set D<sub>2</sub> = 0.01 with τ = 0.4 days, which gives an expected lifespan of Drosophila serrata in our model of τ/D<sub>2</sub> = 40 days; this roughly agrees with experimental data showing that the average lifespan of D. serrata is 34 days for males and 54 days for females (lines 321-325 in the appendices; reference: Narayan et al. J Evol Biol. 35: 657–663 (2022)). To explain biodiversity and quantitatively reproduce the rank-abundance curves across diverse communities, we estimated the competitive differences across consumer species, exemplified by the coefficient of variation of the mortality rates (a key parameter shaping the rank-abundance curve), from experimental data in Patricia Menon et al., Water Research (2003) 37, 4151, using the two-sigma rule (lines 344-347 in the appendices).
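      The two parameter estimates mentioned above reduce to simple arithmetic, sketched here in Python. The lifespan function assumes, as the numbers suggest, that D<sub>2</sub> acts as a per-step mortality with time step τ. The two-sigma function assumes that an observed range of mortality rates spans roughly the mean ± 2 standard deviations; how the two-sigma rule was applied to the Menon et al. data is not detailed here, so this reading is an assumption. Function names are hypothetical.

```python
def expected_lifespan(tau_days, mortality):
    """Expected lifespan tau / D, with tau the time step (days) and D the per-step mortality."""
    return tau_days / mortality


def cv_from_range_two_sigma(low, high):
    """Rough coefficient of variation, assuming [low, high] spans mean +/- 2 sd."""
    mean = (low + high) / 2
    sd = (high - low) / 4
    return sd / mean


lifespan = expected_lifespan(0.4, 0.01)  # about 40 days, as in the Fig. 2D setting
```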

      Still, we admit that many factors other than intraspecific interference, such as temporal variation and spatial heterogeneity, are involved in breaking the limits of CEP in natural systems, and it remains challenging to disentangle their individual contributions in wild systems. However, for the two classical experiments that break CEP (Francisco Ayala, 1969; Thomas Park, 1954), intraspecific interference is probably the most relevant mechanism, since factors such as temporal variation, spatial heterogeneity, cross-feeding, and metabolic tradeoffs are not involved in those two experimental systems.

      The underlying frameworks, B-D and MacArthur are not properly exposed in the introduction, and as a result, it is not obvious what is the specific contribution in this work as opposed to existing literature. One needs to dig into the literature a bit for that.

      The specific contribution exists, but it might be more clearly separated and better explained. In the process, the introduction could be expanded a bit to make the paper more accessible, by reviewing key features from the literature that are used in this manuscript.

      We thank the reviewer for these very insightful suggestions. Following these suggestions, we have now added a new paragraph and revised the introduction part of our manuscript (lines 51-67 in the main text) to address the relevant issues. Our paper is much improved as a result.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kang et al investigates how the consideration of pairwise encounters (consumer-resource chasing, intraspecific consumer pair, and interspecific consumer pair) influences the community assembly results. To explore this, they presented a new model that considers pairwise encounters and intraspecific interference among consumer individuals, which is an extension of the classical Beddington-DeAngelis (BD) phenomenological model, incorporating detailed considerations of pairwise encounters and intraspecific interference among consumer individuals. Later, they connected with several experimental datasets.

      Strengths:

      They found that the negative feedback loop created by the intraspecific interference allows a diverse range of consumer species to coexist with only one or a few types of resources. Additionally, they showed that some patterns of their model agree with experimental data, including time-series trajectories of two small in-lab community experiments and the rank-abundance curves from several natural communities. The presented results here are interesting and present another way to explain how the community overcomes the competitive exclusion principle.

      We appreciate the reviewer for the positive comments and the accurate summary of our manuscript.

      Weaknesses:

      The authors only explore the case with interspecific interference or intraspecific interference exists. I believe they need to systematically investigate the case when both interspecific and intraspecific interference exists. In addition, the text description, figures, and mathematical notations have to be improved to enhance the article's readability. I believe this manuscript can be improved by addressing my comments, which I describe in more detail below.

      We thank the reviewer for these valuable suggestions. We have followed many of the suggestions raised by the reviewer, and the manuscript is much improved as a result.

      (1) In nature, it is really hard for me to believe that only interspecific interference or intraspecific interference exists. I think a hybrid between interspecific interference and intraspecific interference is very likely. What would happen if both the interspecific and intraspecific interference existed at the same time but with different encounter rates? Maybe the authors can systematically explore the hybrid between the two mechanisms by changing their encounter rates. I would appreciate it if the authors could explore this route.

      We thank the reviewer for raising this question. Indeed, interspecific and intraspecific interference coexist in real systems. To differentiate the separate contributions of inter- and intraspecific interference to biodiversity, we considered different scenarios involving inter- or intraspecific interference. In fact, our previous version already considered the scenario involving both inter- and intraspecific interference for the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1, where two consumer species compete for one resource species (Appendix-fig. 5, and lines 147-148, 162-163 in the main text of the previous version, or lines 160-161, 175-177 in the new version).

      Following the reviewer’s suggestions, we have now systematically investigated the cases of S<sub>C</sub> = 6, S<sub>R</sub> = 1 and S<sub>C</sub> = 20, S<sub>R</sub> = 1, where six or twenty consumer species compete for one resource species, in scenarios involving chasing pairs and both inter- and intraspecific interference, using both ordinary differential equations (ODEs) and the stochastic simulation algorithm (SSA). These newly added ODE and SSA results are shown in Appendix-fig. 5 F-H, and we have added a new paragraph describing them (lines 212-215 in the main text). Consistent with our findings for S<sub>C</sub> = 2 and S<sub>R</sub> = 1, the coexistence behavior in the cases of S<sub>C</sub> = 6, S<sub>R</sub> = 1 and S<sub>C</sub> = 20, S<sub>R</sub> = 1 is very similar to that without interspecific interference: all consumer species coexist with a single resource species at constant population densities in the ODE studies, and the SSA results fluctuate around the ODE trajectories.

      As for the encounter rates of interspecific and intraspecific interference, in a well-mixed system these rates can in fact be derived from the mobility rates of the consumer species using the mean-field method. For a system of size L<sup>2</sup>, the interspecific encounter rate between consumer species C<sub>i</sub> and C<sub>j</sub> (i ≠ j) is given by a mean-field expression in terms of r<sup>(I)</sup>, v<sub>C<sub>i</sub></sub>, v<sub>C<sub>j</sub></sub>, and L<sup>2</sup> (please refer to lines 100-102 and 293-317 in the main text, and see also Appendix-fig. 1), where r<sup>(I)</sup> is the upper distance for interference, while v<sub>C<sub>i</sub></sub> and v<sub>C<sub>j</sub></sub> represent the mobility rates of species C<sub>i</sub> and C<sub>j</sub>, respectively. Meanwhile, the intraspecific encounter rates within species C<sub>i</sub> and species C<sub>j</sub> are a’<sub>ii</sub> and a’<sub>jj</sub>, respectively, given by the corresponding mean-field expressions.

      Thus, once the intraspecific encounter rates a’<sub>ii</sub> and a’<sub>jj</sub> are given, the interspecific encounter rate between species C<sub>i</sub> and C<sub>j</sub> is determined. Consequently, we cannot tune the encounter rates of interspecific and intraspecific interference independently in our study, especially since, for clarity, we have used the mortality rate as the only parameter that varies among consumer species throughout this study. Instead, we have systematically analyzed the influence of varying the separation rate and escape rate on species coexistence in the case of two consumers competing for a single resource species (see Appendix-fig. 5A).
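      The claim that the interspecific rate is fixed once a’<sub>ii</sub> and a’<sub>jj</sub> are given can be checked numerically. The Python sketch below assumes, for illustration only, the standard two-dimensional kinetic-theory form in which two individuals with speeds v<sub>i</sub> and v<sub>j</sub> and interaction distance r<sup>(I)</sup> meet at rate 2r<sup>(I)</sup>√(v<sub>i</sub><sup>2</sup> + v<sub>j</sub><sup>2</sup>)/L<sup>2</sup>; the exact expression in the manuscript (lines 293-317) may differ in prefactors. The label a’<sub>ij</sub>, the function names, and all numbers are hypothetical.

```python
import math


def encounter_rate(r_int, v_i, v_j, area):
    """Assumed mean-field encounter rate between species i and j in a 2D system of area L^2.

    Uses 2 * r_int * sqrt(v_i**2 + v_j**2) / area (kinetic-theory form, an assumption here).
    Setting v_j = v_i gives the intraspecific rate a'_ii.
    """
    return 2.0 * r_int * math.sqrt(v_i ** 2 + v_j ** 2) / area


def interspecific_from_intraspecific(a_ii, a_jj):
    """Under the assumed form, a'_ii = 2*sqrt(2)*r*v_i/L^2, hence a'_ij = sqrt((a'_ii^2 + a'_jj^2) / 2)."""
    return math.sqrt((a_ii ** 2 + a_jj ** 2) / 2.0)


r_int, area = 0.5, 100.0 ** 2  # hypothetical interference distance and system size L^2
v_i, v_j = 2.0, 3.0            # hypothetical mobility rates
a_ii = encounter_rate(r_int, v_i, v_i, area)
a_jj = encounter_rate(r_int, v_j, v_j, area)
a_ij = encounter_rate(r_int, v_i, v_j, area)

# The interspecific rate is recovered from the two intraspecific rates alone.
print(math.isclose(a_ij, interspecific_from_intraspecific(a_ii, a_jj)))  # True
```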

      (2) In the first two paragraphs of the introduction, the authors describe the competitive exclusion principle (CEP) and past attempts to overcome the CEP. Moving on from the first two paragraphs to the third paragraph, I think there is a gap that needs to be filled to make the transition smoother and help readers understand the motivations. More specifically, I think the authors need to add one more paragraph dedicated to explaining why predator interference is important, how considering the mechanism of predator interference may help overcome the CEP, and whether predator interference has been investigated or under-investigated in the past. Then building upon the more detailed introduction and movement of predator interference, the authors may briefly introduce the classical B-D phenomenological model and what are the conventional results derived from the classical B-D model as well as how they intend to extend the B-D model to consider the pairwise encounters.

      We thank the reviewer for these very insightful suggestions. Following these suggestions, we have added a new paragraph and revised the introduction part of our paper (lines 51-67 in the main text). Our manuscript is significantly improved as a result.

      (3) The notations for the species abundances are not very informative. I believe some improvements can be made to make them more meaningful. For example, I think using Greek letters for consumers and English letters for resources might improve readability. Some sub-scripts are not necessary. For instance, R^(l)_0 can be simplified to g_l to denote the intrinsic growth rate of resource l. Similarly, K^(l)_0 can be simplified to K_l. Another example is R^(l)_a, which can be simplified to s_l to denote the supply rate. In addition, right now, it is hard to find all definitions across the text. I would suggest adding a separate illustrative box with all mathematical equations and explanations of symbols.

      We thank the reviewer for these very useful suggestions. We have now followed many of them to improve the readability of our manuscript. Given that we have used many English letters for consumers and there are already many English and Greek symbols for different variables and parameters in the appendices, we have opted to use Greek letters for parameters specific to resource species and English letters for those specific to consumer species. Additionally, we have now added Appendix-tables 1-2 (pages 16-17 in the appendices) to define the symbols used throughout our manuscript.

      (4) What is the f_i(R^(F)) on line 131? Does it refer to the growth rate of C_i? I noticed that f_i(R^(F)) is defined in the supplementary information. But please ensure that readers can understand it even without reading the supplementary information. Otherwise, please directly refer to the supplementary information when f_i(R^(F)) occurs for the first time. Similarly, I don't think the readers can understand \Omega^\prime_i and G^\prime_i on lines 135-136.

      We thank the reviewer for raising these questions. We apologize for not explaining those symbols and functions clearly enough in the previous version of the manuscript. f<sub>i</sub>(R<sup>(F)</sup>) is a function of the variable R<sup>(F)</sup> with index i, whose explicit expressions are given in the Appendices. Following the reviewer’s suggestions, we have now added clear definitions for symbols and functions and resolved these issues. The definitions of Ω<sub>i</sub>, Ω′<sub>i</sub>, G, and G′ are too lengthy for the main text, so we refer readers directly to the Appendices where these symbols first occur in the main text.

      Reviewer #3 (Public Review):

      Summary:

      A central question in ecology is: Why are there so many species? This question gained heightened interest after the development of influential models in theoretical ecology in the 1960s, demonstrating that under certain conditions, two consumer species cannot coexist on the same resource. Since then, several mechanisms have been shown to be capable of breaking the competitive exclusion principle (although, we still lack a general understanding of the relative importance of the various mechanisms in promoting biodiversity).

      One mechanism that allows for breaking the competitive exclusion principle is predator interference. The Beddington-DeAngelis is a simple model that accounts for predator interference in the functional response of a predator. The B-D model is based on the idea that when two predators encounter one another, they waste some time engaging with one another which could otherwise be used to search for resources. While the model has been influential in theoretical ecology, it has also been criticized at times for several unusual assumptions, most critically, that predators interfere with each other regardless of whether they are already engaged in another interaction. However, there has been considerable work since then which has sought either to find sets of assumptions that lead to the B-D equation or to derive alternative equations from a more realistic set of assumptions (Ruxton et al. 1992; Cosner et al. 1999; Broom et al. 2010; Geritz and Gyllenberg 2012). This paper represents another attempt to more rigorously derive a model of predator interference by borrowing concepts from chemical reaction kinetics (the approach is similar to previous work: Ruxton et al. 1992). The main point of difference is that the model in the current manuscript allows for 'chasing pairs', where a predator and prey engage with one another to the exclusion of other interactions, a situation Ruxton et al. (1992) do not consider. While the resulting functional response is quite complex, the authors show that under certain conditions, one can get an analytical expression for the functional response of a predator as a function of predator and resource densities. They then go on to show that including intraspecific interference allows for the coexistence of multiple species on one or a few resources, and demonstrate that this result is robust to demographic stochasticity.

      We thank the reviewer for carefully reading our manuscript and for the positive comments on the rigorously derived model of predator interference presented in our paper. We also appreciate the reviewer for providing a thorough introduction to the research background of our study, especially the studies related to the Beddington-DeAngelis model. We apologize for our oversight in not fully appreciating the related study by Ruxton et al. (1992) at the time of our first submission. Indeed, as the reviewer points out, Ruxton et al. (1992) is relevant to our study in that both studies borrow concepts from chemical reaction kinetics. We have now reworked the introduction and discussion sections of our manuscript, and we now cite and acknowledge the contributions of related works, including Ruxton et al. (1992).

      Strengths:

      I appreciate the effort to rigorously derive interaction rates from models of individual behaviors. As currently applied, functional responses (FRs) are estimated by fitting equations to feeding rate data across a range of prey or predator densities. In practice, such experiments are only possible for a limited set of species. This is problematic because whether a particular FR allows stability or coexistence depends on not just its functional form, but also its parameter values. The promise of the approach taken here is that one might be able to derive the functional response parameters of a particular predator species from species traits or more readily measurable behavioral data.

      We appreciate the reviewer's positive comments regarding the rigorous derivation of our model. Indeed, all parameters of our model can be derived from measurable behavioral data for a specific set of predator species.

      Weaknesses:

      The main weakness of this paper is that it devotes the vast majority of its length to demonstrating results that are already widely known in ecology. We have known for some time that predator interference can relax the CEP (e.g., Cantrell, R. S., Cosner, C., & Ruan, S. 2004).

      While the model presented in this paper differs from the functional form of the B-D in some cases, it would be difficult to formulate a model that includes intraspecific interference (that increases with predator density) that does not allow for coexistence under some parameter range. Thus, I find it strange that most of the main text of the paper deals with demonstrating that predator interference allows for coexistence, given that this result is already well known. A more useful contribution would focus on the extent to which the dynamics of this model differ from those of the B-D model.

      We appreciate the reviewer for raising this question and apologize for not sufficiently clarifying the contribution of our manuscript in the context of existing knowledge in our initial submission. We have now significantly revised the introduction of our manuscript (lines 51-67 in the main text) to make this clearer. Indeed, using the Beddington-DeAngelis (B-D) model, several studies (e.g., Cantrell, R. S., Cosner, C., & Ruan, S. 2004) have already shown that intraspecific interference promotes species coexistence, and the mechanism of intraspecific interference can certainly lead to species coexistence if modeled correctly. However, while we acknowledge that the B-D model is a brilliant phenomenological model of intraspecific interference, for the specific research topic of our manuscript, namely breaking the CEP and explaining the paradox of the plankton, the validity of applying the B-D model to obtain compelling results is highly questionable.

      Specifically, the functional response in the B-D model of intraspecific interference can be formally derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). Since we have demonstrated that the scenario involving only chasing pairs is subject to the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), and given the identical functional response mentioned above, the validity of studies relying on the B-D model to break CEP or explain the paradox of the plankton is highly questionable.
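      For reference, the B-D functional response is commonly written as f(R, C) = aR/(1 + bR + cC) with positive constants a, b, and c, where the cC term represents interference among consumers. The short Python sketch below evaluates this textbook form with hypothetical parameter values to show the property at issue, namely that per-capita intake declines with consumer density C. It does not reproduce Eq. S8 or the chasing-pair derivation in the Appendices.

```python
def bd_functional_response(R, C, a, b, c):
    """Beddington-DeAngelis per-capita intake rate a*R / (1 + b*R + c*C)."""
    return a * R / (1.0 + b * R + c * C)


# Hypothetical parameter values, chosen only to illustrate the trend.
a, b, c = 1.0, 0.5, 0.2
R = 2.0
for C in (0.0, 5.0, 10.0):
    # intake falls from 1.0 to 0.6667 to 0.5 as consumer density C rises
    print(C, round(bd_functional_response(R, C, a, b, c), 4))
```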

      Consequently, one of the major objectives of our manuscript is to resolve whether the mechanism of intraspecific interference can truly break CEP and explain the paradox of the plankton in a rigorous manner. By modeling intraspecific predator interference from a mechanistic perspective and applying rigorous mathematical analysis and numerical simulations, our work resolves these issues and demonstrates that intraspecific interference enables a wide range of consumer species to coexist with only one or a handful of resource species. This naturally breaks CEP, explains the paradox of the plankton, and quantitatively illustrates a broad spectrum of experimental results.

      For intuitive understanding, we introduced a functional response in our model (Eq. 5 in the main text), which indeed involves approximations. However, to break the CEP and explain the paradox of the plankton rigorously, all simulation results in our study were obtained directly from Eqs. 1-4 (main text), without relying on the approximate functional response in Eq. 5.

      The formulation of chasing-pair engagements assumes that prey being chased by a predator are unavailable to other predators. For one, this seems inconsistent with the ecology of most predator-prey systems. In the system in which I work (coral reef fishes), prey under attack by one predator are much more likely to be attacked by other predators (whether it be a predator of the same species or otherwise). I find it challenging to think of a mechanism that would give rise to chased prey being unavailable to other predators. The authors also critique the B-D model: "However, the functional response of the B-D model involving intraspecific interference can be formally derived from the scenario involving only chasing pairs without predator interference (Wang and Liu, 2020; Huisman and De Boer, 1997) (see Eqs. S8 and S24). Therefore, the validity of applying the B-D model to break the CEP is questionable.".

      We appreciate the reviewer for raising this question. We fully agree with the reviewer that in many predator-prey systems (e.g., coral reef fishes as mentioned by the reviewer, wolves, and even microbial species such as Myxococcus xanthus; related references: Berleman et al., FEMS Microbiol. Rev. 33, 942-957 (2009)), prey under attack by one predator can be targeted by another predator (which we term a chasing triplet) or even by additional predator individuals (which we define as higher-order terms). However, in a previous study (Xin Wang, Yang-Yu Liu, iScience 23, 101009 (2020)), we have already demonstrated from a mechanistic perspective that scenarios involving chasing triplets or higher-order terms can naturally break the CEP. Since the present manuscript focuses on whether pairwise encounters between individuals can break the CEP and explain the paradox of the plankton, we deliberately excluded confounding factors that are already known to promote biodiversity, just as we excluded prevalent factors such as cross-feeding and temporal variation from our model.

      However, the way "chasing pairs" are formulated does result in predator interference because a predator attacking prey interferes with the ability of other predators to encounter the prey. I don't follow the author's logic that B-D isn't a valid explanation for coexistence because a model incorporating chasing pairs engagements results in the same functional form as B-D.

      We thank the reviewer for raising this question, and we apologize for not making this point clear enough at the time of our initial submission. We have now revised the related part of our manuscript (lines 56-62 in the main text) to make this clearer.

      In our definition, predator interference means the pairwise encounter between consumer individuals, while a chasing pair is formed by a pairwise encounter between a consumer individual and a resource individual. Thus, in these definitions, a scenario involving only chasing pairs does not involve pairwise encounters between consumer individuals (which is our definition of predator interference).

      We acknowledge that there can be different definitions of predator interference, and the reviewer's interpretation is based on a definition of predator interference that incorporates indirect interference without pairwise encounters between consumer individuals. We do not wish to argue about the appropriateness of definitions. However, we have proven that scenarios involving only chasing pairs are subject to the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), while the functional response of the B-D model can be derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). Therefore, the validity of applying the B-D model to break CEP is highly questionable.

      More broadly, the specific functional form used to model predator interference is of secondary importance to the general insight that intraspecific interference (however it is modeled) can allow for coexistence. Mechanisms of predator interference are complex and vary substantially across species. Thus it is unlikely that any one specific functional form is generally applicable.

      We thank the reviewer for raising this issue. We agree that the general insight that intraspecific predator interference can facilitate species coexistence is of great importance. We also acknowledge that no single functional form of a functional response is likely to be universally applicable, as explicit functional responses inevitably involve approximations. However, we must reemphasize the importance of verifying whether intraspecific predator interference can truly break CEP and explain the paradox of the plankton, which is one of the primary objectives of our study. As mentioned above, the B-D model can be derived from the scenario involving only chasing pairs (Eq. S8 in Appendices; related references: Gert Huisman, Rob J De Boer, J. Theor. Biol. 185, 389 (1997) and Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)), while we have demonstrated that scenarios involving only chasing pairs are subject to the constraint of CEP (see lines 139-144 in the main text and Appendix-fig. 3A-C; related references: Xin Wang and Yang-Yu Liu, iScience 23, 101009 (2020)). Therefore, the validity of applying the B-D model to break CEP is highly questionable.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not see any code or data sharing. They should exist in a prominent place. The authors should make their simulations and the analysis scripts freely available to download, e.g. by GitHub. This is always true but especially so in a journal like eLife.

      We appreciate the reviewer for these recommendations. We apologize for our oversight regarding the unsuccessful upload of the data in our initial submission, as the data size was considerable and we neglected to double-check for this issue. Following the reviewer’s recommendation, we have now uploaded the code and dataset to GitHub (accessible at https://github.com/SchordK/Intraspecific-predator-interference-promotesbiodiversity-in-ecosystems), where they are freely available for download.

      The introduction section should include more background, including about BD but also about consumer-resource models. Part of the results section could be moved/edited to the introduction. You should try that the results section should contain only "new" stuff whereas the "old" stuff should go in the introduction.

      We thank the reviewer for these recommendations. Following these suggestions, we have now reorganized our manuscript by adding a new paragraph to the introduction section (lines 51-62 in the main text) and revising related content in both the introduction and results sections (lines 63-67, 81-83 in the main text).

      I found myself getting a little bogged down in the general/formal description of the model before you go to specific cases. I found the most interesting part of the paper to be its second half. This is a dangerous strategy, a casual reader may miss out on the most interesting part of the paper. It's your paper and do what you think is best, but my opinion is that you could improve the presentation of the model and background to get to the specific contribution and specific use case quickly and easily, then immediately to the data. You can leave the more general formulation and the details to later in the paper or even the appendix. Ultimately, you have a simple idea and a beautiful application on interesting data-that is your strength I think, and so, I would focus on that.

      We appreciate the reviewer for the positive comments and valuable suggestions. Following these recommendations, we have revised the presentation of the background information to clarify the contribution of our manuscript, and we have refined the model presentation to enhance clarity. Meanwhile, because we also need to address the concerns raised by other reviewers, we have retained the systematic investigations of scenarios involving different forms of pairwise encounters in the case of S<sub>C</sub> = 2 and S<sub>R</sub> = 1 before applying our model to the experimental data.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe the surfaces in Figs. 1F-H corresponds to the zero-growth isoclines. The authors should directly point it out in the figure captions and text descriptions.

      We thank the reviewer for this suggestion, and we have followed it to address the issue.

      (2) After showing equations 1 or 2, I believe it will help readers understand the mechanism of equations by adding text such as "(see Fig. 1B)" to the sentences following the equations.

      We appreciate the reviewer's suggestion, and we have implemented it to address the issue.

      (3) Lines 12, 129 143 & 188: "at steady state" -> "at a steady state"

      (4) Line 138: "is doom to extinct" -> "is doomed to extinct"

      (5) Line 170: "intraspecific interference promotes species coexistence along with stochasticity" -> "intraspecific interference still robustly promotes species coexistence when stochasticity is considered"

      (6) Line 190: "The long-term coexistence behavior are exemplified" -> "The long-term coexistence behavior is exemplified"

      (7) Line 227: "the coefficient of variation was taken round 0.3" -> "the coefficient of variation was taken around 0.3"?

      (8) Line 235: "tend to extinct" -> "tend to be extinct"

      We thank the reviewer for all these suggestions, and we have implemented each of them to revise our manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I think this would be a much more useful paper if the authors focused on how the behavior of this model differs from existing models rather than showing that the new formation also generates the same dynamics as the existing theory.

      We thank the reviewer for this suggestion, and we apologize for not explaining the limitations of the B-D model and of related studies on the topic of CEP clearly enough in our initial submission. As explained in the responses above, we have now revised the introduction of our manuscript (lines 51-67 in the main text) to make the following point clear: the functional response in the B-D model can be derived from the scenario involving only chasing pairs without consideration of pairwise encounters between consumer individuals, and we have demonstrated that a scenario involving only chasing pairs is subject to the constraint of CEP; therefore, the validity of studies relying on the B-D model to break CEP or explain the paradox of the plankton is highly questionable. Consequently, one of the major objectives of our manuscript is to resolve whether the mechanism of intraspecific interference can truly break CEP and explain the paradox of the plankton in a rigorous manner. By modeling from a mechanistic perspective, we resolve the above issues and quantitatively illustrate a broad spectrum of experimental results, including two classical experiments that violate CEP and the rank-abundance curves across diverse ecological communities.

      Things that would be of interest:

      What are the conditions for coexistence in this model? Presumably, it depends heavily on the equilibrium abundances of the consumers and resources as well as the engagement times/rates.

      We thank the reviewer for raising this question. We have shown that there is a wide range of parameter space for species coexistence in our model. Specifically, for the case involving two consumer species and one resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1), we have conducted a systematic study of the parameter region that promotes species coexistence. For clarity, we set the mortality rate D<sub>i</sub> (i = 1, 2) as the only parameter that varies between the consumer species, and the order of magnitude of all model parameters was estimated from behavioral data. The results for scenarios involving intraspecific predator interference are shown in Appendix-figs. 4B-D, 5A, 6C-D, and we redraw some of them here as Fig. R2, including both ODE and SSA results, where Δ = (D<sub>1</sub> − D<sub>2</sub>)/D<sub>2</sub> represents the competitive difference between the two consumer species. For example, Δ = 1 means that D<sub>1</sub> = 2D<sub>2</sub>, i.e., species C<sub>2</sub> is twice as competitive as species C<sub>1</sub>. In Fig. R2 (see also Appendix-figs. 4B-D, 5A, 6C-D), we see that the two consumer species can coexist despite a large competitive difference in both the ODE and SSA simulations.
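      The competitive difference Δ and its inverse are one-line formulas, sketched below in Python. The helper names are hypothetical, and D<sub>2</sub> = 0.01 is borrowed from the Fig. 2D setting quoted earlier only as an illustrative value.

```python
def competitive_difference(D1, D2):
    """Delta = (D1 - D2) / D2 for two consumers that differ only in mortality rate."""
    return (D1 - D2) / D2


def mortality_for_delta(D2, delta):
    """Inverse relation: the mortality rate D1 of species C_1 that yields a given Delta."""
    return D2 * (1.0 + delta)


D2 = 0.01                            # illustrative value for the stronger competitor C_2
D1 = mortality_for_delta(D2, 1.0)    # Delta = 1 means C_1 dies twice as fast as C_2
print(D1, competitive_difference(D1, D2))  # 0.02 1.0
```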

      Author response image 2.

      The parameter region for two consumer species coexisting with one type of abiotic resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). (A) The region below the blue surface and above the red surface represents stable coexistence of the three species at constant population densities. (B) The blue region represents stable coexistence of the three species at a steady state. (C) The color (see color bar) indicates the coexisting fraction for long-term coexistence of the three species. Figure redrawn from Appendix-figs. 4B and 6C-D.

      For the systems shown in Fig. 3A-D, where the number of consumer species is much larger than the number of resource species, we assigned each consumer species a unique competitiveness through a distinct 𝐷<sub>i</sub> (i = 1, …, S<sub>C</sub>). In Fig. 3A-D (see also Appendix-fig. 10), hundreds of consumer species can coexist with one or three types of resources when the coefficient of variation (CV) of the consumer species’ competitiveness is around 0.3. This indicates a large parameter region promoting species coexistence.
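      The Python sketch below shows one way to assign S<sub>C</sub> distinct mortality rates with a target CV of about 0.3. The text does not state how the 𝐷<sub>i</sub> values were drawn. The normal distribution, the redrawing of non-positive values, the mean rate of 0.01, and the function name `sample_mortality_rates` are all assumptions for illustration.

```python
import random
import statistics


def sample_mortality_rates(n_species, mean_rate, cv, seed=0):
    """Draw one mortality rate D_i per consumer species (hypothetical helper).

    Assumption: D_i ~ Normal(mean_rate, cv * mean_rate); any non-positive draw
    is redrawn so that every D_i is a valid death rate.
    """
    rng = random.Random(seed)
    sigma = cv * mean_rate
    rates = []
    while len(rates) < n_species:
        d = rng.gauss(mean_rate, sigma)
        if d > 0:
            rates.append(d)
    return rates


rates = sample_mortality_rates(n_species=500, mean_rate=0.01, cv=0.3)
cv_hat = statistics.stdev(rates) / statistics.mean(rates)
print(f"sample CV = {cv_hat:.3f}")
```

      Continuous draws give distinct values almost surely, so each species gets its own competitiveness. Redrawing non-positive values barely changes the CV at 0.3, because such draws lie more than three standard deviations below the mean.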

      Is there existing data to estimate the parameters in the model directly from behavioral data? Do these parameter ranges support the hypothesis that predator interference is significant enough to allow for the coexistence of natural predator populations?

      We thank the reviewer for raising this question. Indeed, the parameters in our model were primarily determined by estimating their reasonable ranges from behavioral data. Following the reviewer's suggestion, we have now specified the data we used to set the parameters. For instance, in Fig. 2D, we set 𝐷<sub>2</sub> = 0.01 with τ = 0.4 day, giving an expected lifespan for Drosophila serrata in our model of 𝜏⁄𝐷<sub>2</sub> = 40 days. This roughly agrees with experimental data showing that the average lifespan of D. serrata is 34 days for males and 54 days for females (lines 321-325 in the appendices; reference: Narayan et al. J Evol Biol. 35: 657–663 (2022)). To account for competitive differences, we set the mortality rate as the only parameter that varies among the consumer species. As specified in the Appendices, the CV of the mortality rate is the only parameter used to fit the experiments, within the range of 0.15-0.43. This parameter range (i.e., 0.15-0.43) was directly estimated from experimental data in the reference article (Patricia Menon et al., Water Research 37, 4151 (2003)) using the two-sigma rule (lines 344-347 in the appendices).
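      The Python sketch below covers the lifespan arithmetic and one possible reading of the two-sigma rule. The text does not specify which quantity's range was converted into a CV. The helper `cv_from_two_sigma` therefore assumes that an observed range [lower, upper] covers roughly μ ± 2σ (about 95% of a normal distribution). Both function names are hypothetical, and the trait range in the example is made up.

```python
def expected_lifespan(tau_days: float, mortality_rate: float) -> float:
    """Expected lifespan tau / D when death occurs at rate D per time unit tau."""
    return tau_days / mortality_rate


def cv_from_two_sigma(lower: float, upper: float) -> float:
    """CV = sigma / mu, assuming [lower, upper] spans mu - 2*sigma to mu + 2*sigma."""
    mu = (lower + upper) / 2
    sigma = (upper - lower) / 4
    return sigma / mu


# tau = 0.4 day and D2 = 0.01, as quoted in the text
print(expected_lifespan(0.4, 0.01))
# Hypothetical trait range, not data from Menon et al. (2003)
print(cv_from_two_sigma(1.0, 3.0))
```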

      Given the high consistency between the model results and experiments shown in Figs. 2D-E and 3C-D, where all the key model parameters were estimated from experimental data in references, and considering that the rank-abundance curves shown in Fig. 3C-D include a wide range of ecological communities, there is no doubt that predator interference is significant enough to allow for the coexistence of natural predator populations within the parameter ranges estimated from experimental references.

      Bifurcation analyses for the novel parameters of this model. Does the fact that prey can escape lead to qualitatively different model behaviors?

      Author response image 3.

      Bifurcation analyses for the separate rate d’<sub>i</sub> and escape rate d<sub>i</sub> (i = 1, 2) of our model in the case of two consumer species competing for one abiotic resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). (A) A 3D representation: the region above the blue surface signifies competitive exclusion, where species C<sub>1</sub> goes extinct, while the region below the blue surface and above the red surface represents stable coexistence of the three species at constant population densities. (B) A 2D representation: the blue region represents stable coexistence of the three species at a steady state. Figure redrawn from Appendix-fig. 4C-D.

      We thank the reviewer for this suggestion. Following it, we have conducted bifurcation analyses for the separate rate d’<sub>i</sub> and escape rate d<sub>i</sub> of our model in the case where two consumer species compete for one resource species (S<sub>C</sub> = 2 and S<sub>R</sub> = 1). Both 2D and 3D representations of these results have been included in Appendix-fig. 4, and we redraw them here as Fig. R3. In Fig. R3, we set the mortality rate 𝐷<sub>i</sub> (i = 1, 2) as the only parameter that varies between the consumer species, so Δ = (𝐷<sub>1</sub>-𝐷<sub>2</sub>)/𝐷<sub>2</sub> represents the competitive difference between the two species.

      As shown in Fig. R3A-B, the smaller the escape rate d<sub>i</sub>, the larger the competitive difference Δ tolerated for species coexistence at steady state. A similar trend is observed for the separate rate d’<sub>i</sub>. However, there is an abrupt change in both the 2D and 3D representations at d’<sub>i</sub> = 0. If d’<sub>i</sub> = 0, all consumer individuals would be trapped in interference pairs, and no consumer species could persist. In contrast, there is no abrupt change at d<sub>i</sub> = 0: even if d<sub>i</sub> = 0, consumer individuals can still leave a chasing pair through the capture process.

      Figures: I found the 3D plots especially Appendix Figure 2 very difficult to interpret. I think 2D plots with multiple lines to represent predator densities would be more clear.

      We thank the reviewer for this suggestion. Following this suggestion, we have added a 2D diagram to Appendix-fig. 2.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and on odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay, and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, directly impacts PN excitability, and uniformly enhances PN responses to odors.

      Weaknesses:

      The one remaining issue to be resolved is the theoretical discrepancy between the physiology and the behavior. The authors provide a computational model that could explain this discrepancy, with the caveat that while the physiological data were collected from the antennal lobe, other olfactory processing stages could be involved. Indeed, other processing stages could be the sites of the computational functions proposed by the model. An additional caveat is that the physiological data were collected 5-10 minutes after serotonin application, whereas the behavioral data were collected 3 hours after serotonin application. It is difficult to link physiological processes induced 5 minutes into serotonin application to behavioral consequences 3 hours later. The discrepancy between physiology and behavior could easily reflect the timing of serotonin's action (i.e., differences between its immediate and longer-term impacts).

      For our behavioral experiments, we waited 3 hours after serotonin injection for two reasons: to allow serotonin to penetrate the layers of air sacs and the sheath, and to let the locusts calm down and recover their baseline POR activity levels. For the physiology experiments, we noticed that the quality of the patch decreased over time after serotonin introduction, which made it difficult to hold cells for that long. However, the point raised by the reviewer is well taken. We have performed additional experiments showing two things. First, the changes in POR levels to different odorants are rapid and can be observed within 15 minutes of serotonin injection (Author response image 2). Second, the physiological changes in PNs (bursting spontaneous activity, maintained temporal firing patterns, and increased odor-evoked responses) persist when the cells are held for longer durations (i.e., 3 hours, akin to our behavioral experiments). It is worth noting that 3-hour in vivo intracellular recordings are not easily achievable and come with many experimental constraints. So far, we have managed to record from two PNs that were held for this long, and we have added them to this rebuttal to support our conclusions (Author response image 1).

      Author response image 1.

      Spontaneous and odor-evoked responses in individual PNs remain consistent for three hours after serotonin introduction into the recording chamber/bath.
      (A) Representative intracellular recording showing membrane potential fluctuations in a projection neuron (PN) in the antennal lobe. Spontaneous and odor-evoked responses to four odorants (pink color bars, 4 s duration) are shown before (control) and after serotonin application (5HT). Voltage traces 30 minutes (30min), 1 hour (1h), 2 hours (2h), and 3 hours (3h) after 5HT application are shown to illustrate the persisting effect of serotonin during spontaneous and odor-evoked activity periods.
      (B) Rasterized spiking activity in two recorded PNs. Spontaneous and odor-evoked responses are shown for all 5 consecutive trials. Note that the odor-evoked response patterns are maintained, but the spontaneous activity patterns are altered after serotonin introduction.

      Author response image 2.

      Palp-opening response (POR) patterns to different odorants remain consistent following serotonin introduction. The probability of PORs is shown as a bar plot for four different odorants: hexanol (green), benzaldehyde (blue), linalool (red), and ammonium (purple). PORs before serotonin injection (solid bars) are compared against response levels after serotonin injection (striped bars). PORs to the four odorants remain consistent when tested 15 minutes and 3 hours after serotonin (5HT) injection.

      Overall, the study demonstrates the impact of serotonin on odor-evoked responses of PNs and odor-guided behavior in locusts. Serotonin appears to have non-linear effects including changing the firing patterns of PNs from monotonic to bursting and altering behavioral responses in an odor-specific manner, rather than uniformly across all stimuli presented.

      We thank the reviewer for again providing very useful feedback for improving our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that projection neurons in the antennal lobe generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of projection neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I still have several concerns regarding the generalizability of the model and interpretation of results. The authors cannot provide evidence that serotonin modulation of projection neurons impacts behavior.

      This is true and likely to be true for any study linking neural responses to behavior. There are multiple circuits and pathways that would get impacted by a neuromodulator like serotonin. What we showed with our physiology is how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Given the specificity of the changes in behavioral outcomes (i.e. odor-specific increase and decrease in an appetitive behavior) and non-specificity in the changes at the level of individual PNs (general increase in odor-evoked spiking activity), we presented a relatively simple computational model to address the apparent mismatch between neural and behavioral responses. (Author response image 4).

      The authors show that odor identity is maintained after 5-HT injection, however, the authors do not show if PN responses to different odors were differently affected after serotonin exposure.

      The PN responses to different odorants changed in a qualitatively similar fashion (Author response image 3).

      Author response image 3.

      PN activity before and after 5HT application is compared for different cell-odor combinations. As can be noted, the changes are qualitatively similar in all cases. After 5HT application, the baseline activity became more bursty, but the odor-evoked response patterns were robustly maintained for all odorants.

      Regarding the model, the authors show that the model works for odors with non-overlapping PN activation. However, only one appetitive, one neutral, and one aversive odor has been tested and modeled here. Can the fixed-weight model also hold for other appetitive and aversive odors that might share more overlap between active PNs? How could the model generate BZA attraction in 5-HT exposed animals (as seen in behavior data in Figure 1) if the same PNs just get activated more?

      Author response image 4.

      Testing the generality of the proposed computational model. To test the generality of the proposed model, we used a published dataset [Chandak and Raman, 2023]. It comprises a neural dataset of 89 PN responses to a panel of twenty-two odorants and a behavioral dataset of POR probabilities for the same twenty-two odorants. We built the model using only the three odorants overlapping between the two datasets: hexanol, benzaldehyde, and linalool. The true POR probabilities and those predicted by the model are shown as a scatter plot for all twenty-two odorants. As can be noted, there is a high correlation (0.79) between the true and predicted values.
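      The correlation between true and model-predicted POR probabilities can be computed as sketched below. The text does not name the coefficient, so this sketch assumes a Pearson correlation. The function name `pearson_r` and the POR values are hypothetical placeholders, not the Chandak and Raman (2023) data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical POR probabilities for five odorants (not the study's data)
true_por = [0.6, 0.45, 0.2, 0.1, 0.35]
pred_por = [0.55, 0.5, 0.25, 0.15, 0.3]
print(round(pearson_r(true_por, pred_por), 2))
```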

      The authors should still not exclude the possibility that serotonin injections could affect behavior via modulation of other cell types than projection neurons. This should still be discussed, serotonin might rather shut down baseline activation of local inhibitory neurons - and thus lead to the interesting bursting phenotypes, which can also be seen in the baseline response, due to local PN-to-LN feedback.

      We agree that other cells could be affected by serotonin release. Our goal in this study was to characterize how serotonin alters spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input. As the reviewer correctly indicates, this circuit contains local inhibitory neurons (LNs). Surprisingly, our preliminary data indicate that LNs are not shut down but instead also show enhanced odor-evoked responses (Author response image 5). Further data would be needed to verify this observation and determine the mechanisms that mediate the changes in PN excitability. Regardless, we focused on PNs in this study for two reasons: PN activity should incorporate the effects of changes in local neuron responses, and PNs provide the sole output from the antennal lobe that drives all downstream odor-evoked activity.

      Author response image 5.

      Representative traces showing intracellular recordings from a local neuron in the antennal lobe. Five consecutive trials are shown. Note that LNs in the locust antennal lobe are non-spiking. The LN activity before, during, and after the presentation of benzaldehyde and hexanol (colored bar; 4 s) is shown. The left and right panels show LN activity before and after the application of 5HT, respectively. As can be noted, 5HT did not shut down odor-evoked activity in this local neuron.

      The authors did not fully tone down their claims regarding causality between serotonin and starved state behavioral responses. There is no proof that serotonin injection mimics starved behavioral responses.

      Specific minor issues:
      It is still unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium). The new method part does not indicate the concentrations of odors used for electrophysiology.

      All odorants were diluted to 0.01-10% concentration by volume in either mineral oil or distilled water. This information is included in the Methods section. For most odorants used in the study, the lower concentrations evoked only very weak neural responses, and the higher concentrations evoked more robust responses. The POR responses to these odorants at the various concentrations chosen are included in Figure 2. Note that the responses to linalool and ammonium remained weak across concentrations, compared with those to hexanol and benzaldehyde.

      Did all tested PNs respond to all odorants?

      No; only a subset of PNs responded to each odorant. These responses have been well characterized in earlier publications [included refs].

      The authors do not show if PN responses to different odors were differently affected after serotonin exposure. They describe that ON responses were robust, but OFF responses were less consistent after 5-HT injection. Was this true across all odors tested? Example traces are shown, but the odor is not indicated in Figure 4A. Figure 4D shows that many odor-PN combinations did not change their peak spiking activity - was this true across odorants? In Figure 5 - are PNs ordered by odor-type exposure?

      Also, Figure 6A only shows example trajectories for odorants - how does the average look? Regarding the data used for the model - can the new dataset from the 82 odor-PN pairs reproduce the activation pattern of the previously collected dataset of 89 pairs?

      Figure 6A shows the trial-averaged response trajectory combining the activities of all 82 odor-PN pairs. These 82 odor-PN pairs were recorded intracellularly, examining responses to four odorants before and after 5HT application. The second dataset, involving 89 PN responses to 22 odorants, was collected extracellularly. The two datasets are qualitatively similar in that each odorant activates a unique subset of neurons.

      The authors toned down their claims that serotonin injection can mimic the starved state behavioral response. However, some sentences still indicate this finding and should also be toned down:

      last sentence of introduction - "In sum, our results provide a more systems-level view of how a specific neuromodulator (serotonin) alters neural circuits to produce flexible behavioral outcomes."

      We believe we showed this with our computational model, how uniform changes in the neural responses could lead to variable and odor-specific changes in behavioral PORs.

      discussion: "Finally, fed locusts injected with serotonin generated similar appetitive responses to food-related odorants as starved locusts indicating the role of serotonin in hunger state-dependent modulation of odor-evoked responses." This claim is not supported.

      Figure 7 shows that fed locusts had lower PORs to HEX and BZA. The POR responses significantly increased after 5HT application. However, we have rephrased this sentence to limit our claims to this result: "Finally, fed locusts injected with serotonin generated similar appetitive palp-opening responses to food-related odorants as observed in starved locusts."

      last results: "However, consistent with results from the hungry locusts, the introduction of serotonin increased the appetitive POR responses to HEX and BZA. Intriguingly, the appetitive responses of fed locusts treated with 5HT were comparable or slightly higher than the responses of hungry locusts to the same set of odorants."

      Again this sentence simply describes the result shown in Figure 7.

      In Figure 7 - BZA response seems unchanged in hungry and fed animals and only 5-HT injection enhances the response. There is only one example where 5-HT application and starvation induce the same change in behavior - N=1 is not enough to conclude that serotonin influences food-driven behaviors.

      The reviewer is ignoring the lack of changes to PORs to linalool and ammonium. Taken together, serotonin increased PORs to only two of the four odorants in starved locusts. The responses after 5HT modulation to these four odorants were similar in fed locusts treated with 5HT and starved locusts.

      Also, this seems to be wrongly interpreted in Figure 7: "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, remained unchanged in fed locusts treated with 5HT." The authors indicate a significant reduction in POR after 5-HT injection on LOOL response in Figure 7.

      Revised:
      "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, were reduced in fed locusts treated with 5HT."

      Also, the newly added sentence at the end of the discussion does not make sense: "However, since 5HT increased behavioral responses in both fed and hungry locusts, the precise role of 5HT modulation and whether it underlies hunger-state dependent modulation of appetitive behavior still remains to be determined."
      The authors did not test 5-HT injection in starved animals.

      The results shown in Figure 1 compare the POR responses of starved locusts before and after 5HT introduction.

      We again thank the reviewer for useful feedback to further improve our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Serotonin injection increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1 and the explanation below). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As recommended, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data have been added to the revised manuscript as Supplementary Figure 3.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activity of individual PNs as we injected different levels of current (200-1000 pA). Note that the locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons, and hence should not engage the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection-induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A 2-second window during which a positive current of 200-1000 pA was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves for each recorded PN, showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin application. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection. The color progression represents the intensity of the applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.
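      The summary in panels (B) and (C) can be sketched in Python as below: trial-averaged spike counts per cell, then mean ± SEM across cells at each current step. The function names and spike counts are hypothetical. The example uses three trials and three current steps for brevity, and the code reproduces only the averaging described in the caption, not the recording analysis itself.

```python
import math
import statistics


def cell_dose_response(trials_by_current):
    """Mean spike count at each current step for one PN, averaged over its trials."""
    return [statistics.mean(trials) for trials in trials_by_current]


def population_mean_sem(cell_curves):
    """Mean and SEM across cells at each current step (SEM = SD / sqrt(n_cells))."""
    n_cells = len(cell_curves)
    means, sems = [], []
    for values_at_step in zip(*cell_curves):
        means.append(statistics.mean(values_at_step))
        sems.append(statistics.stdev(values_at_step) / math.sqrt(n_cells))
    return means, sems


# Hypothetical spike counts: two PNs, three trials at each of three current steps
cell_a = cell_dose_response([[4, 5, 6], [8, 9, 10], [12, 13, 14]])
cell_b = cell_dose_response([[3, 3, 3], [6, 6, 6], [9, 9, 9]])
means, sems = population_mean_sem([cell_a, cell_b])
print(means, sems)
```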

      • There is another explanation for the theoretical discrepancy between physiology and behavior: odor coding undergoes further processing in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth-instar locusts (Schistocerca americana) of either sex were either starved for 24-48 hours before the experiment or, for the satiated condition, taken straight from the colony and fed blades of grass. Locusts were immobilized by placing them in a plastic tube and securing their bodies with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As shown, the head of each locust, along with the antennae and maxillary palps, protruded from the immobilization tube so that the locust could move them freely. Note that the maxillary palps are sensory organs close to the mouthparts that are used to grab food and help with feeding.

      It is worth noting that our earlier studies showed that the presentation of ‘appetitive odorants’ triggers locusts to open their maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across odorants (Chandak and Raman, 2023). We chose four odorants that evoked a diverse range of palp-opening probabilities: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Each locust in our experiments was presented with one concentration of the four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activity.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed directly in front of the locusts. The camera was fully automated with a custom MATLAB script to start recording 2 seconds before the odor pulse and stop recording at odor termination. An LED was used to track stimulus onset and offset. The PORs were manually scored offline. Responses to each odorant were scored 0 or 1 depending on whether the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation window, as shown on the locust schematic (Main Paper Figure 1).
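      As a hedged illustration of the binary scoring described above, the sketch below turns per-locust 0/1 scores into a POR rate, i.e., the fraction of locusts that opened their palps for a given odorant. The function names and the example scores are hypothetical placeholders; this is not the authors' MATLAB pipeline, and the numbers are not experimental data.

```python
from typing import Dict, List


def por_rate(scores: List[int]) -> float:
    """Fraction of trials scored as a positive POR (1 = palps opened, 0 = closed)."""
    if not scores:
        raise ValueError("no trials were scored")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("POR scores must be binary (0 or 1)")
    return sum(scores) / len(scores)


def por_rates(scored: Dict[str, List[int]]) -> Dict[str, float]:
    """POR rate per odorant, given one binary score per locust for each odorant."""
    return {odor: por_rate(s) for odor, s in scored.items()}


if __name__ == "__main__":
    # Placeholder scores for illustration only; not experimental data.
    toy = {"odor_A": [1, 0, 1, 1], "odor_B": [0, 0, 0, 1]}
    print(por_rates(toy))  # {'odor_A': 0.75, 'odor_B': 0.25}
```

      Because each trial is scored 0 or 1, the POR rate is simply the mean score across locusts.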

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments in which we examined PORs to mineral oil and hexanol before and after serotonin injection. For this study, we used 10 locusts that were starved for 24-48 hours before the experiment. Hexanol was diluted to 1% (v/v) in mineral oil. Our results reveal that locusts' PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after serotonin (5HT) injection are summarized as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p<0.05).

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons fire only calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are confident that we are recording only from PNs. Furthermore, because LN signals are so small, they are not detected in extracellular recordings from the locust antennal lobe. We therefore stand by our claims and conclusions.

      Author response image 4.

      PN vs LN physiological differences: Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse. Note that local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only small calcium spikelets. Right: A representative raw voltage trace recorded from a projection neuron is shown for comparison. Sodium spikes are clearly visible during both spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer raises a point also made by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we can conclude from our data is that segregating neural activity by behavioral relevance might provide the simplest way to map a non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants whose PORs increased and decreased, respectively, after serotonin injection, had only minimal overlap in their neural responses in the antennal lobe. These results suggest that the formatting of neural activity to support different behavioral outcomes might already begin in the antennal lobe. We have added this to our Discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. However, as Reviewer 1 raised a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 5).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity of the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered with our data, although it is a question we have considered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, we show the responses to hexanol and linalool of four PNs that received positive weights in the model. As expected, each PN in this group responded more strongly to hexanol and did not respond to linalool. Conversely, the four PNs that received negative weights responded only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The PORs to HEX (an appetitive odorant) were significantly different between starved and satiated locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes) before (dark shade) and after 5HT injection (lighter shade). For comparison, PORs of starved locusts before 5HT injection are also plotted (without stripes). Significance was determined using a one-tailed paired-sample t-test (*p<0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. The overall distribution was not significantly different between the two conditions (one-tailed paired-sample t-test; p = 0.93).
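      For readers less familiar with the test, a minimal sketch of the one-tailed paired-sample t statistic follows. It assumes one paired value per unit (for example, per PN-odor pair or per locust), and the direction of the alternative hypothesis (values after 5HT exceed those before) is our assumption; `paired_t` is a hypothetical helper, not the authors' analysis code. The one-tailed p-value is the upper-tail probability of Student's t distribution with n - 1 degrees of freedom; if SciPy is available, it can be obtained with `scipy.stats.t.sf(t, df)` or directly with `scipy.stats.ttest_rel(after, before, alternative="greater")`.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Return (t, df) for a paired-sample t-test on the differences after - before.

    A one-tailed test of H1: mean(after - before) > 0 compares t against the
    upper tail of Student's t distribution with df = n - 1.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    t, df = paired_t([1.0, 2.0, 3.0], [2.0, 4.0, 5.0])
    print(round(t, 6), df)  # 5.0 2
```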

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. One-tailed paired sample t-test was used to compare the two distributions.

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the fewest PORs among a diverse panel of 22 odorants that were tested. We have clarified in the Methods section how we chose odorants based on the prior dataset.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics, as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments showing that PN output in response to square pulses of current changes after serotonin application (Author response image 1).

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The goal was a very diverse odor panel consisting of a food odorant (hexanol), an aggregation pheromone (4-vinyl anisole), a sex pheromone (benzaldehyde), an acid (octanoic acid), a base (ammonium), and an alcohol (1-nonanol). Additionally, we selected these odors based on previous neural and behavioral data (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested with a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for electrophysiology experiments was chosen by trial and error: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) because anatomical structures in the locust's head, such as air sacs and the sheath, as well as the hemolymph, cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We performed electrophysiology experiments 5-10 minutes after bath application because it would be extremely difficult to hold cells for 3 hours.

      A longer delay was required for our behavioral experiments because the locusts tended to be more agitated, with larger spontaneous palp movements, and exhibited unprompted vomiting. A 3-hour period allowed the locusts to return to baseline levels of movement after 5HT introduction. [This information has been added to the Methods section of the revised manuscript.]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, whereas LNs fire only calcium spikelets (Author response image 4). Since these spikelets are small, they are buried under the noise floor when extracellular electrodes are used to monitor responses in the antennal lobe (AL).

      Hence, we are confident about the cell type from which we are recording.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. Author response:

      “Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using computational reasonable tools, and having an in-depth discussion of the significant results.”

      We thank the reviewer for the very supportive comments.

      Based on the comments and questions, we have grouped the concerns and corresponding responses into three categories.

      (1) The scope and data selection

      “The results are somewhat inconclusive or not validated.

      The overall results are carefully designed, but most of the results are descriptive. While the authors are able to find additional evidence either from the literature or explain the results with their existing knowledge, none of the results have been biologically validated. Especially, the last three result sections (signaling pathways, eQTLs, and TF binding) further extended their findings, but the authors did not put the major results into any of the figures in the main text.”

      The goal of this manuscript is to provide a list of putative childhood obesity target genes to yield new insights and help drive further experimentation. Moreover, the outputs from signaling pathways, eQTLs, and TF binding, although noteworthy and supportive of our method, were not particularly novel. In our manuscript we placed our focus on the novel findings from the analyses. We did, however, report the part of the eQTLs analysis concerning ADCY3, which brought new insight to the pathology of obesity, in Figure 4C.

      “The manuscript would benefit from an explanation regarding the rationale behind the selection of the 57 human cell types analyzed. it is essential to clarify whether these cell types have unique functions or relevance to childhood development and obesity.”

      We elected to comprehensively investigate the GWAS-informed cellular underpinnings of childhood development and obesity. By including a diverse range of cell types from different tissues and organs, we sought to capture the multifaceted nature of cellular contributions to obesity-related mechanisms, and open new avenues for targeted therapeutic interventions.

      There are clearly cell types that are already established as being key to the pathogenesis of obesity when dysregulated: adipocytes for energy storage, immune cell types regulating inflammation and metabolic homeostasis, hepatocytes regulating lipid metabolism, pancreatic cell types intricately involved in glucose and lipid metabolism, skeletal muscle for glucose uptake and metabolism, and brain cell types in the regulation of appetite, energy expenditure, and metabolic homeostasis.

      While it is practical to focus on cell types already shown to be associated with or relevant to obesity, this approach has its limitations. It confines our understanding to established knowledge and rules out the potential for discovering novel cellular mechanisms or pathways that could play significant roles in the pathogenesis of obesity. Therefore, it was essential to weigh known biology against unexplored cell types to expand our overall understanding and potentially identify innovative targets for treatment or prevention.

      “I wonder whether the used epigenome datasets are all from children. Although the authors use literature to support that body weight and obesity remain stable from infancy to adulthood, it remains uncertain whether epigenomic data from other life stages might overlook significant genetic variants that uniquely contribute to childhood obesity.”

      The datasets utilized in our study were derived from a combination of sources, both pediatric and adult. We recognize that epigenetic profiles can vary across different life stages but our principal effort was to characterize susceptibility BEFORE disease onset.

      “Given that the GTEx tissue samples are derived from adult donors, there appears to be a mismatch with the study's focus on childhood obesity. If possible, identifying alternative validation strategies or datasets more closely related to the pediatric population could strengthen the study's findings.” 

      We thank the reviewer for raising this important point. We acknowledge that the GTEx tissue samples are derived from adult donors, which might not perfectly align with the study's focus on childhood obesity. The ideal strategy would be a longitudinal design that follows individuals from childhood into adulthood to bridge the gap between pediatric and adult data, offering systematic insights into how early-life epigenetic markers influence obesity later in life. We aim to carry out such efforts in future work, although they will require a substantial commitment of time and funding.

      Along the same lines, the Developmental Genotype-Tissue Expression (dGTEx) Project is a new effort to study development-specific genetic effects on gene expression at 4 developmental windows spanning from infant to post-puberty (0-18 years). Donor recruitment began in August 2023 and remains ongoing. Tissue characterization and data production are underway. We hope that with the establishment of this resource, our future research in the field of pediatric health will be further enhanced.

      “Figure 1B: in subplots c and d, the results are either from Hi-C or capture-C. Although the authors use different colors to denote them, I cannot help wondering how much difference between Hi-C and capture-C brings in. Did the authors explore the difference between the Hi-C and capture-C?”.

      Thank you for your comment. It is not within the scope of our paper to explore the differences between the Hi-C and Capture-C methods. In the context of our study, both methods serve the same purpose of detecting chromatin loops that bring putative enhancers to sometimes genomically distant gene promoters. Consequently, our focus was on utilizing these methods to identify relevant chromatin interactions rather than comparing their technical differences.

      (2) Details on defining different categories of the regions of interest

      “Some technical details are missing.

      While the authors described all of their analysis steps, a lot of the time, they did not mention the motivation. Sometimes, the details were also omitted.”

      We will add a section to the revision to explain the rationale behind the different OCR categories.

      “Line 129: should "-1,500/+500bp" be "-500/+500bp"? 

      A gene promoter was defined as the region from 1,500 bases upstream to 500 bases downstream of the TSS. Most transcription factor binding sites are distributed upstream (5’) of the TSS, and the transcription machinery assembles up to 1,000 bases 5’ of the TSS. Given our interest in SNPs that can potentially disrupt transcription factor binding, this promoter definition allowed us to capture such SNPs in our analyses.
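      As an illustration of the -1,500/+500 bp definition above, the sketch below computes the promoter window for a given TSS and tests whether a SNP falls inside it. "Upstream" depends on the gene's strand, so on the minus strand the 1,500 bp lie at higher coordinates. The function names, the inclusive bounds, and clamping at coordinate 0 are assumptions for this sketch; the coordinate convention should match the annotation actually used.

```python
from typing import Tuple

UPSTREAM_BP = 1_500   # bases 5' of the TSS included in the promoter
DOWNSTREAM_BP = 500   # bases 3' of the TSS included in the promoter


def promoter_interval(tss: int, strand: str) -> Tuple[int, int]:
    """Return (start, end) genomic coordinates of the -1,500/+500 bp promoter window.

    Upstream is strand-dependent: on the minus strand it lies at higher coordinates.
    """
    if strand == "+":
        start, end = tss - UPSTREAM_BP, tss + DOWNSTREAM_BP
    elif strand == "-":
        start, end = tss - DOWNSTREAM_BP, tss + UPSTREAM_BP
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end  # clamp windows near chromosome starts


def snp_in_promoter(snp_pos: int, tss: int, strand: str) -> bool:
    """True if a SNP position falls inside the promoter window (inclusive bounds)."""
    start, end = promoter_interval(tss, strand)
    return start <= snp_pos <= end


if __name__ == "__main__":
    print(promoter_interval(10_000, "+"))  # (8500, 10500)
    print(promoter_interval(10_000, "-"))  # (9500, 11500)
```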

      “How did the authors define a contact region?”

      Chromatin contact regions identified by Hi-C or Capture-C assays are always reported as pairs of chromatin regions. The Supplementary eMethods provide details on the method of processing and interaction calling from the Hi-C and Capture-C data.

      “The manuscript would benefit from a detailed explanation of the methods used to define cREs, particularly the process of intersecting OCRs with chromatin conformation data. The current description does not fully clarify how the cREs are defined.”

      “In the result section titled "Consistency and diversity of childhood obesity proxy variants mapped to cREs", the authors introduced the different types of cREs in the context of open chromatin regions and chromatin contact regions, and TSS. Figure 2A is helpful in some way, but more explanation is definitely needed. For example, it seems that the authors introduced three chromatin contacts on purpose, but I did not quite get the overall motivation.”

      We apologize for the confusion. Our definition of cREs is consistent throughout the study. To aid the reader, Figure 2A will become Figure 1A in the revision.

      The 3 representative chromatin loops illustrate different ways the chromatin contact regions (pairs of blue regions under blue arcs) can overlap with OCRs (yellow regions under yellow triangles – ATAC peaks) and gene promoters.

      [1] The first chromatin loop has one contact region that overlaps with an OCR at one end and with a gene promoter at the other. This satisfies the definition of a cRE; thus, the area under the yellow ATAC-peak triangle is green.

      [2] The second loop overlaps with an OCR at only one end, and no gene promoter is nearby, so it does not qualify as a cRE.

      [3] The third chromatin loop has an OCR and a promoter overlapping at the same end. We defined this as a special case of cRE; thus, the area under the yellow ATAC-peak triangle is green.

      To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.
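      The rule illustrated by the three loops can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' pipeline: coordinates are half-open, each loop is a pair of contact ends, and loop [3] is read as the OCR and the promoter both falling within the same contact end. `Interval` and `is_cre` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # half-open [start, end); an assumption for this sketch
    end: int

    def overlaps(self, other: "Interval") -> bool:
        return (self.chrom == other.chrom
                and self.start < other.end and other.start < self.end)


def is_cre(ocr: Interval,
           loops: List[Tuple[Interval, Interval]],
           promoters: List[Interval]) -> bool:
    """True if the OCR qualifies as a putative cRE under cases [1]-[3]."""
    for a, b in loops:
        for near, far in ((a, b), (b, a)):
            if not ocr.overlaps(near):
                continue
            # Case [1]: a promoter lies at the other end of the loop.
            if any(far.overlaps(p) for p in promoters):
                return True
            # Case [3]: a promoter lies at the same contact end as the OCR.
            if any(near.overlaps(p) for p in promoters):
                return True
    # Case [2] (OCR at a contact end, no promoter) or no contact overlap at all.
    return False


if __name__ == "__main__":
    loop = (Interval("chr1", 100, 200), Interval("chr1", 1_000, 1_100))
    ocr = Interval("chr1", 150, 160)
    print(is_cre(ocr, [loop], [Interval("chr1", 1_050, 1_060)]))  # True
    print(is_cre(ocr, [loop], []))  # False
```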

      “Figure 2A: The authors used triangles filled differently to denote different types of cREs but I wonder what the height of the triangles implies. Please specify.”

      The triangles are illustrations for ATAC-seq peaks, and the yellow chromatin regions under them are OCRs. The different heights of ATAC-seq peaks are usually quantified as intensity values for OCRs. However, in our study, when an ATAC-seq peak passed the significance threshold from the data pipeline, we only considered their locations, regardless of their intensities. To avoid further confusion for the reader, we will eliminate this variation in the new illustration for the revised manuscript.

      “Figure 1B-c. the title should be "OCRs at putative cREs". Similarly in Figure 1B-d.”

      cREs are a subset of OCRs.

      - In the section "Cell type specific partitioned heritability", the authors used "4 defined sets of input genomic regions". Are you corresponding to the four types of regions in Figure 2A? 

      Figure 2A will become Figure 1A in the revision and will be modified to show how we define OCRs and cREs.

      “It seems that the authors described the 771 proxies in "Genetic loci included in variant-to-genes mapping" (ln 154), and then somehow narrowed down from 771 to 94 (according to ln 199) because they are cREs. It would be great if the authors could describe the selection procedure together, rather than isolated, which made it quite difficult to understand.”

      In the Methods section entitled “Genetic loci included in variant-to-genes mapping," we described the process of LD expansion to include 771 proxies from 19 sentinel signals significantly associated with obesity. Not all of these proxies are located within our defined cREs. Figure 2B, now Figure 2A in the revision, illustrates the proportions of these proxies located within different types of regions, narrowing the proxy list to the 94 located within our defined cREs.

      “Figure 2. What's the difference between the 771 and 758 proxies? “

      Of the 771 proxies, 13 did not fall within any defined region. The remaining 758 were located within contact regions of at least one cell type, regardless of chromatin state.

      (3) Typos

      “In the paragraph "Childhood obesity GWAS summary statistics", the authors may want to describe the case/control numbers in two stages differently. "in stage 1" and "921 cases" together made me think "1,921" is one number.”

      This will be amended in the revision.

      “Hi-C technology should be spelled as Hi-C. There are many places, it is miss-spelled as "hi-C". In Figure 1, the author used "hiC" in the legend. Similarly, Capture-C sometime was spelled as "capture-C" in the manuscript.”

      “At the end of the fifth row in the second paragraph of the Introduction section: "exisit" should be "exist".

      “In Figure 2A: "Within open chromatin contract region" should be "Within open chromatin contact region". 

      These typos and terminology inconsistencies will be amended in the revision.

    1. Author response:

      Provisional author response to Reviewer #1

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. We plan to address the weaknesses as well as possible during the revision of our manuscript.

      We can already state that miR-26b has clear anti-inflammatory effects on human liver slices, which is in line with our results demonstrating that miR-26b plays a protective role in MASH development in mice. The notion that patients with liver cirrhosis have increasing plasma levels of miR-26b seems contradictory at first glance. However, we believe that this increased miR-26b expression is a compensatory mechanism to counteract the MASH/cirrhotic effects. The exact source of this miR-26b remains to be elucidated in future studies.

      The kinase activity analysis revealed that miR-26b affects kinases that play a particularly important role in inflammation and angiogenesis. Strikingly, and supporting these data, these effects could be reversed by LNP treatment. Combined, these results already provide strong mechanistic insights at the molecular and intracellular signalling level. Although the exact target of miR-26b remains elusive and its identification is probably beyond the scope of the current manuscript due to its complexity, we believe that the kinase activity results already provide a solid mechanistic basis.

      Provisional author response to Reviewer #2

      We would like to thank the reviewer for his/her careful evaluation of our manuscript and appreciate his/her appraisal of the strengths of our study. We plan to address the weaknesses as well as possible during the revision of our manuscript. The validation suggestions in particular are very valuable, and we plan to address them in the revision by performing additional experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Komarova et al. investigate the clinical prognostic ability of cell-level metabolic heterogeneity quantified via the fluorescence lifetime characteristics of NAD(P)H. Fluorescence lifetime imaging microscopy (FLIM) has been studied as a minimally invasive approach to measure cellular metabolism in live cell cultures, organoids, and animal models. Its clinical translation is spearheaded by macroscopic implementations that can sample large areas and access otherwise constrained spaces but lack the cellular resolution needed for a one-to-one transition from traditional microscopy approaches, making the interpretation of the results a complicated task. The merit of this study lies primarily in its design: colorectal samples are analyzed with the same instrumentation and approach across different research scenarios, namely in vitro cells, in vivo animal xenografts, and tumor tissue from human patients. These constitute a valuable dataset for exploring the hurdles of translational interpretation with samples of increasing complexity. For human samples, the study specifically investigates the ability of NAD(P)H fluorescence metrics to predict the binary classification of tumors as low or advanced stage, with or without metastasis, and low or high grade. The authors find that NAD(P)H fluorescence properties have a strong potential to distinguish between high- and low-grade tumors and a moderate ability to distinguish advanced-stage from low-stage tumors. This study provides valuable results contributing to the deployment of minimally invasive optical imaging techniques to quantify tumor properties, which could potentially evolve into tools for human tumor characterization and clinical diagnosis.

      Strengths:

      The investigation of colorectal samples under multiple imaging scenarios with the same instrument and approach constitutes a valuable dataset that can facilitate the interpretation of results across the spectrum of sample complexity.

      The manuscript provides a strong discussion reviewing studies that investigated cellular metabolism with FLIM and the metabolic heterogeneity of colorectal cancer in general.

      The authors thoroughly acknowledge the experimental limitations of investigating human samples ex vivo and the analytical limitation of manual segmentation, for which they provide a path forward toward higher-throughput analysis.

      Weaknesses:

      To substantiate the changes in fluorescence properties at the examined wavelength range (associated with NAD(P)H fluorescence) in relationship to metabolism, the study would strongly benefit from additional quantification of metabolic-associated metrics using currently established standard methods. This is especially interesting when discussing heterogeneity, which is presumably high within and between patients with colorectal cancer, and could help explain the particularities of each sample leading to a more in-depth analysis of the acquired valuable dataset.

      In order to address this issue, we have performed immunohistochemical staining of the available tumor samples for the two standard metabolic markers GLUT3 and LDHA.

      The results are included in the Supplementary Material (Figure S4), and the Discussion has been extended.

      Additionally, NAD(P)H fluorescence does not provide a complete picture of the cell/tissue metabolic characteristics. Including, or discussing the implications of including fluorescence from flavins would comprise a more compelling dataset. These additional data would also enable the quantification of redox metrics, as briefly mentioned, which could positively contribute to the prognosis potential of metabolic heterogeneity.

      We agree with the Reviewer that fluorescence from flavins could be helpful to obtain more complete data on cellular metabolic states. However, we were unable to detect sufficiently intense emission from flavins in colorectal cancer cells and tissues. A paragraph about flavins has been added to the Discussion, and representative images have been added to the Supplementary Material (Figure S5).

      In the current form of the manuscript, the interpretation and discussion of the results obtained from the random forest and SHAP analysis, regarding the ability of the FLIM parameters to predict clinicopathological outcomes, are diluted. This is not only the main point the authors are trying to convey given the title and the stated goals, but also a novel result given the scarce availability of this type of data, which could have a remarkable impact on colorectal cancer in situ diagnosis and therapy monitoring. These data merit a more in-depth analysis of the different factors involved. In this context, the authors should clarify how the "trend of association" is quantified (lines 194 and 199).

      We thank the Reviewer for this suggestion. The section has been updated with SHAP analysis using different parameters (dispersion D of t2, a1, tm and bimodality index BI of t2, a1, tm). It is now clearer that D-a1 is more strongly associated with clinicopathological outcomes than the other variables. We have also added some biological interpretation of these results to the Discussion.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Metabolic heterogeneity of colorectal cancer as a prognostic factor: insights gained from fluorescence lifetime imaging" by Komarova et al., the authors used fluorescence lifetime imaging and quantitative analysis to assess the metabolic heterogeneity of colorectal cancer. Generally, this work is logically well-designed, including in vitro and in vivo animal models and ex vivo patient samples. However, since the key parameter presented in this study, the BI index, has already been published in a previous paper by this group (Shirshin et al., 2022), and the quantification of metabolic heterogeneity has already been well (and even better) described in previous studies (such as the one by Heaster et al., 2019), the novelty of this study is questionable. Moreover, I am afraid that the data analysis and presentation in this study are not well done, as detailed in the following sections.

      Strengths:

      (1) Solid experiments are performed and well-organized, including in vitro and in vivo animal models and ex vivo patient samples.

      (2) An attempt to establish an association between metabolic heterogeneity and prognosis in colorectal cancer.

      Weaknesses:

      (1) The human sample number (from 21 patients) is very limited. I wonder how such a limited number of patients could lead to reliable diagnosis and prognosis.

      Eight additional patient tumor samples, collected while the manuscript was under review, were added to the present data. We agree that the number is still too limited to draw conclusions about the prognostic value of cell-level metabolic heterogeneity, but at this point we can expect that this parameter will become a prognostic metric. We will continue this study to collect more colorectal tumor samples and extend the approach to other cancer types.

      (2) The BI index or similar optical metrics have been well established by this and other groups; therefore, the novelty of this study is questionable.

      The purpose of this research was to quantify and compare cellular metabolic heterogeneity across systems of different complexity (commercial cell lines, tumor xenografts and patients’ tumors) using previously established FLIM-based metrics. For the first time, FLIM was used to show that the heterogeneity of patients’ samples is much higher than that of laboratory models and that it is associated with clinical characteristics of the tumors: the stage and the grade. In addition, this study provides evidence that bimodality (BI) in the distribution of metabolic features in the cell population is less important than the width of the spread (the dispersion value D).

      Some corrections have been made in the text on this point.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following comments should be addressed to strengthen the rigor and clarity of the manuscript.

      (1) The ethical committee that approved the human studies should also be mentioned in the methods section, as was done with the animal studies.

      Information about the ethics committee has been added in the Manuscript.

      The study using patient material was approved by the ethics committee of the Privolzhsky Research Medical University (approval № 09 dated 30.06.2023).

      (2) The captions in Figures 2 and 3 must be revised. In Figure 2, it seems the last 2 sentences for the description of (C) do not belong there, and instead, the last sentence in the description of (D) may need to be included in (C) instead. Figure 3 is similar.

      The captions were revised.

      (3) From Supplementary Figure S2, it seems that EpCam and vimentin staining were only done in two of the mouse tumor types. No further mention is made in the results or methods section. Is there any reason this was not performed in the other tumor types? Were the histology and IHC protocols the same for the mouse and human tumors?

      The data on other tumor types and patients’ tumors have been added to Figure S3, and the Discussion has been extended with the following paragraph:

      One of the possible reasons for metabolic heterogeneity could be the presence of stromal cells or the diversity of epithelial and mesenchymal phenotypes of cancer cells within a tumor. Immunohistochemical staining of tumors for EpCam (epithelial marker) and vimentin (mesenchymal marker) showed that the fraction of epithelial, EpCam-positive, cells was more than 90% in tumor xenografts and on average 76 ± 10% in patients’ tumors (Figure S3). However, the ratio of EpCam- to vimentin-positive cells in patients’ samples correlated with neither D-a1 nor BI-a1, which means that the presence of cells with a mesenchymal phenotype did not contribute to the metabolic heterogeneity of tumors identified by NAD(P)H FLIM.

      (4) Clarify the design of the experiments: The results come from 50 - 200 cells in each sample (except 30 in the CaCo2 cell culture) that were counted from 5 - 10 images acquired from each sample. There were 21 independent human samples. How many independent samples were included in the cell culture experiments and the mouse tumor models? Why is there an order of magnitude fewer cells included in the CaCo2 group compared to the other groups (Figure 1)? From the image (Figure 1A - CaCo2), it seems to be a highly populated type of sample, yet only 30 cells were quantified. What prevents the inclusion of the same number of cells to be quantified in each group for a more systematic evaluation?

      We thank the Reviewer for this comment.

      Cell culture experiments included two independent replicates for each cell line, and the data from them were then combined. In the animal experiments, measurements were made in three mice (numbered 1-3 in Figure 2C) for each tumor type. We have quantified more than 100 additional cells of the CaCo2 cell line; in the revised version, the number of CaCo2 cells is 146.

      The text of the Manuscript was revised accordingly.

      (5) Regarding references: Some claims throughout the text would benefit from an additional reference. For example: line 70 "Metabolic heterogeneity [...] is believed to have prognostic value"; line 121 "[...] the uniformity of cell metabolism in a culture, which is consistent with the general view on standard cell lines [...]". The clinical translational aspect (i.e., the paragraph at line 255) warrants the inclusion of the efforts already made with FLIM imaging in the clinical setting, both in vivo and ex vivo, with point spectroscopy and macroscopic imaging (e.g., Jo Lab, Marcu Lab, French Lab, and earlier work by Mycek and Richards-Kortum in colorectal cancer, to name a few).

      Additional references were added.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the Introduction, line 85, the authors mention that "Specifically, the unbound state of NAD(P)H has a short lifetime (~0.4 ns) and is associated with glycolysis, while the protein-bound state has a long lifetime (~1.7-3.0 ns) and is associated with OXPHOS". I do not think this claim is appropriate. One cannot simply say that the unbound state is associated with glycolysis, nor that the bound state is associated with OXPHOS; both unbound and bound state are associated with almost all the metabolic pathways. Instead, the expression of "glycolytic/ OXPHOS shift", as authors used in other sections of this manuscript, is a more appropriate one in this case.

      The text of the Introduction was revised.

      (2) What are the biological implications of the bimodality index (BI)? Please provide specific insights.

      A bimodal distribution indicates that there are two separate and independent peaks in the population data. In the metabolic FLIM data, this indicates that there are two sub-populations of cells with different metabolic phenotypes. Previously, we observed a bimodal distribution in a population of chemotherapy-treated cancer cells, where one sub-population was responsive (shifted metabolism) and the second was non-responsive (unchanged metabolism) [Shirshin et al., PNAS, 2022]. In a naive tumor, a number of factors affect cellular metabolism, including genetic features and the microenvironment, so it is difficult to determine which of them result in bimodality. Our data on the correlation of bimodality (BI) with clinical characteristics of the tumors show no associations between them. What really matters is the width of the parameter spread in the population. The early-stage tumors (T1, T2) were metabolically more heterogeneous than the late-stage ones (T3, T4). The degree of heterogeneity was also associated with differentiation state, a stage-independent prognostic factor in colorectal cancer, in which a lower grade correlates with a better prognosis. The early-stage tumors (T1, T2) and high-grade (G3) tumors had significantly higher dispersion of NAD(P)H-a1 than the late-stage (T3, T4) and low-grade (G1, G2) ones. In terms of the biological significance of heterogeneity, this means that under the stressful and unfavorable conditions to which tumor cells are exposed, the spread of the parameter distribution in the population, rather than the presence of several distinct clusters (modes), matters for adaptation and survival. A high diversity of cellular metabolic phenotypes provides a survival advantage, which is why it was observed in the more aggressive (undifferentiated or poorly differentiated) and the least advanced tumors.

      The discussion has been expanded on this account.

      (3) Have you run statistics in Figure 1B? If yes, do you find any significance? The same question also applies to Figures 2C and 3C.

      We performed statistical analyses to compare the different cell lines in the in vitro and in vivo models; the results are presented in Table S4.

      (4) Line 119, why is the BI threshold set at 1.1?

      When setting the BI threshold at 1.1, we relied on the work by Wang et al., Cancer Informatics, 2009, in which the authors recommended the 1.1 cutoff as a more reliable criterion for selecting bimodally expressed genes. We further validated this BI threshold for identifying chemotherapy-responsive and non-responsive sub-populations of cancer cells (Shirshin et al., PNAS, 2022).
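      For readers less familiar with the bimodality index, the following is a minimal sketch of how BI and the 1.1 cutoff can be applied. It assumes the definition of Wang et al. (2009): the distribution of a FLIM parameter across cells (e.g. a1) is modelled as a two-component Gaussian mixture with mixing proportion π, component means μ1 and μ2, and a shared standard deviation σ, and BI = sqrt(π(1 − π)) · |μ1 − μ2| / σ. Fitting the mixture itself (for example by expectation maximization) is not shown, and the function names are illustrative rather than taken from the authors' analysis code.

```python
import math

# Cutoff recommended by Wang et al. (2009), as cited in the response above.
BI_THRESHOLD = 1.1


def bimodality_index(pi: float, mu1: float, mu2: float, sigma: float) -> float:
    """Bimodality index of an equal-variance two-component Gaussian mixture.

    pi     -- mixing proportion of the first component (0 < pi < 1)
    mu1/2  -- component means
    sigma  -- standard deviation shared by both components
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    delta = abs(mu1 - mu2) / sigma  # standardized distance between the two modes
    return math.sqrt(pi * (1.0 - pi)) * delta


def is_bimodal(bi: float, threshold: float = BI_THRESHOLD) -> bool:
    """Flag a cell population as bimodal when BI exceeds the threshold."""
    return bi > threshold


# Two equally sized sub-populations whose means differ by 3 sigma:
print(bimodality_index(0.5, 0.0, 3.0, 1.0))  # 1.5
```

      With the same 3σ separation but a 90/10 split, BI = sqrt(0.09) · 3 = 0.9, which falls below the cutoff: the sqrt(π(1 − π)) factor means that a small minority sub-population is not counted as evidence of bimodality, whereas a broad unimodal spread (the dispersion D discussed above) is not captured by BI at all.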

      (5) Line 123, what does the high BI of mean lifetime stand for? Please provide biological implications and insights.

      The sentence was removed because the inclusion of additional CaCo2 cells (n=146) in the quantification of NAD(P)H FLIM data showed no bimodality in this cell culture.

      (6) In the legend for Figure 2C, the authors mention that "the bimodality index (BI-a1) is shown above each box"; however, I do not see such values. It is also true for Figure 3C.

      The legends for Fig. 2 and 3 were corrected.

      (7) In Figure 2, t1-t3 were not explained and mentioned in the main text. What do they mean? Do they mean different time points or different tumors?

      t1-t3 denote different tumors in a group. The figure has been changed: individual tumors are now indicated by numbers.

      (8) In Figure 3, what do p13, p15 and p16 mean? It is not clearly explained. If they just represent patients numbered 13, 15, and 16, then why are these patients chosen as representatives? Do they represent different stages or are they just chosen randomly?

      Figure 3 was revised. Representative images were changed and a short description for each representative sample was included. In the revised version, representatives have been selected to show different stages and grades.

      (9) In Figure 3, instead of showing the results for each patient, I would suggest that authors show representative results from tumors at different stages; or, at least, clearly indicate the specific information for each patient. I do not think that providing the patient number only without any patient-specific information is helpful.

      Figure 3 was revised.

      (10) The sample number (21 patients) is very limited. I wonder how the limited patient number could lead to reliable diagnosis and prognosis.

      Additional eight samples were added. The text, figures and tables were revised accordingly.

      (11) In Discussion, it would be helpful to compare the BI index used in this study with the previously developed OMI-index (Line 275).

      We believe that the BI index and the OMI index describe different things and are therefore hard to compare. While the BI index is used to describe the degree of metabolic heterogeneity, the OMI index is an integral parameter that includes the redox ratio and the mean fluorescence lifetimes of NAD(P)H and FAD, and rather indicates the metabolic state of a cell. In this sense, it is more relevant to compare it with the conventional redox ratio or the Fluorescence Lifetime Redox Ratio (FLIRR) (H. Wallrabe et al., Segmented cell analyses to measure redox states of autofluorescent NAD(P)H, FAD & Trp in cancer cells by FLIM, Sci. Rep. 2018; 8: 79). The heterogeneity of FLIM parameters has previously been assessed using the weighted heterogeneity (wH) index (Amy T. Shah et al., In Vivo Autofluorescence Imaging of Tumor Heterogeneity in Response to Treatment, Neoplasia 17, pp. 862–870 (2015)). To the best of our knowledge, this is the only metric to date for quantifying metabolic heterogeneity on the basis of FLIM data. A comparison of BI with the wH index showed that the wH index provides results similar to BI in heterogeneity evaluation, as demonstrated in our earlier paper (E.A. Shirshin et al., Label-free sensing of cells with fluorescence lifetime imaging: The quest for metabolic heterogeneity, PNAS 119 (9) e2118241119 (2022)). Yet the BI provides a dimensionless estimate of the inherent heterogeneity of a sample and can therefore be used to compare heterogeneity assessed with different decay parameters and FLIM data analysis methods. A limitation of using the OMI index for FLIM data analysis is the low intensity of the FAD signal, which was the case in our experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      We would like to see the major conclusions constrained to better fit the data presented in the manuscript. Speed is only a single performance metric of a very complicated, very diverse system of locomotion.

      If the authors would like to maintain the broader conclusions, the study should be repeated with a number of different performance metrics to shore up the manuscript's results. In particular, speed is not a reliable measure of efficiency to begin with, so efficiency needs to be explored in a more targeted and appropriate manner.

      We agree with Reviewer 1 that we should be more precise about the fitness metrics used and more constrained about the conclusions. Considering the points raised in each paragraph, we’ve modified the text as follows:

      - [line 17] “... to test the necessity of both traits for sustained and effective displacement on the ground.”

      - [starting on line 105] “We generate the robot’s sample using an artificial evolutionary process that selects for better locomotion ability - defined as higher average speed as it is a proxy for organisms with sustained and effective displacement.”

      - [starting on line 287] “We also found that different gravitational environments require different shape structures to optimize locomotion average speed.”

      - [starting on line 311] “This consistency is evidence that a small number of sparsely connected modules is a morphological computation principle for an organism’s optimized average speed.”

      - [starting on line 348] “Beyond that, extending the tests for other important aspects of locomotion behavior - as noise on the ground, energetic costs, and maneuverability - by using other locomotion metrics - as energy efficiency, stability margin, and dissipated power (Paez and Melo, 2014; Aoi et al., 2016 ) - would also be relevant to evaluate the principle’s robustness.”

      - [starting on line 524] “As the robots with the highest average speed are the ones that succeed in maximizing displacement and having robust dynamics (they will not tumble with time), we defined $\bar s$ as the fitness value using it as a proxy of successful directed locomotion. Selecting for bodies that maximize speed is a common locomotion bias in natural selection, as both predators and prey and thus fecundity and mortality depend on it (Alexander, 2006). Other measures - such as energy efficiency - can capture distinct important aspects of the locomotion complexity (Paez and Melo, 2014) and would be worthy of investigating in future work.”

      Paper Premise/Mission Statement: As defined in the abstract and stated in the text starting on line 59, the aim is to "investigate whether symmetry and modularity are features of an organism's shape need [authors italics] to have for better-directed locomotion..."

      If we understood correctly, the reviewer is asking for more precision in this statement. We modified the respective sentence as follows:

      - [line 62] “... need to have for optimizing average speed on the ground,”

      Reviewer #2 (Recommendations For The Authors):

      i) a lot of details that are in the captions should be moved in the main text;

      Thank you for this comment. We reviewed all the captions and text making modifications to ensure that all the information in the captions is also present in the main text. Below, we highlighted some of the changes:

      - [line 57] “Thus, locomotion on the ground is present in phylogenetically distant species (such as the maned wolf and frogfish in Figure 1A) and depends upon … “

      - [starting on line 64] “Figure 1B shows a schematic representation of symmetry and modularity on the maned wolf and frogfish bodies.”

      - [starting on line 277] “There is a negative correlation between the proportion of feet voxels and the robot’s locomotion transference capability when the robots go to an environment with higher gravity, i.e., water to mars (dark blue in Figure 5C), water to earth (light blue), and mars to earth (red) - with a Spearman correlation coefficients of r = -0.39, r = -0.43, and r = -0.32, respectively, all with p < 1e-08.”

      ii) hypotheses should be spelled out more clearly;

      We checked the experiments and confirmed that every experiment had a clear hypothesis statement in the original manuscript. Before each section defining the hypothesis and describing the experiment, we added the following statement:

      - [starting on line 119] “With this sample, we tested the hypotheses about the relationships between locomotion performance and body modularity and symmetry (Figure 1I).”

      iii) performance metrics and other features should be better defined using mathematical terms if possible (for example, instability);

      Thank you for the comment. We added a definition for instability in the text:

      - [starting on line 218] “Nonetheless, locomotion requires a minimum instability - the dynamic possibility of translating the center of mass - in the direction axis to generate the necessary forward displacement (Bruijn et al., 2013; Nagarkar et al., 2021).”

      Despite the different definitions of instability in literature (Bruijn et al., 2013, Paez and Melo, 2014; Aoi et al., 2016, Nagarkar et al., 2021), we didn’t find one mathematical definition that fits perfectly in our context.

      Following the reviewer's comment, when necessary we expanded the definition for other features:

      - [starting on line 199] “... the distribution of body weight. As the robots do not have sensory feedback abilities, the weight balance is defined as the body’s movement due to gravity forces (consequences of the weight distribution and surface contact points) (Benda et al., 1994). We hypothesized that the robots with the best directed locomotion ability would tend to have a symmetric body shape. A robot with a low XY shape symmetry (XY shape symmetry < 0.5) has a higher chance of having a poor weight balance, increasing the chance of the body tipping over, thus leading it to a lousy locomotion performance (blue dotted line in Figure 3C). “

      iv)  more details regarding the simulations should be included;

      We thank the reviewer for this comment. If we understood correctly, Reviewer 2 is asking for more details regarding: “a) the adequacy of the spatial resolution, whereby I failed to see a compelling argument regarding the completeness of 64 voxels; b) the realism of the oscillatory patterns, whereby all the voxels are set to oscillate at the same, constant, frequency of 2Hz; and c) the accuracy of simulations in water where added mass effects seem to be neglected.” We modified the text to better address these concerns:

      a) [starting on line 96] “We choose to first explore exhaustively the $4^3$ space dimension, as it is the minimal possible space that allows meaningful body plans. We also did control experiments within $6^3$ and $8^3$ to check for dimension size effects.”

      - [starting on line 432] “We did control experiments with robots within 6³ and 8³ dimensions to check for dimension size effects - and we found that the results found in 4³ remained valid. We choose to focus our analysis in the 4³ design space because we consider it the minimum coarse-grain to approach the biological question about the contingency of shape outcomes pressured for locomotion. Smaller spaces do not allow sufficient complexity in the body structures, and increasing spatial resolution reduces the extensiveness of the investigated search space.”

      b) [starting on line 451] “… we used a fixed oscillation frequency of 𝑓 = 2 Hz (Kriegman et al.,2020). A fixed frequency value reduces the number of degrees of freedom in the search for solutions, but in return, it narrows the direct connection between the simulated organisms and animals. Exploring different frequency values in future work would be important to investigate the impact of varied oscillatory frequencies in the shape solutions for directed locomotion.”

      c) The environment we call “water” is not an accurate model of aquatic habitats, as we did not simulate essential forces such as drag effects. This choice is explained in the text starting on line 110: “In the water-like environment the bodies have nullifying body weight but do not have drag effects. We did not add drag in our simulations because our aim is to study just the body weight influences in locomotion independently of other forces.”

      v) a full paragraph about limitations should be included in the discussions, focusing on both simulation aspects (for example, the use of simple spring elements in the voxels) and theoretical assumptions (for example, addressing the potential role of non-locomotion-related aspects).

      We thank the reviewer for the comment. We edited some paragraphs of the discussion section to make more explicit some limitations of our work:

      [starting on line 398] “We expect that including other important aspects of an animal's body as a developmental process and sensory functions could influence the shape's outcomes with other layers of principles. Although we based our simulations on an already successful transference of in silico behavior to organisms made of biological tissue (Kriegman et al., 2020), there is an intrinsic gap between spring-mass robots modeling and animal’s bodies that is worthy of exploring to ensure the generality of our results. Other methods, such as the inclusion of rigid body elements in the simulation (possible in Voxelyze), the use of finite element modeling (FEM) (Coevoet et al., 2019), and the construction of physical robots (Aguilar et al., 2016), are important complements to this work. Beyond that, principles on other scales as in the genotypes (Johnston et al., 2022) and in other behavioral phenotypes (Gomez-Marin et al., 2016) could also be investigated.”

      To address the potential role of non-locomotion-related aspects, we revised the section “Discussion - Contingency of evolutionary outcomes”, where we discuss other functional and biological roles:

      [starting on line 354] “Here we investigate how a specific functional cause - optimization of average speed during directed locomotion on the ground - externally defines the phenotypic space of shape possibilities.”

      [starting on line 359] “For simplification purposes, we choose to not explicitly control other important factors of locomotion (i.e., energy consumption, maneuverability) that nonlinearly interact during locomotion. In future studies, it would be important to conduct similar studies on a wider range of factors to study the shape and dynamic principles in different conditions.“

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors developed an extension to the pairwise sequentially Markovian coalescent model that allows one to analyze multiple types of polymorphism data simultaneously. In this paper, they focus on SNPs and DNA methylation data. Since methylation markers mutate at a much faster rate than SNPs, this potentially gives the method better power to infer size history in the recent past. Additionally, they explored a model where there are both local and regional epimutational processes. Integrating additional types of heritable markers into SMC is a nice idea that I like in principle. However, a major caveat to this approach seems to be a strong dependence on knowing the epimutation rate. Fig. 6 shows that, when the epimutation rate is known, inferences do indeed look better; but this is not necessarily true when the rate is not known. (See also major comment #1 below about the interpretation of these plots.) A roughly similar pattern emerges in Supp. Figs. 4-7; in general, results when the rates have to be estimated don't seem that much better than when focusing on SNPs alone. This carries over to the real data analysis too: the interpretation in Fig. 7 appears to hinge on whether the rates are known or estimated, and the estimated rates differ by a large amount from earlier published ones.

      Overall, this is an interesting research direction, and I think the method may hold more promise as we get more and better epigenetic data, and in particular better knowledge of the epigenetic mutational process. At the same time, I would be careful about placing too much emphasis on new findings that emerge solely by switching to SNP+SMP analysis.

      Major comments:

      - For all of the simulated demographic inference results, only plots are presented. This allows for qualitative but not quantitative comparisons to be made across different methods. It is not easy to tell which result is actually better. For example, in Supp. Fig. 5, eSMC2 seems slightly better in the ancient past, and times the trough more effectively, while SMCm seems a bit better in the very recent past. For a more rigorous approach, it would be useful to have accompanying tables that measure e.g. mean-squared error (along with confidence intervals) for each of the different scenarios, similar to what is already done in Tables 1 and 2 for estimating $r$.

      We believe this comment was addressed in the previous revision (Supplementary Tables 6-10) by adding root mean square errors (RMSE) for the demographic estimates, including separate RMSE values for the recent and past portions of the demography.
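      As an illustration of the kind of metric reported in those tables, the sketch below computes RMSE separately for the recent and the ancient part of an inferred population-size history. It is not the procedure used in the manuscript: it assumes, for illustration only, that true and estimated effective population sizes are evaluated on a shared time grid and compared on a log10 scale, and that `split_time` is a hypothetical boundary between "recent" and "past". All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rmse(true_vals: Sequence[float], est_vals: Sequence[float]) -> float:
    """Root mean square error between two non-empty sequences of equal length."""
    if len(true_vals) != len(est_vals) or len(true_vals) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - t) ** 2 for t, e in zip(true_vals, est_vals)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_recent_vs_past(
    times: Sequence[float],
    true_ne: Sequence[float],
    est_ne: Sequence[float],
    split_time: float,
) -> Tuple[float, float]:
    """RMSE of log10 population size, computed separately for time points
    younger than split_time (recent) and at or beyond it (past).

    Assumes true and estimated sizes share the same time grid; raises
    ValueError if either part of the grid is empty.
    """
    recent_true: List[float] = []
    recent_est: List[float] = []
    past_true: List[float] = []
    past_est: List[float] = []
    for t, n_true, n_est in zip(times, true_ne, est_ne):
        # Compare on a log10 scale so that an error is measured as a fold change.
        if t < split_time:
            recent_true.append(math.log10(n_true))
            recent_est.append(math.log10(n_est))
        else:
            past_true.append(math.log10(n_true))
            past_est.append(math.log10(n_est))
    return rmse(recent_true, recent_est), rmse(past_true, past_est)
```

      For example, with a constant true size of 10^4 on the grid [10, 100, 1000, 10000], estimates that are exact in the two recent points but off tenfold in each direction in the two past points, and a split at 500, the recent RMSE is 0 and the past RMSE is 1 (one order of magnitude). Splitting the error this way makes it visible when a method improves the recent past at the cost of the ancient past.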

      - 434: The discussion downplays the really odd result that inputting the true value of the mutation rate, in some cases, produces much worse estimates than when they are learned from data (SFig. 6)! I can't think of any reason why this should happen other than some sort of mathematical error or software bug. I strongly encourage the authors to pin down the cause of this puzzling behaviour. (Comment addressed in revision. Still, I find the explanation added at 449ff to be somewhat puzzling -- shouldn't the results of the regional HMM scan only improve if the true mutation rate is given?)

      We understand that our results and explanation can appear counter-intuitive. As acknowledged by the reviewer, in the previous round of revision we explained at length that this puzzling behaviour stems from a discrepancy between the HMM used to assess methylation regions and the HMM used for the SMC inference. We are happy to clarify further in response to the new question from Reviewer #1:

      If Reviewer #1 refers to the SNP mutation rate (e.g. A → T), knowing its true value does not help the HMM recover the region-level methylation status.

      If Reviewer #1 refers to the epimutation rates (at the region level, the site level, or both), knowing their true values could in theory help the HMM recover the region-level methylation status. However, at present our method does not use epimutation rates to infer the region-level methylation status. Because inferring the epimutation rates is one of the goals of the SMC inference in this study, and the region-level methylation status is required to infer those rates, we suspect that using epimutation rates to infer the region-level methylation status could be statistically inappropriate (it would lead to a form of circular estimation). Instead, our HMM uses only the proportions of methylated and unmethylated sites (estimated from the genome) to determine whether a region is more likely to be methylated or unmethylated. We now state this explicitly in the description of the HMM for methylation regions in the Methods section.

      We acknowledge that our HMM for inferring region-level methylation status could be improved, but this would be a complete project and study in its own right (owing to the complexity of the underlying finite-site process and the lack of a consensus model for epimutations at evolutionary time scales). We believe our HMM was the best compromise between what was known about methylation when the study was conducted and our goals, and future work on the estimation of methylation regions is definitely worthwhile.

      - As noted at 580, all of the added power from integrating SMPs/DMRs should come from improved estimation of recent TMRCAs. So, another way to study how much improvement there is would be to look at the true vs. estimated/posterior TMRCAs. Although I agree that demographic inference is ultimately the most relevant task, comparing TMRCA inference would eliminate other sources of differences between the methods (different optimization schemes, algorithmic/numerical quirks, and so forth). This could be a useful addition, and may also give you more insight into why the augmented SMC methods do worse in some cases. (Comment addressed in revision via Supp. Table 7.)

      - A general remark on the derivations in Section 2 of the supplement: I checked these formulas as best I could. But a cleaner, less tedious way of calculating these probabilities would be to express the mutation processes as continuous time Markov chains. Then all that is needed is to specify the rate matrices; computing the emission probabilities needed for the SMC methods reduces to manipulating the results of some matrix exponentials. In fact, because the processes are noninteracting, the rate matrix decomposes into a Kronecker sum of the individual rate matrices for each process, which is very easy to code up. And this structure can be exploited when computing the matrix exponential, if speed is an issue.

      We believe this comment was acknowledged in the previous revision (line 649), and we thank the reviewer for this interesting insight.
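      To make the reviewer's suggestion concrete, the sketch below builds two independent two-state continuous-time Markov chains (a biallelic SNP and a single CG methylation site), combines them with a Kronecker sum, and obtains pairwise emission probabilities from a matrix exponential. It is a minimal sketch under stated assumptions: the rates are hypothetical, the ancestral state is assumed to be drawn from the stationary distribution, and it does not reproduce the authors' model (which also includes region-level epimutations). It uses `numpy.kron` and `scipy.linalg.expm`.

```python
import numpy as np
from scipy.linalg import expm


def two_state_generator(rate_0_to_1: float, rate_1_to_0: float) -> np.ndarray:
    """Rate matrix Q of a two-state CTMC; each row sums to zero."""
    return np.array([[-rate_0_to_1, rate_0_to_1],
                     [rate_1_to_0, -rate_1_to_0]])


def two_state_stationary(rate_0_to_1: float, rate_1_to_0: float) -> np.ndarray:
    """Stationary distribution of the two-state chain above."""
    total = rate_0_to_1 + rate_1_to_0
    return np.array([rate_1_to_0 / total, rate_0_to_1 / total])


def kronecker_sum(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A (+) B = A (x) I + I (x) B: generator of two independent chains run jointly."""
    return np.kron(a, np.eye(b.shape[0])) + np.kron(np.eye(a.shape[0]), b)


def pair_emission(q: np.ndarray, pi: np.ndarray, tmrca: float) -> np.ndarray:
    """E[x, y] = pi[x] * P(2 * tmrca)[x, y]: probability that the two sampled lineages
    carry states x and y, valid for a reversible chain started at stationarity."""
    return pi[:, None] * expm(q * 2.0 * tmrca)


# Hypothetical per-generation rates, for illustration only (not estimates from the study).
mu = 1e-8                  # SNP mutation rate, symmetric between two alleles
alpha, beta = 1e-4, 5e-4   # site methylation gain (U -> M) and loss (M -> U)

q_snp = two_state_generator(mu, mu)
q_meth = two_state_generator(alpha, beta)
q_joint = kronecker_sum(q_snp, q_meth)  # 4 states: (allele, methylation) pairs
pi_joint = np.kron(two_state_stationary(mu, mu), two_state_stationary(alpha, beta))

tmrca = 1_000.0  # generations
emissions = pair_emission(q_joint, pi_joint, tmrca)

# Because the processes do not interact, exp(Q_joint t) factorises into a Kronecker
# product, so only the small per-process matrices need to be exponentiated.
t = 2.0 * tmrca
p_factored = np.kron(expm(q_snp * t), expm(q_meth * t))
print(np.allclose(expm(q_joint * t), p_factored))  # True
print(np.isclose(emissions.sum(), 1.0))            # True
```

      The factorised form is what makes the approach cheap: adding a further independent process multiplies the state space, but each process is still exponentiated separately.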

      - Most (all?) of the SNP-only SMC methods allow for binning together consecutive observations to cut down on computation time. I did not see binning mentioned anywhere, did you consider it? If the method really processes every site, how long does it take to run?

      We believe this comment was addressed in the previous revision and was added to the manuscript in the Methods section (subsection "SMC optimization function").
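      For readers unfamiliar with binning, the sketch below shows one common scheme used by SNP-only SMC methods such as PSMC, which groups consecutive sites into fixed-size windows (100 bp by default in PSMC) and marks a window as heterozygous if it contains at least one heterozygous site. This is a minimal, hypothetical illustration: it handles only SNP observations, ignores missing data and SMP/DMR symbols, and is not the scheme described in the authors' Methods.

```python
from typing import List, Sequence


def bin_observations(heterozygous: Sequence[int], bin_size: int) -> List[int]:
    """Collapse per-site observations (1 = heterozygous, 0 = homozygous) into bins.

    A bin is marked 1 if it contains at least one heterozygous site (PSMC-style).
    The last bin may be shorter than bin_size. Hypothetical helper for illustration.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be a positive integer")
    return [
        int(any(heterozygous[start:start + bin_size]))
        for start in range(0, len(heterozygous), bin_size)
    ]


sites = [0, 0, 1, 0, 0, 0, 0, 0, 1]
print(bin_observations(sites, 4))  # [1, 0, 1]
```

      The trade-off is that several heterozygous sites falling in one bin are recorded as a single event, which is acceptable when heterozygosity is low but would lose information for hypermutable markers such as SMPs.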

      - 486: The assumed site and region (de)methylation rates listed here differ by several orders of magnitude from what your method estimated (Supp. Tables 5-6). Yet, on simulated data your method is usually correct to within an order of magnitude (Supp. Table 4). How are we to interpret this much larger difference between the published estimates and yours? If the published estimates are not reliable, doesn't that call into question your interpretation of the blue line in Fig. 7 at 533? (Comment addressed in revision.)

      Reviewer #2 (Public Review):

      A limitation in using SNPs to understand recent histories of genomes is their low mutation frequency. Tellier et al. explore the possibility of adding hypermutable markers to SNP-based methods for better resolution over short time frames. In particular, they hypothesize that epimutations (CG methylation and demethylation) could provide a useful marker for this purpose. Individual CGs in Arabidopsis tend to be either close to 100% methylated or close to 0%, and are inherited stably enough across generations that they can be treated as genetic markers. Small regions containing multiple CGs can also be treated as genetic markers based on their cumulative methylation level. In this manuscript, Tellier et al. develop computational methods to use CG methylation as a hypermutable genetic marker and test them on theoretical and real data sets. They do this both for individual CGs and small regions. My review is limited to the simple question of whether using CG methylation for this purpose makes sense at a conceptual level, not at the level of evaluating specific details of the methods. I have a small concern in that it is not clear that CG methylation measurements are nearly as binary in other plants and other eukaryotes as they are in Arabidopsis. However, I see no reason why the concept of this work is not sound. Especially in the future, as new sequencing technologies provide both base calling and methylation calling capabilities, using CG methylation in addition to SNPs could become a useful and feasible tool for population genetics in situations where SNPs are insufficient.

      We again thank Reviewer #2 for the positive comments.

      Reviewer #3 (Public Review):

      I very much like this approach and the idea of incorporating hypervariable markers. The method is intriguing, and the ability to e.g. estimate recombination rates, the size of DMRs, etc. is a really nice plus. I am not able to comment on the details of the statistical inference, but from what I can evaluate it seems reasonable, and in principle the inclusion of highly mutable sites is a nice advance. This is an exciting new avenue for thinking about inference from genomic data. I remain a bit concerned about how well this will work in systems where much less is understood about methylation.

      The authors include some good caveats about applying this approach to other systems, but I think it would be helpful to empiricists working outside of A. thaliana or perhaps mammalian systems to be given some indication of what to watch out for. In maize, for example, there is a nonbimodal distribution of CG methylation (35% of sites are greater than 10% and less than 90%), but this may well be due to mapping issues. The authors solve many of the issues I had concerns with by using gene body methylation, but this is only briefly mentioned on line 659. I'm assuming the authors' hope is that this method will be widely used, and I think it is worth providing some guidance to workers who might do so but who are not as familiar with these kinds of data.

      We thank Reviewer #3 for the positive comments, and we agree that our approach needs to be carefully considered before being applied to data. Our results clearly show that methylation processes are not yet understood well enough to apply our approach as we initially (perhaps naively) designed it. Further investigations need to be conducted, and appropriate theoretical models need to be developed, before reliable results can be obtained; we hope that our discussion makes this clear. However, our approach, the theoretical models and the additional tools contained in this study can help researchers investigate whether to use different genomic markers to build a common (potentially more reliable) ancestral history. In this second revision, we enhanced the discussion by also clarifying the use of methylation from genic regions, to avoid confusion (lines 700-731).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      In added Supp. Table 7, I don't think these are in log10 units as stated in the caption.

      Well spotted! Indeed, the RMSE is not on a log10 scale, and we corrected the caption. We also added that the TMRCA used for the RMSE calculations is in units of generations, to avoid potential confusion.

      Reviewer #3 (Recommendations for The Authors):

      I very much appreciate the authors' attention to previous questions. I would ask that a bit more space is spent in the discussion on concerns/approaches empiricists should keep in mind -- I am wary of this being uncritically applied to data from non-model species. It was not clear to me, for example (it is only mentioned on line 659 in the discussion), that the A. thaliana analysis uses only gene-body methylation. This poses potential issues with background selection that the authors acknowledge appropriately, but it also assuages many of my concerns about using genome-wide data. I think text with recommendations for data/filtering/etc., or at least cautions about assumptions empiricists should be aware of, would help.

      We apologize for the confusion at line 659. As written elsewhere in the manuscript, we meant CG sites in genic regions (and not only gene-body-methylated regions).

      Due to the manuscript’s structure, the data from Arabidopsis thaliana are only described at the very end of the manuscript (line 900+). However, a brief description can also be found at lines 291-296. We have nonetheless added a sentence in the introduction (line 128) for clarity.

      We agree with the comment made by Reviewer #3 concerning the application to data. In the discussion, we point out the risk of applying our approach to poorly understood (or poorly prepared) data and stress the current need for studies of epimutation processes at evolutionary time scales (i.e. at the Ne time scale) (lines 700-703).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      Clostridium thermocellum serves as a model for consolidated bioprocessing (CBP) in lignocellulosic ethanol production, but it still faces limitations in the solids content and ethanol titers achieved by engineered strains thus far. The primary ethanol production pathway involves the enzyme aldehyde-alcohol dehydrogenase (AdhE), which forms long oligomeric structures known as spirosomes, previously characterized via the 3.5 Å resolution E. coli AdhE structure obtained by single-particle cryoEM. The present study describes the cryo-EM structure of the C. thermocellum ortholog, which shares 62% sequence identity with E. coli AdhE, resolved at 3.28 Å resolution. A detailed comparative structural analysis, including the Vibrio cholerae AdhE structure, was conducted. Integrating cryoEM data with molecular dynamics simulations indicated that the aldehyde intermediate resides longer in the channel of the extended form, supporting the hypothesis that the extended spirosome represents the active form of AdhE.

      Strengths: 

      The study conducts a comprehensive structural comparative analysis of oligomerization interfaces and the acetaldehyde channel across compact and extended conformations. Structural and computational results suggest the extended spirosome as the most likely active state of AdhE. 

      Weaknesses: 

      The overall resolution of the C. thermocellum structure is similar to the E. coli ortholog, which shares 62% sequence identity, and the oligomerization interfaces and the acetaldehyde channel were previously described. 

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Ziegler et al., entitled "Structural characterization and dynamics of AdhE ultrastructure from Clostridium thermocellum: A containment strategy for toxic intermediates?", presents the atomic-resolution cryo-EM structure of C. thermocellum AdhE, showing that it predominantly adopts an extended form while E. coli AdhE predominantly adopts a compact form. Through comparative analysis of their C. thermocellum structure and the previous E. coli AdhE structure, they tried to reveal the mechanism by which C. thermocellum and E. coli AdhE show different dominant conformations. In addition, they analyzed the substrate channel using comparative and computational approaches. Lastly, their computational analysis using CryoDRGN reveals conformational heterogeneity in the sample. Although this manuscript suggests a potential mechanism for the different features of the AdhEs, it is very descriptive and does not provide sufficient data to support the authors' conclusions, which may be due to the lack of experimental data supporting their findings from the computational analysis.

      Strengths: 

      This manuscript provides the first C. thermocellum (Ct) AdhE structure and compares this structure with that of E. coli AdhE.

      Weaknesses: 

      Their main conclusions obtained mostly by computational and comparative analysis are not supported by experimental data. 

      Reviewer #3 (Public Review): 

      This study describes the first structure of Gram-positive bacterial AdhE spirosomes that are in a native extended conformation. All the previous structures of AdhE spirosomes come from Gram-negative bacterial species with native compact spirosomes (E. coli, V. cholerae). In E. coli, AdhE spirosomes can be found in two different conformational states, compact and extended, depending on the substrates and cofactors they are bound to.

      The high-resolution cryoEM structure of the extended C. thermocellum AdhE spirosomes produced in E. coli in an apo state (without any substrate or cofactors) is compared to the E. coli extended and compact AdhE spirosomes structures previously published. The authors have modeled (in Swiss-Model) the structure of compact C. thermocellum AdhE spirosomes, using E. coli compact AdhE spirosome conformation as a template, and performed molecular dynamics simulations. They have identified a channel in which the toxic reaction intermediate aldehyde could transit from the aldehyde dehydrogenase active site to the alcohol dehydrogenase active site, in an analogous manner to E. coli spirosomes. These findings are in line with the hypothesis that the extended spirosomes could correspond to the active form of the enzyme. 

      In this work, the authors speculate that the C. thermocellum AdhE spirosomes could switch from the native extended conformation to a compact conformation, in a way that is the inverse of E. coli spirosomes. Although attractive, this hypothesis is not supported by the literature. Strikingly, in some Gram-positive bacterial species (S. pneumoniae, S. sanguinis or C. difficile...), AdhE spirosomes are natively extended and have never been observed in a compact conformation. By contrast, E. coli (and other Gram-negative bacteria) native AdhE spirosomes are compact and are able to switch to an extended conformation in the presence of the cofactors (NAD+, CoA, and iron). The data as presented are not convincing enough to confirm the existence of C. thermocellum AdhE spirosomes in a compact conformation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) The claim of achieving the highest-resolution AdhE structure lacks strong support, since the E. coli structure was solved at 3.5 Å whereas the C. thermocellum structure was solved at 3.28 Å. Conducting a local resolution analysis could provide insights into distinct structural interpretations, strengthening the claim.

      We have modified the sentence claiming this as the highest resolution AdhE structure to say, “In this study, we presented and analyzed a high-resolution structure of the AdhE spirosome from C. thermocellum.” We have included the local resolution map in Figure 2C – all structural analysis was performed in regions from the center of the molecule, where the highest resolution information was determined.

      (2) The comparative structural analysis of the oligomerization interface is thorough, yet it could benefit from greater conciseness. Focusing on highlighting major findings would streamline the presentation and enhance clarity. 

      We altered a few places in the comparative structural analysis in response to other reviewers. We also divided the main structure section into two subsections (spirosome interfaces and AdhE active sites) to enhance clarity.

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors should change the title containing "?". Does it mean that the conclusions the authors made are still in question?

      We have removed the question mark to indicate that our results point to a channeling mechanism.

      (2) Figure 1B: Clarify Ct Fwd. Is this adding NADH, and Ct Rev adding NAD+? 

      This information is described in the text in lines 98-100. It is also at the bottom of figure 1B.

      (3) Line 131: Please revise for clarity: "The extended dimer interfaces" → "The extended E. coli dimer interface".

      This has been edited for clarity. We have added the following sentences to indicate which interfaces are being discussed: “Both the E. coli and C. thermocellum extended dimer interfaces bury ~5000 Å². While the C. thermocellum compact dimer interface buries a similar surface area of ~4800 Å², the E. coli dimer interface buries ~3800 Å².”

      (4) Lines 133-136: Why does that not seem to be the case? It is not clear what the authors mean exactly in these sentences.

      We altered the text to say, “One would expect the compact structure in E. coli to have a larger buried surface area due to it being the predominant form when it is examined without additives, but that is not the case, further corroborating that factors other than buried surface area must impact the apo state of the spirosome.” We hope this clarifies our intent.

      (5) Lines 138-145: The authors should provide a rationale for how the different distribution of charged residues would change the form of AdhE. It may simply be a different distribution that has nothing to do with the conformational change.

      After further analysis of the interface amino acid distribution, we agree that the distribution may have nothing to do with the conformational change. We have changed this section to end with the sentence “Analysis of the residues buried in these interfaces reveals that while many of the residues are identical in the C. thermocellum and E. coli extended structures, there are some differences in amino acid type distribution, although nothing that directly indicates control of conformer state (Supplemental Figure 3).”

      (6) Line 169: Kim et al. → Cho et al.

      We have corrected this error.

      (7) Lines 122-235: The whole section only describes the differences between Ct and Ec AdhE, suggesting without any evidence that these differences may contribute to the conformational difference. The authors cannot say that the differences in the interfaces, active sites, cofactor pockets, etc. explain why the two AdhEs (Ct, Ec) have different domain conformers unless they provide experimental data.

      We did not conclude that any differences we observed structurally were responsible for the conformational change. The purpose of this section was solely to compare the structures to determine whether we could find a structural basis for the difference between the E. coli and C. thermocellum conformations; we stated a few times throughout the section and in the discussion that there were no immediate structural reasons for this difference in shape. We have added a few sentences to the discussion on whether the Gram-positive vs. Gram-negative distinction influences the shape, in response to Reviewer #3's comment #4.

      (8) Line 237: The whole section "Identification..." analyzes the substrate channel by computational analysis. The authors should provide experimental evidence that the residues identified are critical for channeling by generating mutants and measuring their activity.

      We agree that mutagenesis is the next logical step for these results; however, it is outside the scope of this paper, as such a study will not be straightforward. We have included a sentence in the discussion indicating our plans for further investigation of the channel: “Future mutagenesis studies will be needed to confirm whether the spirosome exists to control the reaction flux in high-reactant conditions.”

      Reviewer #3 (Recommendations For The Authors): 

      (1) The capacity of C. thermocellum AdhE spirosomes to switch from a natively extended conformation to a compact conformation is not demonstrated in this manuscript, as it is now. Because this would be the first time that Gram-positive bacterial AdhE spirosomes are observed in a compact conformation, the authors should provide a clear demonstration of their existence by presenting reliable and good images of C. thermocellum compact spirosomes. 

      We have modified Figure 1A to zoom in on one compact and extended spirosome that we have identified from each C. thermocellum sample. We have included triangles of the same size and shape to indicate the proximity of a turn of a helix, showing that the identified compact spirosomes have a tighter conformation than extended spirosomes.

      (2) The authors should show at least one image of the compact C. thermocellum spirosomes that they claim to observe in the presence of NADH or in the forward reaction conditions mentioned in Figure 1. The authors have added different reactants to the extended C. thermocellum spirosomes and visualized their conformation by negative stain. An image of each condition tested would be valuable and would nicely complement the distribution of compact versus extended spirosomes presented in Figure 1.

      We have created a new supplemental figure with spirosomes circled for all of the experimental conditions for C. thermocellum (Supplemental figure 1). We have added a reference to supplemental figure 1 in the text to direct the reader to these images.

      (3) The cryoEM classes presented in Figure 8 are not convincing and could correspond to dimers or rosettes of AdhE or to E. coli endogenous AdhE. CryoEM classes showing longer compact C. thermocellum spirosomes should be shown. The percentage of these compact spirosomes visualized in the micrographs should be added and discussed in the text as it would increase confidence in these findings and confirm that C. thermocellum compact spirosomes exist. Heterologous production of C. thermocellum AdhE in E. coli depleted for its endogenous AdhE would be required to definitively prove that these are compact C. thermocellum AdhE spirosomes in the cryoEM. 

      We included pictures of the theoretical compact spirosomes, as generated from the 8-mer of E. coli AdhE (6AHC), to address the possibility of rosettes. We have now indicated in the text that 6.7% of the particles were in the compact conformation, which is less than seen by negative stain. We further mention that the compact spirosome is less compact than that seen in E. coli. We added a sentence to the discussion about the possibility of contaminating E. coli spirosomes (though this is very unlikely) in our compact spirosome analysis: “While these compact spirosomes could result from expression in E. coli, though this is very unlikely, we also identified compact spirosomes in a native C. thermocellum lysate, which would not have similar contamination issues.”

      (4) The authors should include and discuss in the text previous findings (among which Laurenceau et al., 2015...) describing the differences between Gram-positive and Gram-negative spirosomes. AdhE spirosomes are natively extended in most Gram-positive bacterial species (S. pneumoniae, S. sanguinis or C. difficile...), and have never been observed in a compact conformation. By contrast, E. coli (and other Gram-negative bacteria) native AdhE spirosomes are compact and are able to switch to an extended conformation in the presence of the cofactors (NAD+, CoA, and iron).

      We have added the following sentences to the discussion to address this comment: “This could potentially be due to the differences between Gram-positive and Gram-negative bacteria. In previous studies, compact spirosomes have only been isolated from Gram-negatives while solely extended spirosomes have been isolated from Gram-positives. Furthermore, while the compact spirosomes can transition to extended in the presence of cofactors, the reverse has not been previously observed with an extended spirosome.”

      (5) The authors have spotted some differences between the E. coli and C. thermocellum structures that they believe could explain the intrinsic capacity of these spirosomes to be natively extended or compact. It would be interesting to test this hypothesis by measuring C. thermocellum extended AdhE spirosome activity and comparing it to that of E. coli extended spirosomes. The impact of mutations in the regions proposed by the authors to be important for the capacity of C. thermocellum AdhE to be extended (especially the GxGxxG motif and the D494 position) would be appreciated to confirm this hypothesis.

      We agree that this would be an interesting avenue of research although it is currently outside the scope of this paper. We are looking into experiments that we can perform where we can track both activity and conformation but have not found an ideal experiment at this time.

      (6) Many statements and result interpretations are overstated in several parts of the manuscript and would need to be rewritten to balance the absence of clear evidence of C. thermocellum compact spirosomes. 

      We have shown that we have identified compact spirosomes, addressed in multiple comments above. We have adjusted the language of the paper to indicate more uncertainty that will be followed up in future mutagenesis experiments. However, these mutations are not that simple to identify and this research would require a fairly large study that is better suited for a follow up manuscript.

      (7) The Figure 7 legend would need to be corrected.

      We are unsure as to what needs to be corrected in the figure 7 legend based on this comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      (1) In my assessment, the data sufficiently demonstrate that a modified version of Pertuzumab can bind both the wild-type and S310 mutant forms of ERBB2.

      (2) The engineering strategy employed is rational and effectively combines computational and experimental techniques.

      (3) Given the clinical activity of HER2-targeting ADCs, antibodies unaffected by ERBB2 mutations would be desired.

      Weaknesses:

      (1) There is no data showing that the engineered antibody is as specific as Pertuzumab, i.e. that it does not bind to other (non-ERBB2) proteins.

      Showing the specificity of the engineered antibodies is indeed important. We did not address it in the current ms, but it can be tested in the future.

      (2) There is no data showing that the engineered antibody has the desired pharmacokinetics/pharmacodynamics properties or efficacy in vivo.

      In this ms we did not conduct in-vivo experiments. When moving forward, pharmacokinetics/pharmacodynamics properties and efficacy will be tested as well.

      (3) Computational approaches are only used to design a phage-screen library, but not used to prioritize mutations that are likely to improve binding (e.g. based on predicted impact on the stability of the interaction). A demonstration of how computational pre-screening or lead optimization can improve the time-intensive process would be a welcome advance.

      Thank you for this important comment. In the present ms we indeed used a computational approach for prioritizing residues to be mutated, but we did not prioritize the mutations that are likely to improve binding. In the initial library design, we did prioritize the mutations. However, due to limitations of the experimental approach regarding codon selection for the library, we decided to allow all possible residues at each position, knowing that the selection would remove non-binding variants.

      Context:

      The conflict of interest statement is inadequate. Most authors of the study (but not the first author) are employees of Biolojic, a company developing multi-specific antibodies, but the statements do not clarify whether the presented antibodies represent Biolojic IP, whether the company sponsored the research, and whether the company is further developing the specific antibodies presented.

      The Conflict-of-Interest statement will be revised as follows: The Biolojic Design authors are employees of Biolojic Design and have stock options in Biolojic Design. The company did not sponsor the research, does not hold IP for the presented antibodies, and is not further developing the presented antibodies.

      Reviewer #2 (Public Review):

      Strengths:

      (1) Deep computational analyses of large datasets of clinical data provide useful information about HER2 mutations and their potential relevance to antibody therapy resistance.

      (2) There is valuable information from the analysis of residues within or near the interface between the antigen HER2 and the Pertuzumab antibody (heavy chain). The experimental antibody library screening obtained 90+ clones from 3.86×10¹¹ sequences for further functional validation.

      Weaknesses:

      (1) There is a lack of assessment of antibody variant functions on cancer cell phenotypes in vitro (proliferation, cell death, motility) or in vivo (tumor growth and animal survival). The only assay was western blotting of phospho-HER3 in Figure 4. However, HER2 and phospho-HER2 levels were not analyzed.

      We indeed did not assess the function of the engineered antibodies in cancer cells. While a complete signaling assessment obviously requires functional assessment as well, due to the complexity of this assay, papers in this field (for example [1-3]) assess signaling activation following HER2-HER3 dimerization by measuring pHER3, and we relied on this approach in this ms.

      (2) The title, which refers to computational engineering of a therapeutic antibody, and the statement in the abstract "we designed a multi-specific version of Pertuzumab that retains original function while also binds these HER2 variants" give a misleading impression, for a few reasons:

      a. The primary method used for variant antibody identification for HER2 mutant binding is rather traditional experimental screening based on yeast display instead of the computational design of a multi-specific version of Pertuzumab.

      b. There is insufficient (or no) computational input in the antibody design or in the prioritization of variant residues for the library construction of 3.86×10¹¹ sequences. It seems to consist of random combinations of 6 residues from 4 groups with 20 amino acid options.

      c. The final version of the tri-binding variant is a combination of screened antibody clones instead of computation design from scratch.

      d. There is incomplete experimental evidence about the therapeutic values of newly obtained antibody clones.

      Thank you for this relevant comment. When addressing relevant residues to be mutated, the number of potential variants is enormous. The computational approach was aimed at identifying the most preferable residues, in which variation can improve binding and is not likely to harm important interactions. Although an initially smaller number of residues could have been chosen, we decided to broaden our view and create a larger library, with the aim of combining the computational selection with an experimental selection. This is indeed not a computational design from scratch, but rather an interplay between computation and the lab that yielded the presented results.

      (3) Figures can be improved with better labeling and organization. Some essential pieces of data such as Supplementary Figure 1B on HER2 mutations in S310 that abrogated its binding to Pertuzumab should be placed in the main figures.

      Thank you for this comment, the relevant figures were moved to the main text, and the labels were revised.

      (4) It is recommended to provide a clear rationale or flowchart overview in the main Figure 1. Figure 2A can be combined with Figure 1 to show the list of targeted residues.

      Figures 1 and 2 were divided differently, and the rationale was moved to the main text.

      (5) The quality of Figures such as Figure 2B-C flow data needs to be improved.

      High-quality figures were submitted with the revised ms.

      Reviewer #1 (Recommendations for The Authors):

      Major:

      (1) It should be clarified whether the S310 somatic mutations represent resistance mutations to Pertuzumab (i.e. emerge post-therapy) or are general mutations that activate HER2. This is important because mutations that specifically "evade" the binding of an antibody may be substantially more difficult to overcome than mutations that only by chance occur in the antibody binding site. This concern should be addressed in the introduction and discussion as it changes the interpretation of the data.

      This is a very important note. To the best of our knowledge, these mutations were not identified as resistance mutations that emerged post-therapy. However, as mentioned in the introduction, these mutations form hydrophobic interactions that stabilize HER2 dimerization. Moreover, cells expressing these mutations show hyperphosphorylation of HER2 and an increase in the subsequent activation of signaling pathways. Thus, these mutations do not necessarily evade Pertuzumab binding, but benefit cancer growth. This point was clarified in the introduction of the revised text.

      (2) While the authors claim that S310 germline pathogenic variants exist, I could not find evidence that this is the case. The dbGAP ID does not provide any evidence (either in the form of a citation or prevalence). The variants do not exist in GnomAD. A recent article discussing pathogenic ERBB2 germline variants only mentions S310 as a somatic variant https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8268839/ and I could not find evidence for S310 being a germline variant in the references provided by the authors (https://www.nature.com/articles/nbt.3391), where it is only mentioned as a somatic mutation. I could not find evidence of a cancer predisposition syndrome associated with this variant.

      Thank you for highlighting this matter. We had assumed that the presence of the variant in dbSNP meant that it is also a germline mutation, which may not be correct. However, we did find some evidence of this mutation as a germline variant in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/RCV001311879.7), and this was edited in the revised ms.

      (3) The authors should consider experiments that show that the modified Pertuzumab has the same mechanism of action as the original Pertuzumab in preventing dimerization of the ERBB2 homodimer and/or interactions with ERBB3. I cannot recommend a specific approach, but at present it is not clear whether the mechanism or just the effect (phosphorylation of ERBB3) is the same.

      As mentioned above, for the assessment of HER2-HER3 binding and HER3 signaling, in this ms we relied on previous works [1-3] that also measured signaling activation following HER2-HER3 dimerization by measuring pHER3.

      (4) The authors should perform in vitro experiments to demonstrate that the engineered antibody has similar on-target specificity, not only sensitivity. I don't know what the ideal experiments would be, but they should probably probe native epitopes. Western blots, immunoprecipitation of cell lysates?

      As mentioned above, showing the specificity of the engineered antibodies is indeed important. We did not address it in the current ms, but it can be tested in future work.

      Minor:

      (1) The introduction should better review the literature on the computational/rational design of antibodies, especially multi-specific ones, and likely de-emphasize small molecules (and mutations associated with resistance to them), as the presented research does not inform the design of mutation-agnostic small molecules.

      Thank you for these comments, the introduction was revised accordingly.

      (2) The authors should better present the fact that the lack of binding of Pertuzumab to HER2 S310 was previously known, thus the whole strategy of searching COSMIC and computationally predicting the binding impact was unnecessary. Rather, it would be helpful to learn how many other COSMIC hotspots could have a similar effect on other clinical antibodies.

      The lack of binding was indeed previously known, as mentioned in the introduction. However, we did not start our analysis targeting HER2 specifically; rather, we found these mutations because they were located in the binding pocket, which enabled our strategy of compensating for these mutations by altering the original Pertuzumab. Regarding other potential hotspots, the numbers, which appeared in Supplementary Table 1, were moved to the main text.

      Stylistic:

      (1) Avoid using the term "drug" for an antibody.

      The term was changed to “antibody therapeutics” in the revised text.

      (2) Avoid repetition in the introduction.

      Thank you, we revised the introduction with this comment in mind.

      Reviewer #2 (Recommendations For The Authors):

      The quality of Figure 2B-C flow data needs to be improved:

      a. The diagonal populations suggest inappropriate color compensation or indicate cells are derived from unhealthy populations.

      We believe there may be some confusion here. The figures you are referring to show a very diverse library. The selected clones show clear diagonals, as shown in Supplementary Figure 5.

      b. Additional round 3 and round 4 did not seem to improve the enrichment of targeted clones but rather had similar binding profiles to each of the three proteins over and over.

      Two sets of the fourth round of selection were done, each originating from a different sub-population in round 3: (1) clones that bind the S310Y mutation and (2) clones that bind the S310F mutation. The aim of R4 was to examine these binders against the second mutation and canonical HER2 in the search for multi-specificity. Additional clarification of this point will be added to the main text.

      c. Figure legends are vague with non-specific descriptions of cells and conditions, and unclear statements of "FACS results...".

      The legends were edited in the revised version.

      d. Text fonts are in low resolution.

      High-quality figures were submitted with the revised ms.

      (1) Diwanji, D., et al., Structures of the HER2-HER3-NRG1β complex reveal a dynamic dimer interface. Nature, 2021. 600(7888): p. 339-343.

      (2) Yamashita-Kashima, Y., et al., Mode of action of pertuzumab in combination with trastuzumab plus docetaxel therapy in a HER2-positive breast cancer xenograft model. Oncol Lett, 2017. 14(4): p. 4197-4205.

      (3) Kang, J.C., et al., Engineering multivalent antibodies to target heregulin-induced HER3 signaling in breast cancer cells. MAbs, 2014. 6(2): p. 340-53.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The development of effective computational methods for protein-ligand binding remains an outstanding challenge to the field of drug design. This impressive computational study combines a variety of structure prediction (AlphaFold2) and sampling (RAVE) tools to generate holo-like protein structures of three kinases (DDR1, Abl1, and Src kinases) for binding to type I and type II inhibitors. Of central importance to the work is the conformational state of the Asp-Phe-Gly "DFG motif", where the Asp points inward (DFG-in) in the active state and outward (DFG-out) in the inactive state. The kinases bind to type I or type II inhibitors when in the DFG-in or DFG-out states, respectively.

      It is noted that while AlphaFold2 can be effective in generating ligand-free apo protein structures, it is ineffective at generating holo-structures appropriate for ligand binding. Starting from the native apo structure, structural fluctuations are necessary to access holo-like structures appropriate for ligand binding. A variety of methods, including reduced multiple sequence alignment (rMSA), AF2-cluster, and AlphaFlow may be used to create decoy structures. However, those methods can be limited in the diversity of structures generated and lack a physics-based analysis of Boltzmann weight critical to their relative evaluation.

      To address this need, the authors combine AlphaFold2 with the Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE) method, to explore metastable states and create a Boltzmann ranking. With that variety of structures in hand, grid-based docking methods Glide and Induced-Fit Docking (IFD) were used to generate protein-ligand (kinase-inhibitor) complexes.

      The authors demonstrate that using AlphaFold2 alone, there is a failure to generate DFG-out structures needed for binding to type II inhibitors. By applying AlphaFold2 with rMSA followed by RAVE (using short MD trajectories, SPIB-based collective variable analysis, and enhanced sampling using umbrella sampling), metastable DFG-out structures with Boltzmann weighting are generated, enabling protein-ligand binding. Moreover, the authors found that the successful sampling of DFG-out states for one kinase (DDR1) could be used to model similar states for other proteins (Abl1 and Src kinase). The AF2RAVE approach is shown to result in a set of holo-like protein structures with a 50% rate of docking type II inhibitors.

      Overall, this is excellent work and a valuable contribution to the field that demonstrates the strengths and weaknesses of state-of-the-art computational methods for protein-ligand binding. The authors also suggest promising directions for future study, noting that potential enhancements in the workflow may result from the use of binding site prediction models and free energy perturbation calculations.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the utility of AlphaFold2 (AF2) and the authors' own AF2-RAVE method for drug discovery. As has been observed elsewhere, the predictive power of docking against AF2 structures is quite limited, particularly for proteins like kinases that have non-trivial conformational dynamics. However, using enhanced sampling methods like RAVE to explore beyond AF2 starting structures leads to a significant improvement.

      Strengths:

      This is a nice demonstration of the utility of the authors' previously published RAVE method.

      Weaknesses:

      My only concern is the authors' discussion of induced fit. I'm quite confident the structures discussed are present in the absence of ligand binding, consistent with conformational selection. The authors' own data also seem to argue for an important role of conformational selection. It would be nice to acknowledge this instead of following the common practice in drug discovery of attributing any conformational change to induced fit without thoughtful consideration of conformational selection.

      The reviewer is correct. We aim to highlight the significant role of conformational selection. To clarify this, we have expanded the discussion on conformational selection in the introduction.

      Reviewer #3 (Public Review):

      In this manuscript, the authors aim to enhance AlphaFold2 for protein conformation-selective drug discovery through the integration of AlphaFold2 and physics-based methods, focusing on improving the accuracy of predicting protein structure ensembles and small-molecule binding to metastable protein conformations to facilitate targeted drug design.

      The major strength of the paper lies in the methodology, which includes the innovative integration of AlphaFold2 with all-atom enhanced sampling molecular dynamics and induced fit docking to produce protein ensembles with structural diversity. Moreover, the generated structures can be used as reliable crystal-like decoys to enrich metastable conformations of holo-like structures. The authors demonstrate the effectiveness of the proposed approach in producing metastable structures of three different protein kinases and perform docking with their type I and II inhibitors. The paper provides strong evidence supporting the potential impact of this technology in drug discovery. However, limitations may exist in the generalizability of the approach across other structures, especially complex structures such as protein-protein or DNA-protein complexes.

      Proteins undergo thermodynamic fluctuations and can occasionally reach metastable configurations. It can be assumed that other biomolecules, such as proteins and DNA, stabilize these metastable states when forming protein-protein or protein-DNA complexes. Since our method has the potential to identify these metastable states, it shows promise for designing drugs targeting proteins in allosteric configurations induced by other biomolecules.

      The authors largely achieved their aims by demonstrating that the AF2RAVE-Glide workflow can generate holo-like structure candidates with a 50% successful docking rate for known type II inhibitors. This work is likely to have a significant impact on the field by offering a more precise and efficient method for predicting protein structure ensembles, which is essential for designing targeted drugs. The integrated AF2RAVE-Glide approach may streamline the drug discovery process, potentially leading to the development of more effective and specific medications for various diseases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions

      (1) The computational protocol is found to be insufficient to generate precise values of the relative free energies between structures generated. The authors note in the Conclusion that an enhancement in the workflow might result from the addition of free energy calculations. Can the authors comment on the prospects for generating more accurate estimates of the free energy that might be used to qualitatively evaluate poses and the free energy landscape surrounding putative metastable states? What are the principal challenges and what might help overcome them? What would the most effective computational protocol be?

      More accurate estimates of the free energy can theoretically be achieved by increasing the number of umbrella sampling windows and extending the simulation length until the PMF converges. However, there is always a trade-off between PMF accuracy and computational cost, so we have chosen to keep the current setup. Metadynamics is another method for obtaining a more accurate free energy profile, which we used in previous versions of AlphaFold2-RAVE; however, for the specific systems investigated here, it struggled to achieve back-and-forth transitions because of the highly entropic nature of the activation loop. Research in enhanced sampling methods and dimensionality reduction techniques for reaction coordinates is continually evolving and will play a critical role in alleviating this problem.

      (2) I was surprised not to see more of a funnel-like shape in Figures S16 and S18, that is, a stronger correlation between low RMSD and better docking score. This is true for both the ponatinib and imatinib applications in DDR1 and Abl1. It also seems true for the trimmed results for Src kinase in Figure S19. I was also surprised that there are structures with very large RMSD but docking scores comparable to the best structures of lowest RMSD. Might something be done to make the docking score a more effective discriminator?

      The docking algorithm and docking score are used to filter out highly improbable docking poses. False positives in predicted docking poses are a common issue across all docking methods as described for instance in:

      Fan, Jiyu, Ailing Fu, and Le Zhang. "Progress in molecular docking." Quantitative Biology 7 (2019): 83-89.

      Ferreira, R.S., Simeonov, A., Jadhav, A., Eidam, O., Mott, B.T., Keiser, M.J., McKerrow, J.H., Maloney, D.J., Irwin, J.J. and Shoichet, B.K., 2010. "Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors." Journal of medicinal chemistry, 53(13), pp.4891-4905.

      Moreover, there is always a trade-off between docking accuracy and computational cost. While employing more accurate docking methods may decrease false positives, it can also be resource-intensive. In such scenarios, our approach to enriching holo-structures can be impactful by reducing the number of pocket structures in the input ensembles and significantly enhancing docking efficiency.

      (3) I think that it is fine to identify one structure as "IFD winner" but also feel that its significance is overstressed, especially given that it can be identified only in a retrospective analysis rather than through de novo prediction.

      We agree with the reviewer. We did not intend to emphasize the specific structure "IFD winner". Rather, we aimed to demonstrate that our method can enrich promising candidates for holo-structures. We verified this by showing that our holo-structure candidates performed well in retrospective docking using IFD, which we previously referred to as "IFD winner". We have now revised this term to "holo-model".

      Minor Points

      p. 3 "DymanicBind" should be "DynamicBind"

      p. 3 Change "We chosen" to "We have chosen" or "we chose."

      p. 3 In identifying the Schrödinger software Glide and IFD, I recommend removing the subjective modifier "industry-leading."

      Modifications done.

      Reviewer #2 (Recommendations For The Authors):

      In the view of this reviewer, the writing is 'choppy'.

      We have tried to improve the writing.

      Reviewer #3 (Recommendations For The Authors):

      (1) In Figure 1, the workflow labels (i) to (iv) are not shown on the figures, making it difficult for readers to follow. Consider adding these labels to the figures.

      Modifications done.

      (2) Explain how Boltzmann ranks were calculated based on unbiased MD simulations to guide the enrichment of holo-like structures in metastable states.

      The Methods section is now updated for clarification.

      (3) The authors could clarify how the classical DFG-out decoys in the DDR1 rMSA AF2 ensemble are transferred to Abl1 kinase in the Methods section.

      The Methods section is now updated for clarification.

      (4) The authors can clarify the methodology section by providing more detailed explanations about how the unbiased MD simulations are performed, including which MD simulation software was used and whether energy minimization and equilibrium steps were needed as in conventional MD simulations, and other setup details.

      The Methods section is now updated for clarification.

      (5) The validation of the proposed approach in this work used three kinase proteins. The authors can enhance the discussion section by addressing other types of protein structure prediction that can use the proposed approach in drug discovery, beyond the three kinase proteins tested.

      The proposed approach is theoretically applicable to other types of proteins, such as GPCRs, where both conformational selection and the induced-fit effect are crucial. We have expanded the discussion on the generalization of our protocol in the Conclusion section.

      (6) The authors should add appropriate citations for the software and tools used in the manuscript. For example, a reference should be added for the Glide XP docking experiments that utilized the Maestro software. Double-check all related software citations.

      We have now updated the citations for the docking experiments following the instructions in the Maestro Glide and IFD user manuals.

      (7) The authors should consider offering a comprehensive list of software tools and databases utilized in the study to assist in replicating the experiments and further validating the results.

      We have now added a summary of tools used in the Methods section.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Responses to Reviewer #1:

      Reviewer #1: The study shows a new mechanism of NFkB-p65 regulation mediated by Vangl2-dependent autophagic targeting. Autophagic regulation of p65 has been reported earlier; this study brings an additional set of molecular players involved in this important regulatory event, which may have implications for chronic and acute inflammatory conditions.

      Comments on the revised version:

      The authors have addressed the earlier concerns and I am satisfied with the revised version. I have no additional comments to make.

      We appreciate the reviewer’s comments on our revised manuscript.

      Responses to Reviewer #2:

      Reviewer #2: Vangl2 is a core planar cell polarity protein involved in Wnt/PCP signaling, cell proliferation, differentiation, homeostasis, and cell migration. Vangl2 malfunction has been linked to various human ailments, including autoimmune and neoplastic disorders. Interestingly, it was shown that Vangl2 interacts with the autophagy regulator p62, and autophagic degradation limits the activity of inflammatory mediators, such as p65/NF-κB. However, the possible role of Vangl2 in inflammation has not been investigated. In this manuscript, Lu et al. report that Vangl2 expression is upregulated in human sepsis-associated PBMCs and that Vangl2 mitigates experimental sepsis in mice by negatively regulating p65/NF-κB signaling in myeloid cells. Their mechanistic studies further revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to promote K63-linked poly-ubiquitination of p65. Vangl2 also facilitated the recognition of ubiquitinated p65 by the cargo receptor NDP52. These molecular processes caused selective autophagic degradation of p65. Indeed, abrogation of PDLIM2 or NDP52 functions rescued p65 from autophagic degradation, leading to extended p65/NF-κB activity in myeloid cells. Overall, the manuscript presents convincing evidence for novel Vangl2-mediated control of inflammatory p65/NF-kB activity. The proposed pathway may expand interventional opportunities for restraining aberrant p65/NF-kB activity in human ailments.

      IKK is known to mediate p65 phosphorylation, which instructs NF-kB transcriptional activity. In this manuscript, Vangl2 deficiency led to an increased accumulation of phosphorylated p65 and IKK also at 30 minutes post-LPS stimulation; however, autophagic degradation of p-p65 may not have been initiated at this early time point. Therefore, this set of data put forward the exciting possibility that Vangl2 could also be regulating the immediate early phase of inflammatory response involving the IKK-p65 axis - a proposition that may be tested in future studies.

      We appreciate the reviewer’s comments on our manuscript, and we have added a discussion of the IKK-p65 axis in the revised version (Page 15, lines 467-474).

      Responses to Reviewer #3:

      Reviewer #3: Lu et al. describe Vangl2 as a negative regulator of inflammation in myeloid cells. The primary mechanism appears to be through binding p65 and promoting its degradation, albeit in an unusual autolysosome/autophagy dependent manner. Overall, these findings are novel, valuable and the crosstalk of PCP pathway protein Vangl2 with NF-kappaB is of interest. While generally solid, some concerns still remain about the rigor and conclusions drawn.

      Comments on the revised version:

      (1) Lu et al. address my comments through responses and new experimental data. However, some of the explanations provided are inadequate.

      However, in response to my enquiry regarding directly exploring PCP effects, the authors simply assert "Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by autophagy receptor NDP52 and then promotes the autophagic degradation of p65. Our findings by using autophagy inhibitors and autophagic-deficient cells indicate that Vangl2 regulates NFkB signaling through a selective autophagic pathway, rather than affecting the PCP pathway, WNT, HH/GLI, Fat-Dachsous or even mechanical tension."

      I do not agree that the use of autophagy inhibitors and autophagy-deficient cells can rule out the contributions of PCP or any other pathways. Only experimentally inhibiting the pathway(s) with adequate demonstration of target inhibition/abolition of well-known effector function and documenting unaltered p65 regulation under these conditions can be considered proof. Autophagy inhibitors and autophagy-deficient cells only prove that this particular pathway is necessary. Nonetheless, I do not want to dwell on proving a negative and agree that Vangl2 is a novel regulator of p65 through its role in promoting p65 degradation. The inclusion of a statement discussing the limitations of their approach would have sufficed. The response from the authors could have been better.

      We thank the reviewer for helping us improve the quality of the manuscript. We provided new data and revised the Discussion as suggested.

      To ascertain whether Vangl2 degrades p65 through a selective autophagic pathway or through the PCP pathway, 293T cells were transfected with p65, with or without the Vangl2 plasmid, and treated with different pharmacological inhibitors. We found that the degradation of p65 induced by Vangl2 was blocked by the autolysosome inhibitor CQ, but not by the JNK inhibitor SP600125 or the Wnt/β-catenin inhibitor FH535 (New Figure 1). These data suggest that Vangl2 primarily degrades p65 through a selective autophagic pathway, rather than through the JNK or Wnt signaling pathway. Nevertheless, inhibition of additional pathways, such as the HH/GLI and Fat-Dachsous pathways, should also be tested to further elucidate the function of Vangl2 in p65 degradation. As suggested, we have added a statement about this limitation of our approach to the Discussion (Page 12, lines 378-385).

      Author response image 1.

      Vangl2 degrades p65 through a selective autophagic pathway, but not through the PCP pathway. HEK293T cells were transfected with Flag-p65 and HA-Vangl2 plasmids and treated with DMSO, CQ (50 μM) for 6 h, SP600125 (20 μM) for 1 h, or FH535 (30 μM) for 6 h. Cell lysates were analyzed by immunoblotting.

      (2) I am also not satisfied with the explanation that "immune cells represent a minor fraction of the lungs and liver". There are many resident immune cells in the lungs and liver (alveolar macrophages in the lung and Kupffer cells in the liver). For example, it may be that Vangl2 is important in monocytes and not in the resident population. This might be a potential explanation, but it is not explored. The restricted tissue specificity of the interaction between two ubiquitously present proteins is still a challenge to understand. The response from the authors is not satisfactory. There is plenty of Vangl2 in the liver in their western blot.

      We thank the reviewer for this question. We added this explanation in the Discussion. (Page 13, lines 398-404)

      (3) I had also simply pointed out PMID: 34214490 with reference to the findings described in the manuscript. There were no suggestions of contradiction. In fact, I would refer to the publication in discussion to support the findings and stress the novelty. The response from the authors could have been better.

      Thank you for the reviewer's insightful comments. We have modified this discussion as suggested. (Page 13, lines 410-415; Page 14, lines 419-421)

      (4) The response to my enquiry regarding homo- or heterozygosity is unsupported by any reference or data.

      As suggested, we now provide data (New Figure 2) showing that only homozygous Vangl2 deficiency (Vangl2ΔM/ΔM), and not heterozygous deficiency (Vangl2ΔM/WT), relieved the inhibition of NF-kB activation.

      Author response image 2.

      Vangl2 deficiency promotes NF-kB activation. (A) Survival rates of WT, Vangl2ΔM/ΔM and Vangl2ΔM/WT mice treated with a high dose of LPS (30 mg/kg, i.p.) (n≥4). (B) IL-6 and TNF-α secretion by WT and Vangl2-deficient BMDMs treated with LPS for 6 h, measured by ELISA. IL-1β secretion by WT, Vangl2ΔM/ΔM and Vangl2ΔM/WT BMDMs treated with LPS for 6 h and ATP for 30 min, measured by ELISA.

      (5) The listing of 8 patients and healthy controls is also appreciated. The body temperature of #6 does not meet the SIRS criteria (<36 °C or >38 °C). The inclusion of CRP, PCT, heart rate, respiratory rate, and other lab values would have further improved the inclusion criteria. Moreover, it is difficult to understand why there are 16 data points for the healthy and sepsis cohorts in Fig 1 when there are 8 patients.

      We thank the reviewer for this valuable suggestion. We apologize for our mistake: we had entered data from two repeated experiments in Figure 1A, and we have corrected these data in the updated version (Figure 1A, Page 12, line 146). As suggested, we have added CRP, WBC and heart rate to the sepsis patients’ information (Supplementary Materials and Methods).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The proposition that Vangl2 may target additional mediators of inflammation could be indicated in the text.

      We thank the reviewer for this valuable suggestion. We have added this discussion to the revised version (Page 15, lines 467-474).

      Reviewer #3 (Recommendations For The Authors):

      It is advised that some of the deficiencies pointed out by Reviewer #3 are addressed in the text. Additionally, there could be some inconsistency in the number of healthy controls and patients (see Fig S1A, Fig 1A and the Supplementary table; also see comments from Reviewer #3); this should be carefully scrutinised and revised, if necessary.

      We thank the reviewer for this valuable suggestion. We apologize for our mistake: we had entered data from two repeated experiments in Figure 1A, and we have corrected these data in the updated version (Figure 1A, Page 12, line 146).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      In this useful study, the authors analyze droplet size distributions of multiple protein condensates and their fit to a scaling ansatz, highlighting that they exhibit features of first- and second-order phase transitions. The experimental evidence is still incomplete as the measurements were apparently done only at one time point, neglecting the possibility that droplet size distribution can evolve with time. The text would benefit from a connection to and contextualization with the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community and that emphasizes "liquid-gas" criticality. 

      We have now carried out new experiments at multiple time points to establish that the droplet size distributions are stationary below the critical concentration. We have also addressed the comments made by the reviewers about the nature of the phase transition.

      Our analysis does not depend on a specific hypothesis about the nature of the phase transition, whether it be percolation or a gas-liquid critical transition. The scaling that we observed is an emergent property that is independent of the possible theoretical models used to describe the phase transition. In fact, our scaling analysis indicates that any theoretical model proposed for protein phase separation should predict the critical exponents that we reported.

      Reviewer #1

      The authors analyse droplet size distributions of multiple protein condensates and fit to a scaling ansatz to highlight that they exhibit features of first-order and second-order phase transitions. While the experimental evidence is solid, the text lacks connection and contextualization to the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community. The evidence supports the percolation and phase separation model rather than being close to a true critical point in the liquid-gas phase space. Overall, the work is useful to the community.

      We are grateful to the reviewer for these positive comments. We would like to emphasise that our contribution is not to propose a theoretical model, but rather to report a scaling behaviour in the experimentally measured droplet size distributions. The main implication of our work is that any theoretical model should predict the scaling exponents that we derived from the experimental measurements.

      Strengths: 

      The experimental analysis of distinct protein condensates is very well done and the reported exponents/scaling framework provides a clear framework to help the community deconvolve signatures of percolation in condensates. 

      Weaknesses: 

      The principal concern this reviewer has is that the authors adopt a framing in this paper that presents a discovery of second-order features and connections to criticality; however, they ignore or miss the connections to percolation (a well-understood second-order transition that is expected to play a major role in protein condensates). I believe this needs to be addressed and the paper suitably revised to connect with these expectations. 

      The scaling that we found is not characteristic of standard percolation, since the exponents that we obtained (a=0 and f=1) differ from those of percolation (a=1.19 and f=2.21). This difference indicates that protein phase separation is not in the same universality class as standard percolation. Further studies will be required to understand whether theoretical models based on percolation could predict the observed critical exponents.

      - Protein condensates have been increasingly understood to be described as fluids whose assembly is driven by a connection of density (phase separation, first-order) and connectivity (percolation, second-order) transitions. This has been long known in the polymer community (Flory, Stockmayer, Tanaka, Rubinstein, Semenov, and others) and recently repopularized in the condensate community (by Pappu and Mittag, in particular, amongst others). The authors make no connections to any of these frameworks - which actually seem to be the essence of what they are describing. 

      As mentioned above, our purpose was neither to support an existing theoretical model, nor to propose a new one. Rather, we have reported a scaling behaviour and scaling exponents not noted before. Further studies will be required to establish whether existing theoretical models could account for this scaling behaviour.

      - Percolation theory, which has been around for more than half a century, has clear-cut scaling laws that have essentially similar forms to the ansatz adopted by the authors, and the commonalities/differences are not discussed by the authors - this is essential since this provides a physical basis for their ansatz rather than an arbitrary mathematical formulation. In particular, percolation models connect size distribution exponents to factors like dimensionality, valence, etc. and if these connections can be made with this data, that would be very powerful. 

      The scaling ansatz that we use is commonly adopted in studies of critical phenomena and is not specific to percolation. The scaling exponents depend on only a few attributes, such as dimensionality, symmetries, and whether interactions are short- or long-range; these attributes determine the universality class. Scaling therefore does not link directly to molecular determinants, but it can distinguish between universality classes.

      - The connections between spinodal decomposition and second-order phase transitions are very confusing. Spinodal decomposition happens when the barriers for first-order phase transitions are zero, so that systems can phase separate without crossing nucleation barriers. Further, the "criticality" discussed in the paper is confusing, since it more likely refers to a percolation threshold and much less likely to a "critical temperature" (Tc, where the spinodal and binodal become identical). I would recommend reframing this argument. 

      We cannot refer to a percolation threshold, as our model is not readily compatible with it. We have now elaborated on and better explained the differences between these models.

      It's unlikely, in this reviewer's opinion, that the authors are actually discussing a "first-order" liquid-gas critical point, because saturation concentrations of these proteins can be much higher with temperature, and the critical point would thus likely be at much higher concentrations (and, of course, temperatures). Further, the scaling exponents don't naturally fall into that class. However, if the authors disagree, I would appreciate clear quantitative reasons (including through the scaling exponents in that universality class) and would be happy to be convinced to change my mind. As provided, the data do not support this model. 

      We have now clarified in the manuscript that we do not discuss the liquid-gas critical point.

      Reviewer #2

      This is a potentially interesting study addressing a possible scale-invariant log-normal characteristic of droplet size distribution in the phase separation behavior of biomolecular condensates. Some of the data presented are valuable and intriguing. However, as it stands, the validity and utility of this study are uncertain because there are serious deficiencies in the execution and presentation of the authors' results. Many of these shortcomings are fundamental, including a lack of clarity in the basic conceptual framework of the study, insufficient justification of the experimental setup, less-than-conclusive experimental evidence, and inadequate discussion of implications of the authors' findings to future experimental and theoretical studies of biomolecular condensates. Accordingly, this reviewer considers that the manuscript should undergo a major revision to address the following. In particular, the discussion should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      We thank the reviewer for the helpful comments. In the revised version of the manuscript, we clarified that we aimed to apply a well-established tool for studying phase transitions, scaling analysis, to the protein condensation process. This approach offers insight into a universal aspect of protein phase separation and also provides a practical way to determine the phase boundary. The observed fat-tailed distribution of protein droplet sizes is not what is normally observed in the subsaturated phase of more standard phase-separating systems. Our contribution is not to propose a theoretical model, but rather to report the observation of a scaling behaviour. 

      (1) The theoretical analysis in this study is based on experimental data on condensed droplet size distributions for FUS and α-synuclein. The size data for FUS droplet is indirect as it relies on the assumption that FUS droplet diameter is proportional to fluorescence intensity of labeled FUS (page 10 of manuscript), with fluorescence data adopted from a previously published work by another group (Kar et al. & Pappu, ref.27). Because fluorescence of a droplet is expected to be dependent upon the condensed-phase concentration of FUS, this proportional relationship, even if it holds, must also be modulated by FUS concentration in the droplet. Moreover, why should fluorescence be proportional to diameter but not the cross-sectional area or volume of the FUS droplet, which would be more intuitive? These issues should be clarified. A new measure by microscopy is used to determine the size distribution of condensed α-synuclein; but no microscopy image is shown. It is of critical importance that such raw data (for example microscopy images) be presented for the completeness and reproducibility of the experiment because the entire study relies on the soundness of these experimental measurements. 

      As mentioned in the article, the droplet dimensions used in the scaling analysis could be assessed in 1D (length), 2D (area) or 3D (volume). For the FUS experiments, we used the data as provided by the authors of the original publication (PNAS 2022). For alpha-synuclein, the data are provided in the article. 

      (2) Despite the authors' claim of a universal scaling relationship, the log-log scatter plots in Figure 1 (page 15 of the manuscript) exhibit significant deviations from linearity at low protein concentrations (ρ→0). Given this fact, is universal scaling really valid? Discussion of this behavior is conspicuously absent (except the statement that these data points are excluded in the fit). In any case, the possible origins of these deviations should be thoroughly discussed so that the regime of universal scaling can be properly delineated. 

      In general, one would expect the scaling ansatz to be valid close to the phase boundary. It is a feature of the ansatz that deviations are expected further away from the boundary, because critical phenomena become less relevant there.

      (3) Droplet size distribution most likely depends on the time duration after the preparation of the sample. For α-synuclein, "liquid droplet size characterisation images were captured 10 minutes post-liquid droplet formation" (page 9 of the manuscript). Why 10 minutes? Have the authors tried imaging at different time points and, if so, do the distributions at different time points remain essentially the same? If they are different, what is the criterion for focusing only on a particular time point? Information related to these questions should be provided. 

      We have now determined the droplet size distribution of alpha-synuclein at different time points, finding that they are not dependent on time within experimental uncertainties (Figure 6 in the revised manuscript).

      (4) At least two well-known mechanisms can lead to the time-dependent distribution of liquid droplet sizes: (i) coalescence of droplets in spatial proximity to form a larger droplet, and (ii) Ostwald ripening, i.e., formation of larger droplets concomitant with the dissolution of smaller droplets without fusion of droplets. The implications of these mechanisms on the authors' droplet size distributions should be addressed. Indeed, maintaining a size distribution against these mechanisms in vivo often requires active suppression [Bressloff, Phys Rev E 101, 042804 (2020)] with possible involvement of chemical reactions [Kirschbaum & Zwicker, J R Soc Interface 18, 20210255 (2021)]. These considerations are central to the basic rationale of this study and therefore should be carefully tackled. 

      These two growth mechanisms are relevant above the critical concentration. Below the critical concentration, which is the regime that we investigated in our work, there is no need for active suppression.

      (5) If coalescence and/or Ostwald ripening do occur, given sufficient time after sample preparation, the condensed phase may become a single large "droplet" or a single liquid layer. Does this occur in the authors' experiments? 

      As we are below the critical concentration, this is unlikely to occur, as indeed supported by the experiments mentioned at point (3). 

      (6) It is unclear whether the authors aim to address the kinetic phenomenon of liquid droplet formation and evolution or equilibrium properties. The two types of phenomena appear to be conflated in the authors' narrative. Clarification is needed. If this work aims to address time-independent (or infinite-time) equilibrium properties, how are they expected to be related to droplet size distribution, which most likely is time-dependent? 

      Our analysis focuses on the equilibrium properties of the droplet size distribution below the critical concentration, and it should guide the development of a theoretical model that explains the emergence of scaling. In the introduction of our manuscript, we proposed a possible scenario that attempts to extend Flory-Huggins theory to predict a scaling behaviour appropriate to a critical transition. Other scenarios are possible, and our results, together with further experiments, are needed to arrive at a deeper understanding of protein aggregation.

      (7) The relationship between the potentially time-dependent droplet size distribution and equilibrium properties of ρt and ρc (transition and critical concentrations, respectively) should be better spelled out. An added illustrative figure will be helpful. 

      We are addressing equilibrium properties, not kinetic ones. See also the answers to point 6.

      (8) The authors comment that their findings appear to be inconsistent with Flory-Huggins theory because Flory-Huggins "characterizes droplet formation as a consequence of nucleation ..." (page 8 of the manuscript). Here, three issues need detailed clarification: (i) In what way does Flory-Huggins mandate nucleation? (ii) Why are the findings of apparent scale invariance inconsistent with nucleation? (iii) If liquid droplet formations do not arise from nucleation, what physical mechanism(s) is (are) envisioned by the authors to be underpinning the formation of condensed liquid droplets in protein phase separation? 

      We do agree that the Flory-Huggins theory does not mandate nucleation above the spinodal line. However, we are addressing the equilibrium properties below the critical concentration, so the stable phase is the dilute phase, and there is no nucleation.

      (9) Are any of the authors' findings related to finite-system effects of phase separation [see, e.g., Nilsson & Irbäck, Phys Rev E 101, 022413 (2020)]?  

      Our experimental system is macroscopic, so we would not expect finite-size effects.

      (10) Since the authors are using their observation of an apparent scale-invariant droplet size distribution to evaluate phase separation theory, it is important to clarify whether their findings provide any constraint on the shape of coexistence curves (phase diagrams). 

      We are only reporting the phenomenological observation of a scaling behaviour, so at this stage we cannot speculate on the constraints it places on the coexistence curves. This is indeed an exciting opportunity for future studies.

      (11) More specifically, do the authors' findings suggest that the phase diagrams predicted by Flory-Huggins are invalid? Or, are they suggesting that even if the phase diagrams predicted by Flory-Huggins are empirically correct (if verified by experimental testing), they are underpinned by a free energy function different from that of Flory-Huggins? It is important to answer this question to clarify the implications of the authors' findings on equilibrium phase behaviors and the falsifiability of the implications. 

      As mentioned above, our main conclusion is that the droplet size distribution follows a scaling behaviour. Our contribution is not to propose a theoretical model, but rather to report a scaling behaviour that should be accounted for by existing or future theoretical models.

      (12) What about the implications of the authors' findings for other theories of protein phase separation that are based on interactions different from the short-range interactions treated by Flory-Huggins? For instance, whereas the phase diagrams predicted by Flory-Huggins are always convex upward, it has been observed that phase diagrams for charged intrinsically disordered proteins with long-range Coulomb interactions exhibit a region that is concave upward [Das et al., Phys Chem Chem Phys 20, 28558-28574 (2018)]. Can the authors' findings regarding an apparent scale-invariant droplet size distribution provide information on the underlying interactions driving the protein molecules toward phase separation? 

      The type of interactions that give rise to the observed scaling behaviour is an interesting question for future studies.

      (13) Table S1 (page 4) and Table S2 (page 7) are mentioned in the text but these tables are not in the submitted files. 

      We have added the Supplementary Tables as well as the source files for the figures.

      (14) The two systems studied (FUS and α-synuclein) have a single intrinsically disordered protein (IDP) component. It is not clear if the authors expect their claimed scaling relation to be applicable to systems with multiple IDP components and if so, why.

      Based on the data that we have analysed so far, we do not feel able to speculate on this interesting point, and we leave it to future studies.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses. Compared to an earlier version of the paper, the strength of evidence has improved but it is still partially incomplete due to a few key missing experiments and controls.

      We would like to thank the editorial team for their positive comments and constructive suggestions on improving our manuscript. We have made further improvements based on the valuable suggestions of the reviewers, and we are pleased to send you the revised manuscript now. Having revised the manuscript and added further experiments, we believe that our data now support our claims.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by the 5'ppp-RNA virus SCRV of degrading MDA5 through m6A modification in different species, further indicating that m6A is a conserved process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts. Additionally, it is noted that the main claims put forth in the manuscript are only partially supported by the data presented.

      After meticulous revision of the manuscript, including adjustments to the title, abstract, results, and discussion, the main claim of our study is now the arms race between the MDA5 receptor and the SCRV virus in a lower vertebrate fish, M. miiuy. This claim has two parts. First, M. miiuy MDA5 can recognize virus invasion and initiate a host immune response by recognizing the triphosphate structure of SCRV. Second, as an adversarial strategy, the 5’ppp-RNA virus SCRV can use the m6A mechanism to degrade MDA5 in M. miiuy. Based on the reviewer's suggestions, we have added critical experiments (Figure 3F-3G, Figure 4D, Figure 5G) and provided a more detailed and accurate explanation of the experimental conclusions; we believe that the revised manuscript supports our main claims. In addition, because virus-host coevolution complicates the derivation of universal conclusions, we will further expand these insights in future research.

      Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as the teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy-derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      One critical caveat in this study is that it does not address whether ppp-SCRV RNA induces IRF3-dimerization and type I IFN induction in an MDA5 dependent manner. The data demonstrate that mmiMDA5 can bind to triphosphorylated RNA (Fig. 4D). In addition, triphosphorylated RNA can dimerize IRF3 (4C). However, a key experiment that ties these two observations together is missing.

      Specifically, although Fig. 4C demonstrates that 5'ppp-SCRV RNA induces dimerization (unlike its dephosphorylated or capped derivatives), this does not prove that this happens in an MDA5-dependent manner. This experiment should have been done in WT and siMDA5 MKC cells side-by-side to demonstrate that the IRF3 dimerization observed here is mediated by MDA5 and not by another (unknown) protein. The same holds true for Fig. 4J.

      Thank you for the referee's professional suggestions. In fact, we have transfected SCRV RNA into WT and si-MDA5 MKC cells, and subsequently assessed the dimerization of IRF3 and the IFN response (Figure 2P-2Q). The results indicated that knockdown of MDA5 prevents immune activation of SCRV RNA. However, considering the potential for SCRV RNA to activate immunity independent of the triphosphate structure, this experimental observation does not comprehensively establish the MDA5-dependent induction of IRF3 dimer by 5’ppp-RNA. Accordingly, in accordance with the referee's recommendation, we proceeded to investigate the inducible activity of 5'ppp-SCRV on IRF3 dimerization in WT and si-MDA5 MKC cells, revealing that 5'ppp-SCRV indeed elicits immunity in an MDA5-dependent manner (Figure 4D). Additionally, poly(I:C)-HMW, a known ligand for MDA5, demonstrated a residual, albeit attenuated, activation of IRF3 following MDA5 knockdown, potentially attributed to its capacity to stimulate immunity through alternative pathways such as TLR3.

      - Fig 1C-D: these experiments are not sufficiently convincing, i.e. the difference in IRF3 dimerization between VSV-RNA and VSV-RNA+CIAP transfection is minimal.

      We have reconstituted the necessary materials and repeated the pertinent experiments depicted in Fig 1C-1D. The results demonstrate that SCRV-RNA+CIAP and VSV-RNA+CIAP exhibit a mitigating effect on the induction activity of SCRV-RNA and VSV-RNA on IRF3 dimerization, albeit without complete elimination (Figure 1C and 1D). These findings suggest the presence of receptors within M. miiuy and G. gallus capable of recognizing the viral triphosphate structure; however, it is worth noting that RNA derived from SCRV and VSV viruses does not exclusively depend on the triphosphate structure to activate the host's antiviral response.

      Fig. 2N and 2O: why did the authors decide to use overexpression of MDA5 to assess the impact of STING on MDA5-mediated IFN induction? This should have been done in cells transfected with SCRV or polyIC (as in 2D-G) or in infected cells (as in 2H-K). In addition, it is a pity that the authors did not include an siMAVS condition alongside siSTING, to investigate the relative contribution of MAVS versus STING to the MDA5-mediated IFN response. Panel O suggests that the IFN response is completely dependent on STING, which is hard to envision.

      In our previous laboratory investigations, we have substantiated the induction effect of STING on IFN under SCRV infection or poly(I:C) stimulation, as documented in the relevant literature (10.1007/s11427-020-1789-5), which we have referenced in our manuscript (lines 177-178). While we did assess the impact of STING on MDA5-mediated IFN induction in SCRV-infected cells, as indicated in the figure legends, we have revised Figure 2N-2O for improved clarity, and similarly, Figure 1H-1I has also been updated. Furthermore, considering that RNA virus infection can activate the cGAS/STING axis (10.3389/fcimb.2023.1172739) and the significant role of MAVS in sensing RNA virus invasion in the NLR pathway (10.1038/ni.1782), it is challenging to ascertain the respective contributions of STING and MAVS to the immune signaling cascade mediated by MDA5 during RNA virus infection. We intend to explore this aspect further in future research endeavors.

      Fig. 3F and 3G: where are the mock-transfected/infected conditions? Given that ectopic expression of hMDA5 is known to cause autoactivation of the IFN pathway, the baseline ISG levels should be shown (ie. In absence of a stimulus or infection). Normalization of the data does not reveal whether this is the case and is therefore misleading.

      Based on the reviewer's suggestions, we have rerun the experiment. We examined the effects of MDA5 and MDA5-ΔRD on antiviral factors in uninfected, SCRV-infected, and poly(I:C)-HMW-stimulated MKC cells. Results showed that overexpression of both MDA5 and MDA5-ΔRD stimulated the expression of antiviral genes. However, when cells were infected or stimulated with SCRV or poly(I:C)-HMW, only the overexpression of MDA5, not MDA5-ΔRD, significantly increased the expression of antiviral genes (Figure 3F-3I).

      Fig. 4F and 4G: can the authors please indicate in the figure which area of the gel is relevant here? The band that runs halfway down the gel? If so, the effects described in the text are not supported by the data (i.e. the 5'OH-SCRV and 5'pppGG-SCRV appear to compete with Bio-5'ppp-SCRV as well as 5'ppp-SCRV).

      Apologies for any confusion. The relevant areas in the gel pertaining to the experimental findings were denoted with asterisks and elaborated upon in the figure legends (Figure 4G, 4H, and 4M). The findings indicated that 5'ppp-SCRV, in contrast to 5'OH-SCRV and 5'pppGG-SCRV, demonstrated the ability to compete with bio-5'ppp-SCRV.

      My concerns about Fig. 5 remain unaltered. The fact that MDA5 is an ISG explains its increased expression and increased methylation pattern. The authors should at the very least mention in their text that MDA5 is an ISG and that their observations may be partially explained by this fact.

      First, as our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification in RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition, normalized to that of the respective positive-control spike-in. The fold change of enrichment was calculated with mock samples normalized to 1. Therefore, changes in the expression level of MDA5 can partially explain the increase in m6A modification on total MDA5 mRNA in cells, but they cannot by themselves indicate changes in m6A modification on each MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature (lines 606-608). In addition, we have noted in the Results that MDA5 is an ISG (lines 260-261) and emphasized in the Discussion that this is compatible with enhanced m6A modification of MDA5 (lines 405-409).
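      The normalization described in this response can be illustrated with a short sketch. It assumes that the IP and input readouts are already on a linear scale (for example, converted from qPCR measurements). The function names and all numbers are hypothetical; they are not the authors' pipeline or data.

```python
"""Illustrative sketch of the m6A enrichment normalization described above.

Assumption: IP and input values are linear-scale amounts (hypothetical
numbers below), not raw Ct values.
"""


def percent_input(ip_amount: float, input_amount: float) -> float:
    """Amount recovered in the m6A immunoprecipitation, as a percentage of input."""
    return 100.0 * ip_amount / input_amount


def relative_m6a(ip_tx: float, input_tx: float,
                 ip_spike: float, input_spike: float) -> float:
    """Percent input of a transcript, normalized to the positive-control spike-in."""
    return percent_input(ip_tx, input_tx) / percent_input(ip_spike, input_spike)


def fold_enrichment(relative_condition: float, relative_mock: float) -> float:
    """Fold change of enrichment, scaled so that the mock sample equals 1."""
    return relative_condition / relative_mock


if __name__ == "__main__":
    # Hypothetical values: the transcript is recovered at 5% (mock) or
    # 15% (infected) of input; the spike-in is recovered at 20% in both.
    mock = relative_m6a(ip_tx=5.0, input_tx=100.0, ip_spike=20.0, input_spike=100.0)
    infected = relative_m6a(ip_tx=15.0, input_tx=100.0, ip_spike=20.0, input_spike=100.0)
    print(f"mock: {fold_enrichment(mock, mock):.2f}")
    print(f"infected: {fold_enrichment(infected, mock):.2f}")
```

      Under these assumptions, dividing each transcript's IP signal by its own input means that a uniform rise in MDA5 abundance, with the same modified fraction, leaves the ratio unchanged. Dividing by the spike-in compensates for differences in IP efficiency between samples.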

      Reviewer #3 (Public Review):

      In this manuscript, the authors explored the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in the Miiuy croaker. They found that MDA5 can serve as a substitute for RIG-I in detecting 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) when RIG-I is absent in Miiuy croaker. Furthermore, they observed MDA5's recognition of 5'ppp-RNA in chickens (Gallus gallus), a species lacking RIG-I. Additionally, the authors documented that MDA5's functionality can be compromised by m6A-mediated methylation and degradation of MDA5 mRNA, orchestrated by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker during SCRV infection. This impairment compromises the innate antiviral immunity of fish, facilitating SCRV's immune evasion. These findings offer valuable insights into the adaptation and functional diversity of innate antiviral mechanisms in vertebrates.

      We extend our sincere appreciation for your professional comments and insightful suggestions on our manuscript, as they have significantly contributed to enhancing its quality.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The interpretation of Figures 1H and I, along with the captions, seems unclear. Particularly, understanding the meaning of the X-axis in Figure I is challenging. Additionally, the designation of "H2O = 1" on the Y-axis in Figure 1E lacks clarity. It would be helpful if the author could revise and clarify these figures for better comprehension.

      We appreciate your reminder and have corrected and clarified these figures and figure legends (lines 768-772). We have replaced the Y-axis label of Figure 1I with "Relative mRNA expression" instead of "Relative IFN-1 expression" (Figure 1I). In addition, we have added an explanation of "H2O=1" to the legend of Figure 1E.

      (2) The interpretation of Figure 5 in section 2.5 seems incomplete. The author mentioned that both m6A levels and MDA5 expression levels are increased (lines 256-257), prompting questions about the relationship between m6A and MDA5 expression. If higher m6A levels typically lead to MDA5 mRNA instability and lower MDA5 expression, observing both increasing simultaneously appears contradictory. Considering the dynamic changes shown in Figure 5, it would be more appropriate to propose an alteration in both m6A levels and MDA5 expression levels. Given the fluctuating nature of these changes, definitively labeling them as solely "increased" is challenging. Therefore, offering a nuanced interpretation of the results and clarifying this aspect would bolster the study's conclusions.

      While changes in m6A modification and in the expression of m6A-modified transcripts are both biologically relevant, identifying bona fide m6A alterations during viral infection will allow us to understand how m6A modification of cellular mRNA is regulated. As our m6A change analysis pipeline controls for changes in gene expression, these data should represent true changes in m6A modification rather than changes in the expression of m6A-modified transcripts (10.1038/s41598-020-63355-3). Similar studies demonstrated that m6A modification in RIOK3 and CIRBP mRNAs is altered following Flaviviridae infection (10.1016/j.molcel.2019.11.007). The specific calculation method is as follows: the relative m6A level for each transcript was calculated as the percent of input in each condition, normalized to that of the respective positive-control spike-in. The fold change of enrichment was calculated with mock samples normalized to 1. Therefore, the upregulation of MDA5 expression can partially explain the increase in m6A modification on total MDA5 mRNA in cells, but it cannot by itself indicate changes in m6A modification on each MDA5 transcript. We have added the calculation method to the manuscript and cited the relevant literature. We hope this clarifies the point.

      In addition, although higher m6A levels often lead to unstable MDA5 mRNA and lower MDA5 expression, SCRV can affect MDA5 expression through multiple pathways. For example, because MDA5 is an interferon-stimulated gene, SCRV infection can strongly induce interferon expression and thereby indirectly induce high-level expression of MDA5. Therefore, the increase in MDA5 expression does not contradict the simultaneous increase in m6A modification of MDA5 mRNA (24 h). To further strengthen our conclusions, we performed an additional dual fluorescence reporter experiment. The results indicate that SCRV infection inhibits the fluorescence activity of an MDA5-exon1 reporter plasmid that contains m6A sites but not the promoter sequence of the MDA5 gene, and that this inhibitory effect can be counteracted by cycloleucine (CL, an amino acid analogue that inhibits m6A modification) (Figure 5G). This further indicates that SCRV can reduce MDA5 expression through the m6A pathway.

      Finally, in light of the fluctuations in MDA5 expression levels, we have changed the subheadings of Results section 2.5 and provided a more comprehensive and precise account of the experimental outcomes. We are grateful for your valuable feedback.

      (3) In the discussion section, it would indeed be advantageous for the author to explore the novelty of this work more comprehensively, moving beyond merely acknowledging the widespread loss of RIG-I and suggesting MDA5 as a compensatory mechanism. Considering the well-established roles of MDA5 and m6A in host-virus interactions, the findings of this study may seem familiar in light of previous research. To enhance the discussion, it would be valuable for the author to delve into the implications of this evolutionary model. For instance, does the compensation or loss of RIG-I impact a species' susceptibility to specific types of viruses? Exploring such questions would provide insight into the broader significance of this compensation model and its potential effects on host-virus interactions, thus adding depth to the study's contribution.

      We appreciate the expert advice provided by the referee. In response, we have expanded the discussion in the relevant section, addressing the potential influence of RIG-I deficiency and MDA5 compensation on the antiviral immune system of vertebrates (lines 371-376). Furthermore, we underscore the significance of exploring the impact of SCRV infection on MDA5 m6A modification, considering its compatibility with the role of MDA5 as an ISG, in elucidating the host response to viral infection (lines 405-409).

      (4) To improve the manuscript, it would be beneficial if the editors could aid the author in refining the language. Many descriptions in the article are overly redundant, and there should be appropriate differentiation between experimental methods and results.

      We appreciate the reviewer’s comment. We have carefully revised the manuscript and removed redundant descriptions in the experimental results and methods.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed all of my concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study seeks to establish accurate computational models to explore the role of hydrodynamic interactions on energy savings and spatial patterns in fish schools. Specifically, the authors consider a system of (one degree-of-freedom) flapping airfoils that passively position themselves with respect to the streamwise direction, while oscillating at the same frequency and amplitude, with a given phase lag and at a constant cross-stream distance. By parametrically varying the phase lag and the cross-stream distance, they systematically explore the stability and energy costs of emergent configurations. Computational findings are leveraged to distill insights into universal relationships and clarify the role of the wake of the leading foil.

      We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      Strengths:

      (1) The use of multiple computational models (computational fluid dynamics, CFD, for full Navier-Stokes equations and computationally efficient inviscid vortex sheet, VS, model) offers an extra degree of reliability of the observed findings and backing to the use of simplified models for future research in more complex settings.

      (2) The systematic assessment of the stability and energy savings in multiple configurations of pairs and larger ensembles of flapping foils is an important addition to the literature.

      (3) The discovery of a linear phase-distance relationship in the formation attained by pairs of flapping foils is a significant contribution, which helps compare different experimental observations in the literature.

      (4) The observation of a critical size for in-line formations, above which cohesion and energetic benefits are lost at once, is a new discovery in the field.

      Thank you for this list of strengths; we are delighted that these ideas were clearly communicated in our manuscript.

      Note that Newbolt et al. (PNAS, 2019) reported distance as a function of phase for pairs of flapping hydrofoils, and Li et al. (Nat. Comm., 2020) also reported a phase-distance relationship in robotic and biological fish (calling it Vortex Phase Matching). We compiled their results, together with our own and other numerical and experimental results, showing that the linear distance-phase relationship is universal.

      Weaknesses:

      (1) The extent to which observations on one-degree-of-freedom flapping foils could translate to real fish schools is presently unclear so some of the conclusions on live fish schools are likely to be overstated and would benefit from some more biological framing.

      Thank you for bringing up this point. Indeed, flapping foils that are free to translate in both the x- and y-directions and to rotate in the x-y plane could drift apart in the y-direction. However, this drift occurs on a much longer time scale than the forward swimming motion. For this reason, we feel justified in ignoring it for the purpose of this study, especially since the pairwise equilibria in the swimming x-direction are reached on a faster time scale.

      Below, we include two snapshots taken from published work from the group of Petros Koumoutsakos (Gazzola et al, SIAM 2014). The figures show, respectively, a pair and a group of five undulating swimmers, free to move and rotate in the x-y plane. The evolution of the two and five swimmers is computed in the absence of any control. The lateral drift is clearly sub-dominant to the forward motion. Similar results were reported in Verma et al, PNAS 2018.

      These results are independent of the details of the flow interaction model. For example, similar lateral drift is observed using the dipole model (Kanso & Tsang, FDR 2014; Tsang & Kanso, JNLS 2013).

      Another reason why we feel justified to ignore these additional degrees of freedom is the following: we assume a live fish or robotic vehicle would have feedback control mechanisms that correct for such drift. Given that it is a slowly-growing drift, we hypothesize that the organism or robot would have sufficient time to respond and correct its course.

      Indeed, Zhu et al. (2022) used an RL controller that drives an individual fish-like swimmer at a given speed and direction; when the same controller was applied to pairs of swimmers, the pair "passively" formed a stable school without any additional information about each other.

      We edited page 4 of the main manuscript to include references to the work cited here and to explain the reasons for ignoring the lateral drift.

      Citations:  

      Gazzola, M., Hejazialhosseini, B., & Koumoutsakos, P. (2014). Reinforcement learning and wavelet adapted vortex methods for simulations of self-propelled swimmers. SIAM Journal on Scientific Computing, 36(3), B622-B639. DOI: https://doi.org/10.1137/130943078

      Verma, S., Novati, G., & Koumoutsakos, P. (2018). Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences, 115(23), 5849-5854. DOI: https://doi.org/10.1073/pnas.1800923115

      Tsang, A. C. H., & Kanso, E. (2013). Dipole interactions in doubly periodic domains. Journal of Nonlinear Science, 23, 971-991. DOI: https://doi.org/10.1007/s00332-013-9174-5

      Kanso, E., & Tsang, A. C. H. (2014). Dipole models of self-propelled bodies. Fluid Dynamics Research, 46(6), 061407. DOI: https://doi.org/10.1088/0169-5983/46/6/061407

      Zhu, Y., Pang, J. H., & Tian, F. B. (2022). Stable schooling formations emerge from the combined effect of the active control and passive self-organization. Fluids, 7(1), 41. DOI: https://doi.org/10.3390/fluids7010041

      Author response image 1.

      Antiphase self-propelled anguilliform swimmers. (a) – (d) Wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ centre of mass trajectories.

      Author response image 2.

      Parallel schooling formation. (a) – (d) wavelet adapted vorticity fields at, respectively, t = T, t = 4T, t = 7T, t = 10T. (e) Absolute normalized velocities |U|/L. (f) Swimmers’ center of mass trajectories.

      (2) The analysis of non-reciprocal coupling is not as novel as the rest of the study and potentially not as convincing due to the chosen linear metric of interaction (that is, the flow agreement).

      We thank the referee for this candid and constructive feedback. In fact, we view this aspect of the study as most “revolutionary” because it provides a novel approach to pre-computing the locations of stable equilibria even without doing expensive all-to-all coupled simulations or experiments.

      Basically, the idea is the following: you give me a flow field, it doesn’t matter how you obtained it, whether from simulations or experimentally, and I can tell you at what locations in this flow field a virtual flapping swimmer would be stable and save hydrodynamic energy!

      In the revised version, we changed pages 3 and 7 of the main text and added a new section, “Diagnostic tools”, to the SI to better illustrate this.

      Overall, this is a rigorous effort on a critical topic: findings of the research can offer important insight into the hydrodynamics of fish schooling, stimulating interdisciplinary research at the interface of computational fluid mechanics and biology.

      We thank the referee again for their careful read of the manuscript and their constructive feedback.

      Reviewer #2 (Public Review):

      The document "Mapping spatial patterns to energetic benefits in groups of flow-coupled swimmers" by Heydari et al. uses several types of simulations and models to address aspects of stability of position and power consumption in few-body groups of pitching foils. I think the work has the potential to be a valuable and timely contribution to an important subject area. The supporting evidence is largely quite convincing, though some details could raise questions, and there is room for improvement in the presentation. My recommendations are focused on clarifying the presentation and perhaps spurring the authors to assess additional aspects:

      We would like to thank the referee for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      (1) Why do the authors choose to set the swimmers free only in the propulsion direction? I can understand constraining all the positions/orientations for investigating the resulting forces and power, and I can also understand the value of allowing the bodies to be fully free in x, y, and their orientation angle to see if possible configurations spontaneously emerge from the flow interactions. But why constrain some degrees of freedom and not others? What's the motivation, and what's the relevance to animals, which are fully free?

      We would like to thank the referee for raising this point. It is similar to the point raised above by the first referee. As explained above, the reason is the following: in freely swimming, hydrodynamically interacting “fish,” the lateral drift is sub-dominant to the forward swimming motion. Therefore, we ignore it in the model. Please see our detailed response above for further clarification, and see the changes on page 4 of the main manuscript.

      (2) The model description in Eq. (1) and the surrounding text is confusing. Aren't the authors computing forces via CFD or the VS method and then simply driving the propulsive dynamics according to the net horizontal force? It seems then irrelevant to decompose things into thrust and drag, and it seems irrelevant to claim that the thrust comes from pressure and the drag from viscous effects. The latter claim may in fact be incorrect since the body has a shape and the normal and tangential components of the surface stress along the body may be complex.

      Thank you for pointing this out! It is indeed confusing.

      In the CFD simulations, we compute the net force in the swimming x-direction by integrating the traction, that is, the fluid stress tensor contracted with the surface normal, over the body surface. There is no ambiguity here.
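      As a minimal sketch of this kind of surface integration, assume a 2D body discretized as a counterclockwise closed polygon, with pressure p and velocity gradient ∇u known at each vertex, and a Newtonian fluid with stress σ = -pI + μ(∇u + ∇uᵀ). The helper `net_force` and its inputs are hypothetical and are not taken from the authors' immersed-boundary code; the force on the body is the sum of σ·n over the boundary edges.

```python
import numpy as np


def net_force(points, pressure, grad_u, mu):
    """Net fluid force on a 2D body (hypothetical helper).

    points   : (N, 2) vertices of a closed polygon, counterclockwise
    pressure : (N,) pressure at each vertex
    grad_u   : (N, 2, 2) velocity gradient, grad_u[n, i, j] = du_i/dx_j
    mu       : dynamic viscosity
    Returns the force vector (Fx, Fy) exerted by the fluid on the body.
    """
    nxt = np.roll(points, -1, axis=0)
    seg = nxt - points                                    # edge vectors
    normals = np.stack([seg[:, 1], -seg[:, 0]], axis=1)   # outward normals, |n| = edge length
    # Edge-midpoint values (average of the two end vertices).
    p_mid = 0.5 * (pressure + np.roll(pressure, -1))
    g_mid = 0.5 * (grad_u + np.roll(grad_u, -1, axis=0))
    strain = g_mid + np.transpose(g_mid, (0, 2, 1))
    sigma = -p_mid[:, None, None] * np.eye(2) + mu * strain
    traction = np.einsum("nij,nj->ni", sigma, normals)   # sigma . n on each edge
    return traction.sum(axis=0)


# Check: a linear pressure field p = -2x on a unit circle pushes the body in +x
# with force 2 * (enclosed area), i.e. close to 2*pi for a fine polygon.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(net_force(circle, -2.0 * circle[:, 0], np.zeros((2000, 2, 2)), mu=1.0))
```

      For a pressure field that is linear in space, the edge-midpoint rule is exact on the polygon, so the check recovers 2 times the polygon area; a uniform stress gives zero net force because the edge normals of a closed polygon sum to zero.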

      In the VS simulations, however, we compute the net force in the swimming x-direction by integrating the pressure jump across a plate of zero thickness. There is no viscous drag in this formulation, so viscous drag is added by hand, so to speak. This method for adding viscous drag in the context of the VS model is not new; it has been used before in the literature, as explained in the SI section “Vortex sheet (VS) model” (pages 30 and 31).
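      The pressure-jump integration plus an added drag term can be sketched as follows. This assumes a rigid flat plate of given chord at pitch angle θ, a pressure jump [p] = p₋ - p₊ sampled along the arclength, and, purely as a placeholder, a quadratic drag ½ρC_d c U|U|. The drag law actually used is the one described in the SI section “Vortex sheet (VS) model”; the function names, parameter names, and default values here are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def vs_streamwise_force(s, dp, theta, U, rho=1.0, c_d=0.01, chord=1.0):
    """Streamwise force on a zero-thickness pitching plate (hypothetical helper).

    s     : arclength samples along the plate, from 0 to chord
    dp    : pressure jump [p] = p_lower - p_upper at each sample
    theta : pitch angle; plate tangent (cos theta, sin theta) and
            unit normal n = (-sin theta, cos theta) pointing to the upper side
    U     : streamwise velocity of the plate
    """
    # Pressure acts only along the normal of a thin plate: F_p = n * integral of [p] ds.
    normal_force = trapezoid(dp, s)
    pressure_fx = -np.sin(theta) * normal_force
    # Placeholder viscous drag "added by hand": quadratic and opposing the motion.
    drag_fx = -0.5 * rho * c_d * chord * U * abs(U)
    return pressure_fx + drag_fx


s = np.linspace(0.0, 1.0, 101)
print(vs_streamwise_force(s, np.ones_like(s), theta=np.pi / 6, U=1.0, c_d=0.02))
```

      Because the inviscid pressure force on a thin plate has no tangential component, any skin friction must enter through a separate term such as drag_fx.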


      (3) The parameter τ_diss in the VS simulations takes on unusual values such as 2.45T, making it seem like this value is somehow very special, and perhaps 2.44 or 2.46 would lead to significantly different results. If the value is special, the authors should discuss and assess it. Otherwise, I recommend picking a round value, like 2 or 3, which would avoid distraction.

      The dissipation time serves both to model viscous effects and to reduce computational cost. Introducing it does, however, introduce a forcing into the simulation. A round value such as 2 or 3 is an integer multiple of the flapping period, which is normalized to T = 1; therefore, an integer value of τ_diss would force the system at the resonant frequency and cause the computation to blow up. To avoid this effect, a value such as τ_diss = 2.45, 2.44, or 2.46 is fine and introduces only a small perturbation to the overall simulation compared with no dissipation at all. This effect is studied in detail in the following published work from our group:

      Huang, Y., Ristroph, L., Luhar, M., & Kanso, E. (2018). Bistability in the rotational motion of rigid and flexible flyers. Journal of Fluid Mechanics, 849, 1043-1067. DOI: https://doi.org/10.1017/jfm.2018.446

      (4) Some of the COT plots/information were difficult to interpret because the correspondence of beneficial with the mathematical sign was changing. For example, ΔCOT as introduced on p. 5 is such that negative indicates bad energetics as compared to a solo swimmer. But elsewhere, lower or more negative COT is good in terms of savings. Given the many plots, large amounts of data, and many quantities being assessed, the paper needs a highly uniform presentation to aid the reader.

      Thank you for pointing this out! We updated Figures 3 and 6 as suggested.

      (5) I didn't understand the value of the "flow agreement parameter," and I didn't understand the authors' interpretation of its significance. Firstly, it would help if this and all other quantities were given explicit definitions as complete equations (including normalization). As I understand it, the quantity indicates the match of the flow velocity at some location with the flapping velocity of a "ghost swimmer" at that location. This does not seem to be exactly relevant to the equilibrium locations. In particular, if the match were perfect, then the swimmer would generate no relative flow and thus no thrust, meaning such a location could not be an equilibrium. So, some degree of mismatch seems necessary. I believe such a mismatch is indeed present, but the plots such as those in Figure 4 may disguise the effect. The color bar is saturated to the point of essentially being three tones (blue, white, red), so we cannot see that the observed equilibria are likely between the max and min values of this parameter.

      Thank you for pointing this out! You are correct in your understanding of the flow agreement parameter, but not in your interpretation.

      Basically, if the match were perfect, the swimmer would generate no relative flow and thus no thrust, which means that such a location is an equilibrium. Let me elaborate. An equilibrium is a location at which the net thrust force is zero. The equilibrium is stable if the slope of the thrust force is negative. Ideally, this is what maximizing the flow agreement parameter would produce.
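      The equilibrium and stability criteria stated above can be applied directly to a sampled force curve. In this sketch (the function name is hypothetical), `force` is the net streamwise force on the follower sampled along a position variable x (spacing or, equivalently, phase), with the assumed sign convention that positive force pushes the follower toward larger x. Equilibria are zero crossings located by linear interpolation, and an equilibrium is classified as stable when the local slope is negative.

```python
import numpy as np


def classify_equilibria(x, force):
    """Locate zero crossings of a sampled force curve and classify their stability.

    x     : increasing samples of the position variable (spacing or phase)
    force : net streamwise force on the follower at each sample; positive
            values push the follower toward larger x (assumed convention)
    Returns a list of (x_eq, is_stable) tuples.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(force, dtype=float)
    crossings = np.where(f[:-1] * f[1:] < 0.0)[0]  # strict sign changes only
    equilibria = []
    for i in crossings:
        slope = (f[i + 1] - f[i]) / (x[i + 1] - x[i])
        x_eq = x[i] - f[i] / slope  # linear interpolation of the root
        # Negative slope: a small displacement produces a restoring force.
        equilibria.append((float(x_eq), bool(slope < 0.0)))
    return equilibria


# Toy force curve sin(pi * x), sampled so that no sample falls exactly on a root.
x = 0.005 + 0.01 * np.arange(400)
print(classify_equilibria(x, np.sin(np.pi * x)))
```

      For the toy curve sin(πx), the sketch finds equilibria near x = 1, 2, and 3, alternating stable, unstable, stable, which mirrors the alternating stable and unstable spacings seen along a wake.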

      For example, consider an ideal fluid in which the vertical flow velocity has the form  . Consider a “ghost swimmer” heaving at a velocity  . In this scenario, the flow agreement and thrust parameters are

      Let’s now consider a balance of forces on the “ghost swimmer.” The ghost swimmer is in relative equilibrium if and only if:

      This gives

      We then assess the stability of this equilibrium by calculating the derivative of the thrust parameter with respect to phase

      The corresponding values at the equilibria are

      Thus, when taking the positive which means that the equilibrium is a stable fixed point. We included this analysis in a new section of the SI (page 32).

      (6) More generally, and related to the above, I am favorable towards the authors' attempts to find approximate flow metrics that could be used to predict the equilibrium positions and their stability, but I think the reasoning needs to be more solid. It seems the authors are seeking a parameter that can indicate equilibrium and another that can indicate stability. Can they clearly lay out the motivation behind any proposed metrics, and clearly present complete equations for their definitions? Further, is there a related power metric that can be appropriately defined and which proves to be useful?

      Thank you – these are excellent suggestions. Indeed, we needed to better explain the motivation and equations. Perhaps the main idea for these metrics can be best understood in the context of the simpler particle model, which we now present in the SI and explain in the main text.

      (7) Why do the authors not carry out CFD simulations on the larger groups? Some explanations should be given, or some corresponding CFD simulations should be carried out. It would be interesting if CFD simulations were done and included, especially for the in-line case of many swimmers. This is because the results seem to be quite nuanced and dependent on many-body effects beyond nearest-neighbor interactions. It would certainly be comforting to see something similar happen in CFD.

      We are using an open-source version of the Immersed Boundary Method that is not specifically optimized for many interacting swimmers, so the computational cost of CFD simulations with more swimmers is high. Therefore, we used CFD simulations sporadically, with fewer swimmers (2 or 3), and performed systematic simulations in the context of the VS model.

      For the same Reynolds number as in Figure 1, we simulated three and four swimmers in CFD: three swimmers form a stable formation, whereas four swimmers do not, consistent with the VS model, with the fourth swimmer colliding with the third one. Results are included in the SI figure 8 of the main text.

      (8) Related to the above, the authors should discuss seemingly significant differences in their results for long in-line formations as compared to the CFD work of Peng et al. [48]. That work showed apparently stable groups for numbers of swimmers quite larger than that studied here. Why such a qualitatively different result, and how should we interpret these differences regarding the more general issue of the stability of tandem groups?

      Thank you for bringing up this important comparison. Peng et al. [48] (Hydrodynamic schooling of multiple self-propelled flapping plates) studied in-line configurations of flapping airfoils at a Reynolds number of 200. There are several differences between their work and ours. The most important one is that they used a flexible plate, which makes the swimmer more adaptive to changes in the flow field (e.g., changes in tailbeat amplitude and in phase along its body) and diverts some of the hydrodynamic energy into elastic energy. We edited the main text on page 10, at the end of the section “Critical size of inline formations beyond which cohesion is lost”, to explain this distinction.

      (9) The authors seem to have all the tools needed to address the general question about how dynamically stable configurations relate to those that are energetically optimal. Are stable solutions optimal, or not? This would seem to have very important implications for animal groups, and the work addresses closely related topics but seems to miss the opportunity to give a definitive answer to this big question.

      Indeed, that is exactly the point – in pairwise formations, stable configurations are also energetically optimal! In larger groups, there is no unique stable configuration – each stable configuration is associated with a different degree of energy savings. Interestingly, when exploring various equilibrium configurations in a school of four, we found the diamond formation of D. Weihs (Nature, 1972) to be both stable and the most energetically favorable among the configurations we tested. However, claiming this as a global optimum may be misleading – our standpoint is that fish schools are always dynamic and that there are opportunities for energy savings in more than one stable configuration.

      We added a new section to the main text, “Mapping emergent spatial patterns to energetic benefits”, together with a new main-text figure (Fig. 10) and a new SI figure (Fig. S8).

      (10) Time-delay particle model: This model seems to construct a simplified wake flow. But does the constructed flow satisfy basic properties that we demand of any flow, such as being divergence-free? If not, then the formulation may be troublesome.

      The simplified wake flow captures, in a very reduced manner, the hydrodynamic trail left by the swimmer. In the limit of small amplitude, it should be consistent with the inviscid vortex sheet shed in T. Wu’s waving-plate swimmer model (Wu, 1961).

      The model was compared to experiments and used in several recent publications from the Courant Institute (Newbolt et al. 2019, 2022, 2024).

      Citations:  

      Wu, T. Y. T. (1961). Swimming of a waving plate. Journal of Fluid Mechanics, 10(3), 321-344. DOI: https://doi.org/10.1017/S0022112061000949

      Newbolt, J. W., Lewis, N., Bleu, M., Wu, J., Mavroyiakoumou, C., Ramananarivo, S., & Ristroph, L. (2024). Flow interactions lead to self-organized flight formations disrupted by self-amplifying waves. Nature Communications, 15(1), 3462. DOI: https://doi.org/10.1038/s41467-024-47525-9

      Newbolt, J. W., Zhang, J., & Ristroph, L. (2022). Lateral flow interactions enhance speed and stabilize formations of flapping swimmers. Physical Review Fluids, 7(6), L061101. DOI: https://doi.org/10.1103/PhysRevFluids.7.L061101

      Newbolt, J. W., Zhang, J., & Ristroph, L. (2019). Flow interactions between uncoordinated flapping swimmers give rise to group cohesion. Proceedings of the National Academy of Sciences, 116(7), 2419-2424. DOI: https://doi.org/10.1073/pnas.1816098116

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on such a comprehensive and well-thought-out study; I truly enjoyed reading it and have only a couple of suggestions that I believe will help further strengthen the paper. I am including a bunch of references here that are very familiar to me without the expectation of you to include them all, just to point at areas that I feel you might consider useful.

      We thank the referee again for their careful read of the manuscript and for their constructive feedback. We appreciate it.

      First, I believe that some more rationale is needed to justify the chosen modeling framework. I am fully aware of how difficult it is to run these simulations, but I see some critical assumptions that need to be at least spelled out for the reader to appreciate the limitations of the study: (1) Constraining the cross-stream coordinate (a stability analysis should include perturbations on the cross-stream coordinate as well, see, for example, https://doi.org/10.1017/flo.2023.25 -- I know this is much simpler as it discards any vortex shedding) and (2) Assuming equal frequency and amplitude (there are studies showing variation of tail beat frequency in animals depending on their position in the school, see, for example, https://doi.org/10.1007/s00265-014-1834-4).

      Thank you for these suggestions. These are indeed important and interesting points to discuss in the manuscript. See response above regarding point 1. Regarding point 2, this is of course important and will be pursued in future extensions of this work. We edited the intro and discussion of the main text to explain this.

      In the paper “Stability of schooling patterns of a fish pair swimming against a flow”, the authors considered a pair of swimmers swimming in a channel. They analyzed the stability of the system and found multiple equilibria, including in-line and staggered formations and a special formation perpendicular to the wall. Studying fish schools in confined domains and analyzing their stability is very interesting. We added a citation to this paper in the discussion section at the end of page 10.

      In the paper “Fish swimming in schools save energy regardless of their spatial position”, the authors measured the reduction in power of fish via tail beat frequency and oxygen consumption and compared these to measurements in solitary fish. They found that individuals in a school always save power compared with swimming alone. However, there is one important caveat in this study: they considered a larger school of fish but expressed the results in terms of pairwise configurations (see the schematic we drew below). This is misleading because it may suggest that formations of only two fish benefit each other, while in fact the data were obtained from a larger school with many neighbors. They only considered a fish’s relationship to its nearest neighbor, but in a large school, other neighbors also influence a fish’s energy consumption. In the schematic below, we highlight several focal fish, marked in red, green, and blue, and mark their nearest neighbors in lighter shades of the same colors; these nearest-neighbor relationships are what the authors used. A problematic case is the red fish: its nearest neighbor is behind it, yet its power savings may come from other neighbors that are beside or ahead of it.

      Author response image 3.

      Second, I would like to see more biology context with respect to limitations that are inherent to a purely mechanical model, including, neglecting vision that we know plays a synergistic role in determining schooling patterns. For example, a recent study https://doi.org/10.1016/j.beproc.2022.104767 has presented experiments on fish swimming in the dark and in bright conditions, showing that it is unlikely that hydrodynamics alone could explain typically observed swimming patterns in the literature.

      Thank you for this suggestion and for sharing the paper “Collective response of fish to combined manipulations of illumination and flow”. This is a great study, and we are sorry to have missed it.

      In this paper, the authors found that under illumination, fish swim more cohesively, which is consistent with another paper we already cite, “The sensory basis of schooling by intermittent swimming in the rummy-nose tetra (Hemigrammus rhodostomus)”. Another important conclusion of this paper is that under brighter illumination and with flow, fish schools spend more time side by side. This connects well to the conclusion of another paper we cite, “Simple phalanx pattern leads to energy saving in cohesive fish schooling,” where, at lower flow speeds in a water channel, fish tended to form a dynamic school, while at higher flow speeds they organized in a side-by-side (phalanx) configuration. This is consistent with our finding that fish in side-by-side formation share power savings.

      Importantly, it is well known that both vision and flow sensing play important roles in fish schooling. This study aimed to merely explore what is possible through passive hydrodynamic interactions, without visual and flow sensing and response. We clarify this in the revised version of the manuscript.

      Third, I am not too convinced about the flow agreement metric, which only accounts for linear interactions between the foils. More sophisticated approaches could be utilized as the one proposed here https://doi.org/10.1017/jfm.2018.369, based on a truly model-agnostic view of the interaction - therein, the authors show non-reciprocal (in strength and time-scale) coupling between two in-line flapping foils using information theory. I also would like to mention this older paper https://doi.org/10.1098/rsif.2012.0084, where an equivalent argument about the positioning of a trailing fish with respect to a leading robotic fish is made from experimental observations.

      Thank you for these remarks and for sharing these two interesting papers.

      The flow agreement metric is not specific to two fish, as we show in Fig. 6 of the manuscript. We edited the manuscript and SI to better explain the motivation and implementation of the flow agreement parameter; see the revisions on page 7 of the main text and the new SI section “Diagnostic tools”.

      In the paper “An information-theoretic approach to study fluid–structure interactions”, the authors calculate the transfer entropy between two oscillating airfoils when they are hydrodynamically coupled.  This is an interesting study! We will apply this approach to analyzing larger schools in the future. We cited this paper in the introduction.

      In the paper “Fish and robots swimming together: attraction towards the robot demands biomimetic locomotion”, the authors found that fish swim behind an artificial fish robot, especially when the robot is beating its tail rather than remaining static. Under specific conditions, the fish hold station behind the robot, which may be due to the hydrodynamic advantage obtained by swimming in the robot’s wake. DPIV resolved the wake behind a static or beating fish robot but did not visualize the flow field when the fish was present. This study is similar to a paper we already cited, “In-line swimming dynamics revealed by fish interacting with a robotic mechanism”, which considered fish-foil interaction. In the revised manuscript, we cite both papers.

      Regarding the reviewer’s comment that the flow agreement parameter only accounts for linear interactions between the foils, we would like to clarify the following. The flow agreement parameter is a nonlinear metric that considers the interaction between a virtual swimmer and an arbitrary unsteady flow field. Although the metric is a linear function of the swimmer’s speed, it is a nonlinear function of spacing and phase, which are the quantities we care about. Moreover, the flow field can be generated either experimentally or by CFD simulation, and behind one or more swimmers. It is true that this is a one-way coupled system, since the virtual swimmer does not perturb the flow field.
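      To illustrate how a metric that is linear in the swimmer’s velocity can still be nonlinear in spacing and phase, here is a sketch with a synthetic wake. We assume, hypothetically, that the lateral flow velocity behind the leader is a traveling wave v(x, t) = cos(ωt - kx), that the ghost swimmer heaves with velocity cos(ωt - φ), and that flow agreement is the normalized time correlation of the two signals over one period T = 1. This normalization and the wavenumber k are assumptions made for illustration; they are not the definitions in the manuscript or SI.

```python
import numpy as np

OMEGA = 2.0 * np.pi  # flapping period normalized to T = 1
T_SAMPLES = np.linspace(0.0, 1.0, 400, endpoint=False)  # one full period


def wake_velocity(x, t, k=np.pi):
    """Hypothetical traveling-wave wake: lateral velocity at distance x behind the leader."""
    return np.cos(OMEGA * t - k * x)


def ghost_velocity(t, phi):
    """Heave velocity of a ghost swimmer lagging the leader by phase phi."""
    return np.cos(OMEGA * t - phi)


def flow_agreement(x, phi, t=T_SAMPLES):
    """Normalized time correlation between local flow and ghost-swimmer velocity."""
    v_flow = wake_velocity(x, t)
    v_swim = ghost_velocity(t, phi)
    num = np.mean(v_flow * v_swim)
    den = np.sqrt(np.mean(v_flow ** 2) * np.mean(v_swim ** 2))
    return num / den


def best_spacing(phi, x_grid):
    """Spacing on x_grid that maximizes flow agreement for a given phase lag."""
    values = np.array([flow_agreement(x, phi) for x in x_grid])
    return x_grid[np.argmax(values)]


# One wavelength of spacings (wavelength = 2 when k = pi).
x_grid = np.linspace(0.0, 1.99, 200)
for phi in (np.pi / 4, np.pi / 2, np.pi):
    print(phi, best_spacing(phi, x_grid))
```

      In this toy wake the metric reduces to cos(kx - φ), so the best spacing grows linearly with the phase lag (x = φ/k within one wavelength), in line with a linear phase-distance relationship.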

      Again, this is great work and I hope these suggestions are of help.

      Thank you again! We are delighted to receive such positive and constructive feedback.

      Reviewer #2 (Recommendations For The Authors):

      (1) About Figure 1: Panel C should be made to match between CFD and VS with regard to the swimmer positions. Also, if the general goal of the figure is to compare CFD and VS, then how about showing a difference map of the velocity fields as a third column of panels across A-D?

      Thank you for pointing this out. Figure 1C has been updated accordingly.

      The general goal is to show that the CFD and VS simulations produce qualitatively similar results. Some quantities differ across models, e.g., the swimming speeds of the swimmers, but the scaled distance is the same.

      (2) Figure 3: In A, it would be nice to keep the y-axis the same across all plots, which would aid quick visual comparison. In B, the legend labels for CFD and VS should be filled in with color so that the reader can more easily connect to the markers in the plot.

      Thank you for pointing this out; we have updated Figures 3 and 6.

      (3) Figures 4, 9, and Supplementary Figures too: As mentioned previously, the agreement parameter plots are saturated in the color map, possibly obscuring more detailed information.

      Thank you for pointing this out. The goal is to show that there is a large region with positive flow agreement parameter.

      We took the flow agreement map behind a single swimmer in the VS simulation (Fig. 4B) and added contour lines at 0.25 and 0.5. Few details are hidden by the saturated colormap.

      Author response image 4.

      We also updated Fig 4 and Fig 9 accordingly.

      (4) Figure 6: Is this CFD or VS? Why show one or the other and not both? In B, it seems that there are only savings available and no energetically costly positions. This seems odd. In C, it seems the absolute value on dF/dd is suppressing some important information about stability - the sign of this seems important. In E, the color bar seems to be reflected from what is standard, i.e. 0 on the left and 100 on the right, as in F.

      Thank you for asking. Figure 6 is based only on VS simulations. Because this figure comprises hundreds of simulations, we did not run CFD simulations for it, to save computational effort. Representative CFD simulations are shown in Figures 1, 2, and 3 for comparison. We added a sentence to the figure caption for clarification.

      In C, since dF/dd is always negative for emergent formations (only stable equilibria can appear in a forward-time simulation), we show its absolute value for comparison.
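      As a minimal illustration of why only negative slopes appear among emergent formations, the Python sketch below locates equilibria of a sampled force curve F(d) and classifies them by the sign of dF/dd. It assumes, hypothetically, that F is sampled on an increasing grid of spacings d and that F > 0 acts to increase d; under that convention an equilibrium is stable only when dF/dd < 0. The function name and the sine-shaped force profile are illustrative and are not taken from the manuscript.

```python
import math


def classify_equilibria(d, F):
    """Find sign changes of F(d) and label each equilibrium by the sign of dF/dd.

    Assumed convention (hypothetical): F > 0 pushes the spacing d to grow, so an
    equilibrium is stable when dF/dd < 0 (a small increase in d yields F < 0).
    """
    equilibria = []
    for i in range(len(d) - 1):
        f0, f1 = F[i], F[i + 1]
        if f0 * f1 < 0:  # F changes sign between d[i] and d[i+1]
            slope = (f1 - f0) / (d[i + 1] - d[i])  # finite-difference dF/dd
            d_star = d[i] - f0 / slope  # linearly interpolated zero crossing
            label = "stable" if slope < 0 else "unstable"
            equilibria.append((d_star, slope, label))
    return equilibria


# Illustrative oscillatory force profile with zeros at d = 0.5, 1.0, 1.5
spacings = [0.05 + 0.1 * k for k in range(20)]
forces = [-math.sin(2 * math.pi * x) for x in spacings]
for d_star, slope, label in classify_equilibria(spacings, forces):
    print(f"d* = {d_star:.3f}, dF/dd = {slope:+.2f}, {label}")
```

      Under this convention, a forward-time simulation can only settle at the "stable" crossings, which is why the sign of dF/dd carries no extra information for the formations plotted.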

      In E, we flipped the color bar because a larger flow agreement parameter corresponds to greater power savings, in other words, to negative changes in COT.

      (5) Fig. 8: For cases such as in D that have >100% power savings, does this mean that the swimmer has work done by the flow? How to interpret this physically for a flapping foil and biologically for a fish?

      Yes, it means the hydrofoil/fish gets a free ride and is even able to harvest energy from the incoming flow. Similar phenomena have been reported in the biology and engineering literature. For example, Liao et al. (2003) and Beal et al. (2006) found that live or dead fish can harvest energy from incoming vortical flow by modulating their body curvature.

      In engineering, Chen et al. (2018) and Ribeiro et al. (2021) found, in both simulations and experiments, that the trailing foil in a tandem (in-line) formation can harvest energy from the wake of the leading foil.
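      To make the >100% case concrete, the sketch below computes percent power savings relative to an isolated swimmer. It assumes, as a hypothetical convention that is not necessarily the manuscript's exact definition, that power is the cycle-averaged hydrodynamic power expended by the swimmer, positive when the swimmer does work on the fluid. Savings then exceed 100% exactly when that power becomes negative, i.e., when the flow does net work on the swimmer.

```python
def power_savings_percent(p_group, p_solo):
    """Percent reduction in cycle-averaged power relative to swimming alone.

    Assumed convention: power is positive when the swimmer does work on the fluid.
    """
    if p_solo <= 0:
        raise ValueError("p_solo must be positive for an isolated swimmer")
    return 100.0 * (p_solo - p_group) / p_solo


print(power_savings_percent(0.5, 1.0))    # → 50.0 (follower needs half the solo power)
print(power_savings_percent(-0.25, 1.0))  # → 125.0 (net power < 0: energy harvested from the wake)
```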

      Citations:  

      Liao, J. C., Beal, D. N., Lauder, G. V., & Triantafyllou, M. S. (2003). Fish exploiting vortices decrease muscle activity. Science, 302(5650), 1566-1569. DOI: https://doi.org/10.1126/science.1088295

      Beal, D. N., Hover, F. S., Triantafyllou, M. S., Liao, J. C., & Lauder, G. V. (2006). Passive propulsion in vortex wakes. Journal of Fluid Mechanics, 549, 385-402. DOI: https://doi.org/10.1017/S0022112005007925

      Chen, Y., Nan, J., & Wu, J. (2018). Wake effect on a semi-active flapping foil based energy harvester by a rotating foil. Computers & Fluids, 160, 51-63. DOI: https://doi.org/10.1016/j.compfluid.2017.10.024

      Ribeiro, B. L. R., Su, Y., Guillaumin, Q., Breuer, K. S., & Franck, J. A. (2021). Wake-foil interactions and energy harvesting efficiency in tandem oscillating foils. Physical Review Fluids, 6(7), 074703. DOI: https://doi.org/10.1103/PhysRevFluids.6.074703

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer 1 summarized that: In this revised version of the manuscript, the authors have made important modifications in the text, inserted new data analyses, and incorporated additional references, as recommended by the reviewers. These modifications have significantly improved the quality of the manuscript.

      We are grateful for the reviewer's positive recognition of our revisions.

      Reviewer 2 noted that:

      (1) The authors do not show if the PVT mediates dPAG to BLA communication with any functional behavioral assay.

      We appreciate the reviewer’s suggestion to include a functional assay to investigate the role of the PVT in mediating communication between the dPAG and BLA. Our primary objective was to confirm the upstream role of the dPAG in processing and relaying naturalistic predatory threat information to the BLA, thereby broadening our current understanding of the dPAG-BLA relationship based on Pavlovian fear conditioning paradigms.

      Given previous anatomical findings indicating the absence of direct monosynaptic projections from the dPAG to the BLA (Cameron et al. 1995, McNally, Johansen, and Blair 2011, Vianna and Brandao 2003), we employed both anterograde and retrograde tracers, supplemented by c-Fos expression analysis following predatory threats, to explore possible routes through which threat signals may be conveyed from the dPAG to the BLA. Our findings indicated significant activity within the midline thalamic regions, particularly the PVT, implicating it as a mediator of dPAG-BLA interactions and corroborating the possibility of dPAG→BLA information flow.

      Investigating the PVT's functional role appropriately would require single-unit recordings, correlation of PVT neuronal responses with dPAG and BLA neuronal responses, and pathway-specific causal techniques, with other midline thalamic regions as controls. Such a comprehensive investigation would constitute an independent study.

      In response to previous feedback, we have carefully revised our manuscript to moderate the emphasis on the PVT's role. The Abstract, Results, and Discussion now refer more broadly to "midline thalamic regions" and “The midline thalamus” (subheading) rather than specifically to the PVT. In the Introduction, we mention that the PVT "may be part of a network that conveys predatory threat information from the dPAG to the BLA." Our conclusions about the functional interaction between the dPAG and BLA, which broaden the view of Pavlovian fear conditioning, are not contingent on confirming a specific intermediary role for the PVT.

      (2) The authors also do not thoroughly characterize the activity of BLA cells during the predatory assay.

      Our previous studies have extensively detailed BLA cell firing characteristics, including their responsiveness to food and/or a robot predator during the predatory assay (Kim et al. 2018, Kong et al. 2021), and compared these findings to other predator studies (Amir et al. 2019, Amir et al. 2015). In the current study, out of 85 BLA cells, 3 were food-specific and 4 responded to both the pellet and the robot, with none of these 7 cells responding to dPAG stimulation.

      Given our earlier findings of the immediate responses of BLA neurons to robot activation, we specifically examined whether robot-responsive BLA neurons receive signals from the dPAG. For this analysis, we excluded all food-related cells (pellet cells and BOTH cells) and focused on the time window immediately after robot activation (within 500 ms after robot onset). This approach enabled us to avoid potential confounds from residual effects of robot-induced immediate BLA responses during the animals’ flight and nest entry behaviors.

      Furthermore, as previously described, the robot is programmed to move forward a fixed distance and then return, repeatedly triggering foraging behavior. This setup facilitates the analysis of neural changes during food approach and predator avoidance conflicts. However, animals quickly adapt to the robot, reducing freezing and stretch-attend behaviors, making time-stamped analysis of these behaviors unfeasible.

      We would like to highlight that the present study explicitly focused on demonstrating whether BLA neurons that responded to intrinsic dPAG optogenetic stimulation also responded to extrinsic predatory robot activation, and compared their firing characteristics to those BLA neurons that did not respond to dPAG stimulation (Figure 3). This targeted analysis provides insights into the responsiveness of BLA neurons to both intrinsic and extrinsic stimuli, furthering our understanding of the dPAG-BLA interaction in the context of predatory threats.
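      The window-based selection described above can be sketched as follows. This is a hypothetical Python illustration, not the analysis code used in the study: spike and robot-onset times are assumed to be in seconds, food-related cells are assumed to carry a "pellet" or "both" label, and the pre-onset baseline window of equal length is an assumption. The statistical test used to call a cell robot-responsive is not reproduced here.

```python
from bisect import bisect_left

FOOD_LABELS = {"pellet", "both"}  # hypothetical labels for food-related cells


def count_in_window(spike_times, start, end):
    """Number of spikes with start <= t < end; spike_times must be sorted."""
    return bisect_left(spike_times, end) - bisect_left(spike_times, start)


def robot_window_counts(cells, robot_onsets, window=0.5):
    """Per-trial spike counts before and within `window` s after each robot onset.

    Cells labeled as food-related are excluded before counting.
    """
    counts = {}
    for cell_id, cell in cells.items():
        if cell["label"] in FOOD_LABELS:
            continue  # exclude pellet cells and BOTH cells
        spikes = sorted(cell["spikes"])
        pre = [count_in_window(spikes, t - window, t) for t in robot_onsets]
        post = [count_in_window(spikes, t, t + window) for t in robot_onsets]
        counts[cell_id] = (pre, post)
    return counts


cells = {
    "c1": {"label": "none", "spikes": [0.1, 0.6, 1.05, 1.2, 1.49, 1.5, 2.0]},
    "c2": {"label": "pellet", "spikes": [1.1, 1.2]},
}
print(robot_window_counts(cells, robot_onsets=[1.0]))  # → {'c1': ([1], [3])}
```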

      Reviewer 3 also raised no concerns and stated that: The series of experiments provide a compelling case for supporting their conclusions. The study brings important concepts revealing dynamics of fear-related circuits particularly attractive to a broad audience, from basic scientists interested in neural circuits to psychiatrists.

      We sincerely thank the reviewer for the positive feedback on our revisions.

      Recommendations for the Authors

      Reviewer 1: There are a few minor concerns that the authors may want to fix:

      (1) Point 5) The sentence: "The complexity of targeting the dPAG, which includes its dorsomedial, dorsolateral, lateral, and ventrolateral subdivisions" is hard to follow because the ventrolateral subdivision is not part of the dPAG. The authors may want to say specific subregions of the PAG instead. It is also unclear why transgenic animals would be needed for these projection-defined manipulations. The combination of a retrograde Cre-recombinase virus with an inhibitory opsin or chemogenetic approach may be sufficient.

      We appreciate the reviewer’s insightful feedback regarding our description of the dPAG and the use of transgenic mice in future studies. As suggested, we have corrected the manuscript to exclude the 'ventrolateral' subdivision from the dPAG description, now accurately aligning with pioneering studies (Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993) that designated dPAG as including the dorsomedial (dmPAG), dorsolateral (dlPAG) and lateral (lPAG) regions, as cited in our revised manuscript.

      We acknowledge the reviewer’s helpful suggestion regarding the use of retrograde Cre-recombinase virus with inhibitory opsins or chemogenetic approaches as viable alternatives. These methods have been incorporated into our discussion (pages 14-15): “While our findings demonstrate that opto-stimulation of the dPAG is sufficient to trigger both fleeing behavior and increased BLA activity, we have not established that the dPAG-PVT circuit is necessary for the BLA’s response to predatory threats. To establish causality and interregional relationships, future studies should employ methods such as pathway-specific optogenetic inhibition (using retrograde Cre-recombinase virus with inhibitory opsins; Lavoie and Liu 2020, Li et al. 2016, Senn et al. 2014) or chemogenetics (Boender et al. 2014, Roth 2016) in conjunction with single unit recordings to fully characterize the dPAG-PVT-BLA circuitry’s (as opposed to other midline thalamic regions for controls) role in processing predatory threat-induced escape behavior. If inactivating the dPAG-PVT circuits reduces the BLA's response to threats, this would highlight the central role of the dPAG-PVT pathway in this defense mechanism. Conversely, if the BLA's response remains unchanged despite dPAG-PVT inactivation, it could suggest the existence of multiple pathways for antipredatory defenses.”

      This revision addresses the critique by clarifying the anatomical description of the dPAG and emphasizing the feasibility of using targeted viral approaches without the necessity for transgenic animals.

      (2) Point 6e) The authors mentioned that "pellet retrieval" was indicated by the animal entering a designated zone 19 cm from the pellet, driven by hunger. Entering the area within 19 cm of the pellet should be labeled as food approach rather than food retrieval, because on many occasions the animals may be some seconds away from grabbing the pellet.

      We agree and have incorporated the change (pg. 22).

      (3) Point 11) We would strongly recommend that the authors replace the terminology "looming" with "approaching" to avoid confusion with several previous studies looking at defensive behaviors in response to looming induced by the shadow of an object moving closer to the eyes.

      Done.

      (4) Point 17) The authors mentioned that "A total of three rats were utilized for the robot testing experiments depicted in Fig. 2 G-J." However, the figure indicates a total of 9 ChR2 and 4 controls.

      We apologize for the confusion in our previous author responses. To examine the optical stimulation effects on behavior in Fig. 2G-J, we used a total of 9 ChR2 and 4 EYFP rats. The experimental sequence is detailed in the previously revised manuscript (pg. 20): “For optical stimulation and behavioral experiments, the procedure included 3 baseline trials with the pellet placed 75 cm away, followed by 3 dPAG stimulation trials with the pellet locations sequentially set at 75 cm, 50 cm, and 25 cm. During each approach to the pellet, rats received 473-nm light stimulation (1-2 s, 20-Hz, 10-ms width, 1-3 mW) through a laser (Opto Engine LLC) and a pulse generator (Master-8; A.M.P.I.). Additional testing to examine the functional response curves was conducted over multiple days, with incremental adjustments to the stimulation parameters (intensity, frequency, duration) after confirming that normal baseline foraging behavior was maintained. For these tests, one parameter was adjusted incrementally while the others were held constant (intensity curve at 20 Hz, 2 s; frequency curve at 3 mW, 2 s; duration curve at 20 Hz, 3 mW). If the rat failed to procure the pellet within 3 min, the gate was closed, and the trial was concluded.”

      This clarification ensures that the actual number of animals used is accurately reflected and aligns with the figure data, addressing the reviewer's concern.
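      For reference, the stimulation parameters quoted above imply a simple pulse schedule. The sketch below is a hypothetical helper, not the Master-8 configuration itself; it assumes one pulse starting at the beginning of each period from t = 0 and computes the pulse count, onset times, and duty cycle. For 20 Hz, 10-ms pulses over 2 s, this gives 40 pulses with a 50-ms period and a 20% duty cycle.

```python
def pulse_train(freq_hz, width_ms, duration_s):
    """Return pulse onset times (ms) and duty cycle for a square-pulse train.

    Assumes the first pulse starts at t = 0 and one pulse starts each period.
    """
    period_ms = 1000.0 / freq_hz
    if width_ms >= period_ms:
        raise ValueError("pulse width must be shorter than the period")
    n_pulses = int(round(duration_s * freq_hz))
    onsets_ms = [i * period_ms for i in range(n_pulses)]
    return onsets_ms, width_ms / period_ms


onsets, duty = pulse_train(freq_hz=20, width_ms=10, duration_s=2)
print(len(onsets), onsets[:3], duty)  # → 40 [0.0, 50.0, 100.0] 0.2
```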

      Reviewer 2: The authors made important changes in the text to address study limitations, including citations requested by the Reviewers and additional discussions about how this work fits into the existing literature. These changes have strengthened the manuscript.

      (1) However, the authors did not perform new experiments to address any of the issues raised in the previous round of reviews. For example, they did not make optogenetic manipulations of the pathway including the PVT, and did not add any loss of function experiments. The justification that these experiments are better suited for future reports using mice is not convincing, because hundreds of papers performing these types of circuit dissection assays have been performed in rats.

      We appreciate the reviewer's comments regarding the experimental scope of our study. Our study’s primary objective was to explore the dPAG’s upstream functional role in processing and conveying naturalistic predatory threat information to the BLA, extending our current understanding of the dPAG-BLA relationship based on Pavlovian fear conditioning paradigms. We believe that our findings effectively address this goal.

      Our use of anterograde and retrograde tracers, supplemented by c-Fos expression analysis in response to predatory threats, was primarily conducted to verify the possibility of the dPAG→BLA information flow during predator encounters. This involved exploring potential routes through which threat signals might be conveyed from the dPAG to the BLA, given the lack of direct monosynaptic projections from the dPAG to BLA neurons (Cameron et al. 1995, McNally, Johansen, and Blair 2011, Vianna and Brandao 2003). This methodology helped us identify a potential structure, PVT, for more in-depth future studies. A thorough examination of the PVT's role would require single-unit recordings and causal techniques, incorporating other midline thalamic regions as controls, representing a significant and separate study on its own.

      In response to prior feedback, we have carefully revised our manuscript to generally address the role of "midline thalamic regions" rather than focusing specifically on the PVT. We wish to emphasize that our findings, which illustrate unique functional interactions between the dPAG and BLA in response to predatory imminence, remain compelling and informative even without definitive evidence of the PVT’s involvement.

      Reviewer 3: In the revised version of the manuscript, the authors addressed adequately all the concerns raised by the reviewers. 

      We thank the reviewer for the thoughtful feedback on the earlier version of our manuscript and for reexamining the revisions we have made.

      References

      Amir, A., P. Kyriazi, S. C. Lee, D. B. Headley, and D. Pare. 2019. "Basolateral amygdala neurons are activated during threat expectation." J Neurophysiol 121 (5):1761-1777.

      Amir, A., S. C. Lee, D. B. Headley, M. M. Herzallah, and D. Pare. 2015. "Amygdala Signaling during Foraging in a Hazardous Environment." J Neurosci 35 (38):12994-3005.

      Bandler, R., P. Carrive, and S. P. Zhang. 1991. "Integration of somatic and autonomic reactions within the midbrain periaqueductal grey: viscerotopic, somatotopic and functional organization." Prog Brain Res 87:269-305.

      Bandler, R., and K. A. Keay. 1996. "Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression." Prog Brain Res 107:285-300.

      Boender, A. J., J. W. de Jong, L. Boekhoudt, M. C. Luijendijk, G. van der Plasse, and R. A. Adan. 2014. "Combined use of the canine adenovirus-2 and DREADD-technology to activate specific neural pathways in vivo." PLoS One 9 (4):e95392.

      Cameron, A. A., I. A. Khan, K. N. Westlund, and W. D. Willis. 1995. "The efferent projections of the periaqueductal gray in the rat: a Phaseolus vulgaris-leucoagglutinin study. II. Descending projections." J Comp Neurol 351 (4):585-601.

      Carrive, P. 1993. "The periaqueductal gray and defensive behavior: functional representation and neuronal organization." Behav Brain Res 58 (1-2):27-47.

      Kim, E. J., M. S. Kong, S. G. Park, S. J. Y. Mizumori, J. Cho, and J. J. Kim. 2018. "Dynamic coding of predatory information between the prelimbic cortex and lateral amygdala in foraging rats." Sci Adv 4 (4):eaar7328.

      Kong, M. S., E. J. Kim, S. Park, L. S. Zweifel, Y. Huh, J. Cho, and J. J. Kim. 2021. "'Fearful-place' coding in the amygdala-hippocampal network." Elife 10.

      Lavoie, A., and B. H. Liu. 2020. "Canine Adenovirus 2: A Natural Choice for Brain Circuit Dissection." Front Mol Neurosci 13:9.

      Li, Y., L. Hickey, R. Perrins, E. Werlen, A. A. Patel, S. Hirschberg, M. W. Jones, S. Salinas, E. J. Kremer, and A. E. Pickering. 2016. "Retrograde optogenetic characterization of the pontospinal module of the locus coeruleus with a canine adenoviral vector." Brain Res 1641 (Pt B):274-90.

      McNally, G. P., J. P. Johansen, and H. T. Blair. 2011. "Placing prediction into the fear circuit." Trends Neurosci 34 (6):283-92.

      Roth, B. L. 2016. "DREADDs for Neuroscientists." Neuron 89 (4):683-94.

      Senn, V., S. B. Wolff, C. Herry, F. Grenier, I. Ehrlich, J. Grundemann, J. P. Fadok, C. Muller, J. J. Letzkus, and A. Luthi. 2014. "Long-range connectivity defines behavioral specificity of amygdala neurons." Neuron 81 (2):428-37.

      Vianna, D. M., and M. L. Brandao. 2003. "Anatomical connections of the periaqueductal gray: specific neural substrates for different kinds of fear." Braz J Med Biol Res 36 (5):557-66.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The author presents the discovery and characterization of CAPSL as a potential gene linked to Familial Exudative Vitreoretinopathy (FEVR), identifying one nonsense and one missense mutation within CAPSL in two distinct patient families afflicted by FEVR. Cell transfection assays suggest that the missense mutation adversely affects protein levels when overexpressed in cell cultures. Furthermore, conditionally knocking out CAPSL in vascular endothelial cells leads to compromised vascular development. The suppression of CAPSL in human retinal microvascular endothelial cells results in hindered tube formation, a decrease in cell proliferation, and disrupted cell polarity. Additionally, transcriptomic and proteomic profiling of these cells indicates alterations in the MYC pathway. 

      Strengths: 

      The study is nicely designed with a combination of in vivo and in vitro approaches, and the experimental results are good quality. 

      We thank the reviewer for the conclusion and positive comments.

      Weaknesses: 

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. Further careful examination of human genetics evidence in both patient cohorts and the general population will help to clarify this. In light of the human genetics, more caution needs to be exercised when interpreting results from mice and cell models and relating them to the human patient phenotype. 

      We thank the reviewer for the careful reading and constructive suggestions. We have added several experiments to address the reviewer's concerns, as follows:

      (1) The pLI score for loss-of-function (LOF) alleles of CAPSL is based on the general population, in which Europeans account for ~77% and East Asians for less than 3%. Since the FEVR families in this article all come from China, the pLI score may not be accurate for this population. We will, of course, continue to collect FEVR pedigrees.

      (2) We evaluated the phenotype of Capsl heterozygous mice at P5, and the results showed no overt difference in vascular progression, vessel density, or branchpoints compared with littermate wild-type controls (Fig. S4). The lack of a pronounced phenotype in Capsl heterozygous mice may be due to differences in sensitivity between humans and mice. A similar example is LRP5, mutations in which are associated with FEVR. Heterozygous LRP5 mutations have been reported in FEVR patients from multiple populations (PMID: 16929062, 33302760, 27486893, 35918671, 36411543), yet heterozygous Lrp5 knockout mice exhibit no visible angiogenic phenotype (PMID: 18263894). A corresponding description was added to the manuscript on page 6.

      (3) We further assessed the angiogenic phenotype at P21, when angiogenesis is almost complete, and the results revealed no difference between Ctrl and CapsliECKO/iECKO mice (Fig. S5). A corresponding description was added to the manuscript on page 7.

      (4) We evaluated the expression of MYC downstream genes in vivo using lung tissue from P35 Ctrl and CapsliECKO/iECKO mice (Fig. S8). Consistent with the in vitro results from HRECs, CapsliECKO/iECKO mice showed downregulated expression of MYC targets. A corresponding description was added to the manuscript on page 11.

      Reviewer #2 (Public Review): 

      Summary: 

      This work identifies two variants in CAPSL in two-generation familial exudative vitreoretinopathy (FEVR) pedigrees, and using a knockout mouse model, they link CAPSL to retinal vascular development and endothelial proliferation. Together, these findings suggest that the identified variants may be causative and that CAPSL is a new FEVR-associated gene. 

      Strengths: 

      The authors' data provides compelling evidence that loss of the poorly understood protein CAPSL can lead to reduced endothelial proliferation in mouse retina and suppression of MYC signaling in vitro, consistent with the disease seen in FEVR patients. The study is important, providing new potential targets and mechanisms for this poorly understood disease. The paper is clearly written, and the data generally support the author's hypotheses. 

      We thank the reviewer for the conclusion and positive comments.

      Weaknesses: 

      (1) Both pedigrees described appear to suggest that heterozygosity is sufficient to cause disease, but authors have not explored the phenotype of Capsl heterozygous mice. Do these animals have reduced angiogenesis similar to KOs? Furthermore, while the p.R30X variant protein does not appear to be expressed in vitro, a substantial amount of p.L83F was detectable by western blot and appeared to be at the normal molecular weight. Given that the full knockout mouse phenotype is comparatively mild, it is unclear whether this modest reduction in protein expression would be sufficient to cause FEVR - especially as the affected individuals still have one healthy copy of the gene. Additional studies are needed to determine if these variants alter protein trafficking or localization in addition to expression, and if they can act in a dominant negative fashion. 

      We thank the reviewer for the suggestion. We evaluated the phenotype of Capsl heterozygous mice at P5 (Fig.S4), and the results showed no overt difference in angiogenesis compared with littermate control mice.

      We transfected CAPSL wild-type, p.R30X mutant, and p.L83F mutant plasmids into 293T cells to assess changes in the intracellular localization of the CAPSL mutant proteins (Fig. S1). The results showed that the point mutation did not affect the localization of the mutant protein, and a corresponding description was added to the manuscript on page 5.

      (2) The manuscript nicely shows that loss of CAPSL leads to suppressed MYC signaling in vitro. However, given that endothelial MYC is regulated by numerous pathways and proteins, including FOXO1, VEGFR2, ERK, and Notch, and reduced MYC signaling is generally associated with reduced endothelial proliferation, this finding provides little insight into the mechanism of CAPSL in regulating endothelial proliferation. It would be helpful to explore the status of these other pathways in knockdown cells but as the authors provide only GSEA results and not the underlying data behind their RNA seq results, it is difficult for the reader to understand the full phenotype. Volcano plots or similar representations of the underlying expression data in Figures 6 and 7 as well as supplemental datasets showing the differentially regulated genes should be included. In addition, while the paper beautifully characterizes the delayed retinal angiogenesis phenotype in CAPSL knockout mice, the authors do not return to that model to confirm their in vitro findings. 

      We thank the reviewer for the suggestion. Although endothelial MYC can be regulated by the FOXO1, VEGFR2, ERK, and Notch signaling pathways, these pathways are not enriched in the RNA-seq data of CAPSL-depleted HRECs. This suggests that the downregulated MYC targets may not be influenced by the signaling pathways mentioned above. RNA-seq raw data have been deposited in the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa/browse/HRA010305), and proteomic profiling raw data have been deposited in the PRIDE Archive (https://www.ebi.ac.uk/pride/archive) under accession number PXD051696. A corresponding description was added to the manuscript on pages 20-21. The differentially regulated genes underlying Figures 6 and 7 are listed in Datasets S1 and S2.
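      For readers who wish to derive volcano-plot coordinates from the deposited differential-expression results, the Python sketch below computes -log10(adjusted P) and labels each gene as up, down, or not significant. The thresholds (|log2 fold change| >= 1, adjusted P < 0.05) and the gene names are assumptions for illustration only and are not necessarily the cutoffs used for Datasets S1 and S2.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Map (gene, log2FC, padj) rows to (gene, x, y, label) volcano coordinates."""
    points = []
    for gene, lfc, padj in results:
        y = -math.log10(padj)  # padj must be > 0
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # fails the significance or the fold-change cutoff
        points.append((gene, lfc, y, label))
    return points


# Hypothetical genes, not taken from the CAPSL datasets
rows = [("GENE_A", 2.0, 1e-3), ("GENE_B", -1.5, 0.01), ("GENE_C", 0.2, 1e-4), ("GENE_D", 3.0, 0.2)]
for gene, x, y, label in volcano_points(rows):
    print(f"{gene}: log2FC={x:+.1f}, -log10(padj)={y:.2f}, {label}")
```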

      (3) In Figure S2D, the result of this vascular leak experiment is unconvincing as no dye can be seen in the vessels. What are the kinetics for biocytin tracers to enter the bloodstream after IP injection? Why did the authors choose the IP instead of the IV route for this experiment? Differences in the uptake of the eye after IP injection could confound the results, especially in the context of a model with vascular dysfunction as here. 

      We thank the reviewer for the suggestion. In Figure S2D (now Fig. S6D), we had used a non-representative image to show vascular leakage; we have replaced it with more representative images. Unfortunately, we are not certain of the kinetics with which the biocytin tracer enters the bloodstream after IP injection. Because the experiment was carried out on P5 mice, IV injection was not feasible in these neonates. We followed the methods described in a previous study involving mice of the same age (PMID:35361685).

      (4) In Figure 5, it is unclear how filopodia and tip cells were identified and selected for quantification. The panels do not include nuclear or tip cell-specific markers that would allow quantification of individual tip cells, and in Figure 5C it appears that some filopodia are not highlighted in the mutant panel. 

      We thank the reviewer for the comments. In Figure 5, we used HRECs to examine cell proliferation, migration, and polarity in vitro; therefore, there is no distinction between tip cells and stalk cells. Filopodia/lamellipodia were quantified as in previous studies (PMID: 30783090, PMID: 28805663). Briefly, a wound scratch was made on confluent layers of transfected HRECs, and 9 hours after initiating cell migration by the scratch, cells were fixed and stained with phalloidin. Cells at the edge of the wound were considered leader cells and were quantified for the number of filopodia/lamellipodia.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript by Liu et al. presents a case that CAPSL mutations are a cause of familial exudative vitreoretinopathy (FEVR). Attention was initially focused on the CAPSL gene from whole exome sequence analysis of two small families. The follow-up analyses included studies in which CAPSL was manipulated in endothelial cells of mice and multiple iterations of molecular and cellular analyses. Together, the data show that CAPSL influences endothelial cell proliferation and migration. Molecularly, transcriptomic and proteomic analyses suggest that CAPSL influences many genes/proteins that are also downstream targets of MYC and may be important to the mechanisms. 

      Strengths: 

      This multi-pronged approach found a previously unknown function for CAPSL in endothelial cells and pointed at MYC pathways as high-quality candidates in the mechanism. 

      Weaknesses: 

      Two issues shape the overall impact for me. First, the unreported population frequency of the variants in the manuscript makes it unclear if CAPSL should be considered an interesting candidate possibly contributing to FEVR, or possibly a cause. Second, it is unclear if the identified variants act dominantly, as indicated in the pedigrees. The studies in mice utilized homozygotes for an endothelial cell-specific knockout, leaving uncertainty about what phenotypes might be observed if mice heterozygous for a ubiquitous knockout had instead been studied. 

      In my opinion, the following scientific issues are specific weaknesses that should be addressed: 

      (1) Please state in the manuscript the number of FEVR families that were studied by WES. Please also describe if the families had been selected for the absence of known mutations, and/or what percentage lack known pathogenic variants. 

      We thank the reviewer for the thoughtful comments. A total of 120 FEVR families were studied by WES, and we added a corresponding description to the manuscript on page 4.

      (2) A better clinical description of family 3104 would enhance the manuscript, especially the father. It is unclear what "manifested with FEVR symptoms, according to the medical records" means. Was the father diagnosed with FEVR? If the father has some iteration of a mild case, please describe it in more detail. If the lack of clinical images in the figure is indicative of a lack of medical documentation, please note this in the manuscript. 

      We thank the reviewer for the thoughtful comments. The father in family 3104 was also identified as a carrier of this heterozygous variant and, according to his medical records, presented with FEVR symptoms. However, clinical examination images are currently unavailable. We have added a corresponding description to the manuscript on page 5.

      (3) The TGA stop codon can in some instances also influence splicing (PMID: 38012313). Please add a bioinformatic assessment of splicing prediction to the assays and report its output in the manuscript. 

      We thank the reviewer for the thoughtful comments. We predicted the effect of the CAPSL c.88C>T variant on splicing using MaxEntScan (http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html) and SpliceTool (https://rddc.tsinghua-gd.org/ai) (Fig. S2). Specifically, both tools were used to predict whether the TGA stop codon created by the c.88C>T variant leads to the formation of a cryptic donor splice site.

      (4) More details regarding utilizing a "loxp-flanked allele of CAPSL" are needed. Is this an existing allele, if so, what is the allele and citation? If new (as suggested by S1), the newly generated CAPSL mutant mouse strain needs to be entered into the MGI database and assigned an official allele name - which should then be utilized in the manuscript and who generated the strain (presumably a core or company?) must be described. 

      We added a detailed description of the floxed Capsl allele to the Methods section on pages 14-15: “Capsl^loxp/+ mice were generated using the CRISPR/Cas9 nickase technique by Viewsolid Biotechnology (Beijing, China) on a C57BL/6J background, and the allele was named Capsl^em1zxj. The guide RNA (gRNA) sequences were as follows: Capsl-L gRNA: 5’-CTATCCCAATTGTGCTCCTGG-3’; Capsl-R gRNA: 5’-TGGGACTCATGGTTCTAGAGG-3’.”

      (5) The statement in the methods "All mice used in the study were on a C57BL/6J genetic background," should be better defined. Was the new allele generated on a pure C57BL/6J genetic background, or bred to be some level of congenic? If congenic, to what generation? If unknown, please either test and report the homogeneity of the background, or consult with nomenclature experts (such as available through MGI) to adopt the appropriate F?+NX type designation. This also pertains to the Pdgfb-iCreER mice, which reference 43 describes as having been generated in an F2 population of C57BL/6 X CBA and did not designate the sub-strain of C57BL/6 mice. It is important because one of the explanations for missing heritability in FEVR may be a high level of dependence on genetic background. From the information in the current description, it is also not inherently obvious that the mice studied did not harbor confounding mutations such as rd1 or rd8. 

      We thank the reviewer for the suggestion. We added the following description to the “Mouse model and genotyping” Methods section on page 14: “Capsl^loxp/+ mice were generated using the CRISPR/Cas9 nickase technique by Viewsolid Biotechnology (Beijing, China) on a C57BL/6J background, and the allele was named Capsl^em1zxj. The guide RNA (gRNA) sequences were as follows: Capsl-L gRNA: 5’-CTATCCCAATTGTGCTCCTGG-3’; Capsl-R gRNA: 5’-TGGGACTCATGGTTCTAGAGG-3’. Pdgfb-iCreER[43] transgenic mice on a mixed C57BL/6 and CBA background were obtained from Dr. Marcus Fruttiger and backcrossed to the C57BL/6J background for 6 generations. Capsl^loxp/+ mice were bred with Pdgfb-iCreER[43] transgenic mice to generate Capsl^loxp/loxp; Pdgfb-iCreER mice.” Sanger sequencing was performed on the experimental mice to determine whether they harbored confounding mutations in genes such as Pde6b (rd1) or Crb1 (rd8). The results showed that the mice did not harbor these mutations (Fig. S9), and a corresponding description was added to the manuscript on page 15.

      (6) In my opinion, more experimental detail is needed regarding Figures 2 and 3. How many fields, of how many retinas and mice were analyzed in Figure 2? How many mice were assessed in Figure 3? 

      We thank the reviewer for the thoughtful comments. This information is already presented in the manuscript; please refer to the “Methods-Quantification of retinal parameters” section for experimental details.

      (7) I suggest adding into the methods whether P-values were corrected for multiple tests. 

      We thank the reviewer for the suggestion. Statistical analysis was performed using an unpaired Student’s t-test for comparisons between two groups or one-way ANOVA followed by Dunnett’s multiple comparison test for comparisons among multiple groups. This description has been added to the “Methods-Image acquisition and statistical analysis” section for clarity.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors): 

      In summary, addressing the reviewers' concerns as outlined below could bolster the evidence from "solid" to "convincing" and further strengthen the study's impact. 

      (1) Analysis of the phenotype in CAPSL heterozygous mice, as highlighted by all 3 reviewers. 

      We thank the editor for the thoughtful comments. The phenotype analysis of Capsl heterozygous mice was added to Fig. S4, with the corresponding description provided on page 6.

      (2) Analysis of Capsl KO mice to determine if the pathways identified in vitro are modified (as suggested by reviewers 1 & 2). 

      We thank the editor for the suggestion. RT-qPCR was performed on lung tissue from Capsl Ctrl and KO mice to validate the expression of MYC targets in vivo (Fig. S7). The results indicated that downstream targets of MYC signaling were also downregulated in vivo, consistent with the in vitro findings.

      (3) Additional description of the genetic pedigrees and variants to address the points raised by reviewer #3. 

      We thank the editor for the suggestion. The father in family 3104 was also identified as a carrier of this heterozygous variant and, according to his medical records, presented with FEVR symptoms. However, clinical examination data are currently unavailable. We have added a corresponding description to the manuscript on page 5.

      (4) Validation of the identified protein variants, especially L83F, which appears to be expressed at a near-normal level. Are these proteins mislocalized? Do the variants interfere with sites of known or predicted protein-protein interactions? Could they act in a dominant-negative fashion, e.g., by aggregating with co-expressed WT protein? Given the comparatively weak genetic data, additional validation is required to establish the plausibility of CAPSL as a FEVR gene. 

      We thank the editor for the suggestion. As a substantial amount of p.L83F protein was detectable at the normal molecular weight, we further investigated whether this variant affects protein localization. Immunocytochemistry (Fig. S1) indicated that this variant does not affect the subcellular localization of the protein.

      (5) Improved description of experimental details and statistical analyses as outlined by reviewer #3. 

      We thank the editor for the suggestion. More detailed information about the Capsl mice was added to the manuscript on pages 14-15. The experimental details regarding Figures 2 and 3 are already presented in the “Methods-Quantification of retinal parameters” section on pages 19-20. Statistical analysis was performed using an unpaired Student’s t-test for comparisons between two groups or one-way ANOVA followed by Dunnett’s multiple comparison test for comparisons among multiple groups. This description has been added to the “Methods-Image acquisition and statistical analysis” section on page 21 for clarity.

      Reviewer #1 (Recommendations For The Authors): 

      My reservations lie with the main assertion that CAPSL is associated with FEVR, as the genetic evidence from human studies appears relatively weak. My concerns are as follows: 

      (1) The molecular characterization of the identified mutations suggests a loss of function (LOF). Notably, in one family, both the father and son exhibit the FEVR phenotype and share the LOF mutation, suggesting a dominant mode of inheritance. However, the prevalence of LOF alleles of CAPSL in the general population is high, and its pLI score is 0, according to the gnomAD database. This raises doubts about the LOF variant of CAPSL being causative for FEVR. 

      We thank the reviewer for the recommendation. The pLI score of CAPSL is based on the general gnomAD population, in which Europeans account for ~77% and East Asians for less than 3%. Since all FEVR families in this study are from China, the pLI score may not accurately reflect this population. We will continue to collect FEVR pedigrees and screen them for CAPSL mutations.

      (2) In the conditional knockout study, a delay in vascular development is observed in the retina up to P14. What does the phenotype look like in adult mice, and does it replicate the human FEVR phenotype? 

      We thank the reviewer for the recommendation. We further assessed the phenotype at P21, when angiogenesis is almost complete; the results showed no difference between Ctrl and Capsl^iECKO/iECKO mice (Fig. S5). A corresponding description was added to the manuscript on page 7.

      (3) The conditional knockout mice lack both alleles of CAPSL. The phenotype resulting from the knockout of a single allele needs investigation to align with observed human phenotypes and genetic data. 

      We thank the reviewer for the recommendation. At P5, Capsl heterozygous mice showed no overt difference in vascular progression, vessel density or branchpoints compared with littermate wild-type controls (Fig. S4). The lack of a pronounced phenotype in Capsl heterozygous mice may be due to differences in sensitivity between humans and mice. A similar example is LRP5, which is associated with FEVR: heterozygous LRP5 mutations have been reported in FEVR patients from multiple populations, yet heterozygous Lrp5 mice exhibited no visible angiogenic phenotype (PMID: 18263894).

      (4) The MYC pathway has been identified as influenced by CAPSL. Is MYC downregulation observed in the mouse model in vivo? 

      We thank the reviewer for the recommendation. MYC expression was examined at both the mRNA and protein levels (Fig. S8), and a corresponding description was added to the manuscript on page 11.

      Reviewer #2 (Recommendations For The Authors): 

      Minor comments: 

      (1) While the authors note that little is known about CAPSL protein function, more introductory detail about the protein (structure, domains, intracellular localization, etc.) and additional discussion of potential mechanisms would aid the reader in interpreting the findings and model.

      We thank the reviewer for the recommendation. According to the Human Protein Atlas (https://www.proteinatlas.org/), the CAPSL protein localizes to both the nucleus and the cytoplasm. Our immunocytochemistry analysis confirmed that CAPSL is present in both the nucleus and the cytoplasm (Fig. S1), and a corresponding description was added to the manuscript on page 5.

      (2) Pg 7 states that Capsl knockout mainly leads to "...defects in retinal vascular ECs rather than other vascular cells.". Consider rephrasing to describe "other vasculature-associated cells", as no vascular cells outside the retina were examined in the manuscript. 

      We thank the reviewer for the recommendation. We have rephrased "...defects in retinal vascular ECs rather than other vascular cells." to "...defects in retinal vascular ECs rather than other vasculature-associated cells" on page 8.

      (3) The manuscript is well written but contains numerous typos. E.g. "" (Pg 14), "MCY signaling axis" (figure 6 legend), "shCAPAL" (figure 5 K). Please correct these, and search carefully for others. 

      We apologize for these careless mistakes; we have checked the manuscript carefully and corrected them.

      Reviewer #3 (Recommendations For The Authors): 

      The following issues are somewhat grammatical but significant, and I feel they should be addressed before making the pre-print final: 

      (1) Perhaps the largest issue with the manuscript, to me, is whether CAPSL is an interesting candidate (as stated repeatedly) or causative of FEVR. Within the scope of what is feasible, this is a challenging problem. It would be great if, since the publication of the pre-print, another group had independently reported the detection of mutations specifically in FEVR patients. That lacking, meaningful additions to the manuscript that I'd recommend are the inclusion of a paragraph on the caveats of the study and reporting the allele frequencies based on public databases. As the authors know the data better than anyone and will have invested thought into the implications, they are the ones best positioned to alert the field to the study's limitations, among them the factors that might practically distinguish whether CAPSL is a candidate or a cause.

      We thank the reviewer for the recommendation. In future studies, we will collect more samples from FEVR families and screen for additional mutation sites within the CAPSL gene.

      (2) It is unclear why the modeling with mice did not attempt to recapitulate the observations in humans, i.e., why were heterozygotes for a ubiquitous knockout not studied? Any data with heterozygotes, or ubiquitous alleles (which would be easier to generate than the strain studied) should be shared in the manuscript. If no such data exists, this reviewer would find it a worthwhile new experiment to add, but it is appreciated that new experiments are sometimes beyond the scope of what is possible. At the least, this would be worthwhile to discuss in the requested caveats paragraph of the discussion. 

      We thank the reviewer for the recommendation. We evaluated the phenotype of Capsl heterozygous mice at P5, and the results showed no overt difference in vascular progression, vessel density or branchpoints compared with littermate wild-type controls (Fig. S4). The lack of a pronounced phenotype in Capsl heterozygous mice may be due to differences in sensitivity between humans and mice. For example, heterozygous Lrp5 mice exhibited no visible angiogenic phenotype (PMID: 18263894). A corresponding description was added to the manuscript on page 6.

      (3) The statement in the Abstract "which provides invaluable information for genetic counseling and prenatal diagnosis of FEVR" should be toned down, better supported, or rephrased. This appears to be the 18th disease-associated gene for FEVR, with variants identified in 4 patients of the same ethnicity. In my opinion, the word "invaluable" is currently overstated. 

      We thank the reviewer for the recommendation. In the abstract, we have changed "which provides invaluable information for genetic counseling and prenatal diagnosis of FEVR" to "which provides valuable information for genetic counseling and prenatal diagnosis of FEVR".

      (4) The transcriptomic and proteomic data should be deposited into a public repository and accession numbers added to the manuscript. 

      We thank the reviewer for the recommendation. We have deposited the raw transcriptomic and proteomic data in the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa/browse/HRA010305) and the PRIDE Archive (https://www.ebi.ac.uk/pride/archive), respectively.

      (5) The links to MYC are over-stated in the title "through the MYC axis", the abstract "CAPSL function causes FEVR through MYC axis", and the discussion "we demonstrated that the defects in CAPSL affect EC function by down-regulating the MYC signaling cascade". The links to MYC are entirely by association, there were no experiments testing that the transcriptomic and proteomic changes observed were determinative of the CAPSL-mediated phenotype. It seems appropriate to conjecture that these changes are important, but the above statements all need to be altered and conjectures need to be clearly identified as such. 

      We apologize for overstating the link between the CAPSL-mediated phenotype and the MYC axis in the abstract and discussion, and we have revised these statements accordingly. For example, in the abstract we changed “This study also reveals that compromised CAPSL function causes FEVR through MYC axis, shedding light on the potential involvement of MYC signaling in the pathogenesis of FEVR.” to “This study also reveals that compromised CAPSL function may cause FEVR through the MYC axis, shedding light on the potential involvement of MYC signaling in the pathogenesis of FEVR.” In the discussion, we changed “…cause FEVR through inactivating MYC signaling, expanding FEVR-involved signaling pathway and providing a potential therapeutic target for the intervention of FEVR” to “…may cause FEVR through inactivating MYC signaling, expanding FEVR-involved signaling pathway and providing a potential therapeutic target for the intervention of FEVR”.

      (6) Finally, I suggest that the following grammatical issues in the pre-print be corrected before making the pre-print final: 

      We have checked the manuscript and corrected these mistakes.

      (a) p2. Suggest rewriting the sentence "Nevertheless, the molecular mechanisms by which CAPSL regulates cell processes and signaling cascades have yet to be elucidated." The preceding sentences only state that CAPSL is a candidate in another disease; the word "nevertheless" seems to reflect a logic that isn't described. 

      We have checked the manuscript and corrected this mistake.

      (b) p5. Please correct the grammar "We, generated an inducible" 

      We corrected this mistake.

      (c) p5. Suggest rephrasing "impairing CAPSL expression." The word "expression" is often used in reference to transcription. To avoid confusion, something such as "eliminating or reducing protein abundance" might be better. 

      We corrected this mistake.

      (d) p6. Please correct the grammar "As expected, the radial vascular growth, as well as vessel density and vascular branching, are dramatically reduced in..." - note subject-verb agreement issue 

      We corrected this mistake.

      (e) Figure 3 legend - correct "(A) Hyloaid vessels"

      We corrected this mistake.

  2. Jul 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Fita-Torró et al. study the toxic effects of the intermediary lipid degradation product trans-2-hexadecenal (t-2-hex) on yeast mitochondria and suggest a mechanism by which Hfd1 safeguards Tom40 from lipidation by t-2-hex and its consequences, such as mitochondrial protein import inhibition, cellular proteostasis deregulation, and stress-responses. 

      The authors aimed to dissect a mechanism for the apoptotic consequences of t-2-hex in yeast, and they suggest that it acts via lipidation of Tom40; however, under the tested conditions, everything seems lipidated. Thus, it is unclear whether Tom40 is the crucial causal target. They also do not provide many biochemical experiments to investigate this phenomenon further functionally. Tom40 is one possible and perhaps, given the cellular consequences, a reasonable candidate, but it is not validated beyond in vitro lipidation by exogenous t-2-hex. 

      In the revised version of our manuscript, we have included extensive new experiments showing that protein import at the TOM complex is a physiologically important target of the pro-apoptotic lipid t-2-hex and that enzymes such as the Hfd1 dehydrogenase sensitively regulate this inhibition. In vitro chemoproteomic experiments have now been performed at a more physiological t-2-hex concentration of 10µM, which is lower than the concentrations used in published studies with human cell models. Consistently, several TOM and TIM subunits are enriched in these in vitro lipidation studies (new Fig. 8B). Tom40 lipidation alone is not sufficient to explain t-2-hex toxicity, as a cysteine-free version of Tom40 does not confer tolerance to the apoptotic lipid (new Fig. 8D). Importantly, however, loss of function of the non-essential accessory subunits Tom70 or Tom20 confers t-2-hex tolerance (new Fig. 8D), indicating that pre-protein import at the TOM complex is a physiological target of t-2-hex and most likely depends on lipidation of more Tom subunits than just the essential Tom40 pore. Moreover, we now show that mitochondrial protein import is inhibited by the lipid at a low, physiological dose of 10µM and that this inhibition is modulated by the gene dose of the t-2-hex-degrading enzyme Hfd1 (new Fig. 5G).

      Strengths: 

      The effects of lipids and their metabolic intermediates on protein function are understudied; thus, the authors' research contributing to elucidating the direct effects of a single lipid is appreciated. In particular, it is unknown by which mechanism t-2-hex causes cell death in yeast. The authors elegantly modulate the levels of the enzyme Hfd1, which endogenously catabolizes t-2-hex, as an approach to studying t-2-hex stress. Understanding the cause and consequences of this stress is relevant for understanding fundamental regulatory mechanisms, and also for human health, since the human homolog of Hfd1, ALDH3A2, is mutated in Sjögren-Larsson Syndrome. The application of a variety of global transcriptomic, functional genomic, and chemoproteomic approaches to study t-2-hex stress targets in the yeast model is laudable. 

      Weaknesses: 

      -  The extent of the contribution of Tom40 lipidation to the general t-2-hex stress phenotype is unclear. Is Tom40 lipidation alone enough to cause the phenotype? An alteration of the cysteine residue in question could help answer this key question. 

      Eliminating all four cysteine residues in Tom40 is not sufficient to confer resistance to t-2-hex stress. This result was included in the original manuscript but was mentioned only in the Discussion. The revised manuscript now includes t-2-hex tolerance assays for the cysteine-free Tom40 mutant in the new Figure 8. Thus, cysteine lipidation of Tom40 alone is not sufficient to account for t-2-hex toxicity, which most likely implies additional lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. We therefore included the non-essential adaptor proteins Tom70 and Tom20 of the TOM complex and tested the respective deletion mutants in t-2-hex tolerance assays. As shown in the new Figure 8, loss of Tom70 or Tom20 function significantly increases tolerance to t-2-hex, and the tom20 mutant accumulates less Aim17 pre-protein under t-2-hex stress. This indicates that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      -  It is unclear whether the exogenously applied amounts of t-2-hex (concentrations chosen between 25-200 uM) are physiologically relevant in yeast cells. For comparison, Chipuk et al. (2012) used at most 1 uM on mitochondria of human cells, while Jarugumilli et al. (2018) considered 25 uM a 'lower dose' on human cells. Since the authors saw responses below 10 uM (Fig. 3B) and at the lowest selected concentration of 25 uM (Fig. 8), why were no lower, likely more specific, concentrations applied for the global transcriptomic and chemoproteomic experiments? Key experiments have to be repeated with the lower concentrations. 

      We have now performed several experiments with lower t-2-hex concentrations. A new chemoproteomic study with 10µM t-2-hex-alkyne has been conducted, and the new results have been added to the supplementary information, which now combines the 10µM and 100µM in vitro lipidation studies (Suppl. Table 6). Many subunits of the TOM and TIM complexes are consistently and significantly enriched in both chemoproteomic experiments. These new data are summarized in the revised Figure 8. Additionally, we have performed in vivo pre-protein assays with lower t-2-hex concentrations. As shown in the new Figure 5, mitochondrial import of Aim17 is already inhibited by t-2-hex doses as low as 10µM in a wild-type strain; this inhibition is enhanced in an hfd1 mutant and alleviated in an Hfd1-overexpressing strain. Importantly, an external dose of 10µM t-2-hex is significantly lower than the doses applied to human cell cultures, for example in Jarugumilli et al. (2018). These results show that mitochondrial protein import is a sensitive and physiologically relevant t-2-hex target in our yeast models and that t-2-hex detoxification by enzymes such as the Hfd1 dehydrogenase sensitively regulates this specific inhibition.

      - The amount of t-2-hex applied is especially important to consider in light of over 1300 proteins lipidated to an extent equal to or greater than Tom40 (Supp. Table 6). This chemoproteomic experiment (Fig. 8B, Supp. Table 6) is also weakened by the inclusion of only 2 replicates, which precludes assessment of statistical significance. The selection of targets in Fig. 8B as "among the best hits" is neither immediately comprehensible nor further explained and represents, at best, cherry-picking. Further evidence based on statistical significance or validation by other means should be provided.

      We performed the chemoproteomic screens as described by Jarugumilli et al. (2018), with 2 replicates of mock-treated versus 2 replicates of t-2-hex-alkyne-treated cell extracts. A new chemoproteomic study with 10µM t-2-hex-alkyne has been conducted, and the new results have been added to the supplementary information, which now combines the 10µM and 100µM in vitro lipidation studies (Suppl. Table 6). Differential enrichment analysis of the proteomic data was performed with the amica software (Didusch et al., 2022). Proteins were ranked by their log2 fold enrichment in lipid- versus mock-treated samples, using a threshold of ≥1.5, and adjusted p-values were calculated. Several TOM and TIM subunits were consistently identified as differentially enriched proteins, as summarized in the new Figure 8B.

      - The authors unfortunately also underuse mass spectrometry, which could additionally determine the extent and localization of lipidation on a global scale (especially relevant since Cohen et al. (2020) suggest site-specific mechanisms). 

      We agree that site-specific modifications by t-2-hex are most likely important for the inhibition or other types of regulation of specific target proteins. Our collective data show that several lipidation events on TOM and TIM subunits are involved in the inhibition of mitochondrial protein import. Dissecting individual cysteine lipidations on these subunits will be interesting, but we feel that this is beyond the scope of the present work.

      - The general novelty of studying t-2-hex stress is lowered in light of the existing literature in humans (see e.g. Chipuk et al., 2012; Cohen et al., 2020; Jarugumilli et al., 2018) and in yeast by the same authors (Manzanares-Estreder et al., 2017); as the authors themselves comment, a significant part of the manuscript may rather represent a confirmation of the already described consequences of t-2-hex stress. 

      We respectfully disagree, and we did not state that the present study is a mere confirmation of the t-2-hex stress effects previously described in yeast and human models. In humans, t-2-hex has been identified as an efficient pro-apoptotic lipid that causes mitochondrial dysfunction via direct lipidation of Bax; however, the study by Jarugumilli et al. (2018) revealed that many other direct t-2-hex targets exist, which have remained uninvestigated to date. This work continues our previous study (Manzanares-Estreder et al., 2017), in which we showed that t-2-hex is a universal pro-apoptotic lipid that is also applicable to yeast models, and it contributes important novel findings: the massive transcriptional response to t-2-hex, which resembles proteostatic defects; mitochondrial protein import as a physiologically important and direct target of t-2-hex; and the role of detoxifying enzymes such as Hfd1 in modulating lipid-mediated inhibition of mitochondrial protein import and general proteostasis. Additionally, we provide transcriptomic, chemoproteomic and functional genomic data to the scientific community, which will be a rich source for future studies on as yet undiscovered pro-apoptotic mechanisms employed by t-2-hex. 

      Reviewer #2 (Public Review): 

      This study elucidates the toxic effects of the lipid aldehyde trans-2-hexadecenal (t-2-hex). The authors show convincingly that t-2-hex induces a strong transcriptional response, leads to proteotoxic stress, and causes the accumulation of mitochondrial precursor proteins in the cytosol. 

      The data shown are of high quality and well controlled. The genetic screen for mutants that are hyper- and hypo-sensitive to t-2-hex is elegant and interesting, even if the mechanistic insights from the screen are rather limited. The last part of the study is less convincing. The authors show evidence that t-2-hex affects subunits of the TOM complex. However, they do not formally demonstrate that the lipidation of a TOM subunit is responsible for the toxic effect of t-2-hex. A t-2-hex-resistant TOM mutant was not identified. Moreover, it is not clear whether the concentrations of t-2-hex in this study are physiological. This is, however, a critical aspect. The literature is full of studies claiming toxic effects of compounds such as H2O2; even if such studies are technically sound, they are misleading if non-physiological concentrations of such compounds were used. 

      Nevertheless, this is an interesting study of high quality. A few specific aspects should be addressed.

      We have now performed t-2-hex toxicity assays using several mutants in Tom subunits: the cysteine-free mutant of the essential Tom40 core channel and deletion mutants of the accessory subunits Tom70 and Tom20 (new Figure 8). These assays show that cysteine lipidation of Tom40 alone is not sufficient to account for t-2-hex toxicity, which most likely implies additional lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. Indeed, as shown in the new Figure 8, loss of Tom70 or Tom20 function significantly increases tolerance to t-2-hex, indicating that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      We have now performed several experiments with lower t-2-hex concentrations. A new chemoproteomic study with 10µM t-2-hex-alkyne has been conducted, and the new results have been added to the supplementary information, which now combines the 10µM and 100µM in vitro lipidation studies (Suppl. Table 6). Many subunits of the TOM and TIM complexes are consistently and significantly enriched in both chemoproteomic experiments. These new data are summarized in the revised Figure 8.

      Additionally, we have performed in vivo pre-protein assays with lower t-2-hex concentrations. As shown in the new Figure 5, mitochondrial import of Aim17 is already inhibited by t-2-hex doses as low as 10µM in a wild-type strain; this inhibition is enhanced in an hfd1 mutant and alleviated in an Hfd1-overexpressing strain. Importantly, an external dose of 10µM t-2-hex is significantly lower than the doses applied to human cell cultures, for example in Jarugumilli et al. (2018). These results show that mitochondrial protein import is a sensitive and physiologically relevant t-2-hex target in our yeast models and that t-2-hex detoxification by enzymes such as the Hfd1 dehydrogenase sensitively regulates this specific inhibition.

      Reviewer #3 (Public Review): 

      Summary: The authors investigate the effect of the lipid aldehyde trans-2-hexadecenal (t-2-hex) in yeast using multiple omic analyses. These analyses show that a large range of cellular functions across all compartments are affected; for example, transcriptomic changes affect 1/3 of all genes. The authors provide additional analyses and use them to build a model in which modification of Tom40 blocks mitochondrial protein import. 

      Strengths: Global analyses (transcriptomic and functional genomics approach) to obtain an unbiased overview of changes upon t-2-hex treatment. 

      Weaknesses: It is not clear why the authors decided to focus on mitochondria. Only 30 genes assigned to the GO term "mitochondria" are increasing, and the follow-up analysis using SATAY does not show a predominance of mitochondrial proteins either (only 4 genes are identified as hits). The additional experimental data do not support the main claims. Protein import is not investigated, and there is no experimental evidence that lipidation of Tom40 occurs in vivo and affects protein translocation. 

      Thirty mitochondrial gene functions are very strongly (>10-fold) up-regulated by t-2-hex. Moreover, we selected genes up-regulated (log2FC > 2) or down-regulated (log2FC < -2) by t-2-hex and subjected them to GO category enrichment analysis. “Mitochondrial organization” was the most numerous GO group activated by t-2-hex, whereas “Ribosomal subunit biogenesis” was the most numerous group repressed by t-2-hex (new data in Suppl. Tables 1 and 2). 

      In the revised version of our manuscript, we have included extensive new experiments. They show that protein import at the TOM complex is a physiologically important target of the pro-apoptotic lipid t-2-hex, and that enzymes such as the Hfd1 dehydrogenase sensitively regulate this inhibition. We have now performed in vitro chemoproteomic experiments at a more physiological t-2-hex concentration of 10µM, which is lower than the concentrations used in published human cell models. Several TOM and TIM subunits are consistently enriched in these in vitro lipidation studies (new Fig. 8B). Tom40 lipidation alone is not sufficient to explain t-2-hex toxicity, because a cysteine-free version of Tom40 does not confer tolerance to the apoptotic lipid (new Fig. 8D). Importantly, however, loss of function of the nonessential accessory subunits Tom70 or Tom20 confers t-2-hex tolerance (new Fig. 8D). This indicates that pre-protein import at the TOM complex is a physiological target of t-2-hex, most likely through lipidation of more Tom subunits than just the essential Tom40 pore. Moreover, we now show that the lipid inhibits mitochondrial protein import at low physiological doses of 10µM, and that the gene dose of the t-2-hex-degrading Hfd1 enzyme modulates this inhibition (new Fig. 5G).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Private recommendations for the authors 

      - On the existing data from Supp. Table 6, the authors may include a global assessment of whether or not the protein included a cysteine (the likely site for lipidation). 

      Free cysteines in target proteins are the most frequent sites of modification by LDEs such as t-2-hex. However, these lipid derivatives can also lipidate other amino acids, such as lysines or histidines. Therefore, we prefer not to add this cysteine-based assessment to our chemoproteomic data.

      - What determines whether a gene is labeled in Fig. 6B other than fold change? Why is MAC1 with the highest FC not shown? 

      We analyzed the potential anti-apoptotic SATAY hits (log2 fold change < -0.75) according to three criteria: expected detoxification pathways (heat shock response, pleiotropic drug response), function in the ER (the intracellular site where t-2-hex is generated), and function in mitochondria (the major t-2-hex target identified so far). The revised text now describes this more clearly. For the potential pro-apoptotic SATAY hits, we analyzed gene functions with log2 fold change > 1.5 and marked the predominant groups “Cytosolic ribosome and translation” and “Amino acid metabolism”. In any case, all SATAY data are available in Supplementary Tables 4 and 5, so interested readers can look for alternative gene functions with a potential role in cellular adaptation to t-2-hex.

      - Supplementary Table numbering should be double-checked.

      OK, the numbering has been double-checked.

      Reviewer #2 (Recommendations For The Authors): 

      Major points 

      (1) Identification of the t-2-hex target. Neither Tom70, Tom20 nor the cysteine in Tom40 is essential. If one of these components is critical for the t-2-hex-mediated toxicity, mutants should be t-2-hex-resistant. This is a straight-forward, simple, and critical experiment. 

      We have now performed t-2-hex toxicity assays in the cysteine-free Tom40 mutant and in the tom20 and tom70 deletion mutants. New Figure 8 shows that cysteine lipidation of Tom40 alone is not sufficient to explain t-2-hex toxicity. However, the absence of Tom70 or Tom20 function significantly increases tolerance to t-2-hex. This indicates that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      (2) The authors claim that t-2-hex blocks the TOM complex. Since in vitro import assays with yeast mitochondria are a well established and simple technique, the authors should isolate mitochondria from their cells and perform import experiments. It is expected that those mitochondria show reduced import rates, however, swelling of these mitochondria to mitoplasts should suppress the import defect. 

      We agree that our study does not investigate a direct effect of t-2-hex on the import capacity of purified mitochondria. However, we measure the in vivo accumulation of several mitochondrial precursor proteins, a widely used readout of the efficiency of mitochondrial protein import. For example, the recent landmark paper that discovered the mitoCPR protein import surveillance pathway relies exclusively on epitope-tagged mitochondrial precursors to determine the regulation of mitochondrial protein import (Weidberg and Amon, Science 2018 360(6385)). Additionally, two of our new results identify the TOM complex, and hence protein import at the outer mitochondrial membrane, as a physiologically important t-2-hex target: mutants in the accessory TOM subunits Tom20 and Tom70 are hyperresistant to t-2-hex (Figure 8D), and deletion of TOM20 decreases the t-2-hex-induced pre-protein accumulation (Suppl. Figure 1).

      (3) The first part of the study is very strong. The last figure is also of good quality, however, it is not clear whether the effects on TOM subunits are really causal for the observed t-2-hex effect on gene expression. The authors might cure this by improved data or by avoiding bold statements such as: 'Hfd1 associates with the Tom70 subunit of the TOM complex and t-2-hex covalently lipidates the central Tom40 channel, which altogether indicates that transport of mitochondrial precursor proteins through the outer mitochondrial membrane is directly inhibited by the pro-apoptotic lipid and thus represents a hotspot for pro- and anti-apoptotic signaling.' (Abstract). 

      We now show three things. Several TOM and TIM subunits are lipidated in vitro by low, physiological t-2-hex concentrations. Loss of function of the accessory subunits Tom20 or Tom70 rescues t-2-hex toxicity (new Figure 8). The gene dose of Hfd1 determines the degree of the mitochondrial pre-protein import block (new Figure 5). Together, these data identify the TOM complex as a physiologically important target of the pro-apoptotic lipid. The Abstract has been modified accordingly.

      (4) If the t-2-hex levels are in a physiological range, one would expect that overexpression of Hfd1 prevents the t-2-hex-induced import arrest.

      We have now confirmed that overexpression of Hfd1 indeed prevents inhibition of mitochondrial protein import by t-2-hex. As shown in new Figure 5, t-2-hex doses as low as 10µM already inhibit mitochondrial import of Aim17 in a wild type strain. This inhibition is enhanced in an hfd1 mutant and alleviated in an Hfd1 overexpressor.

      (5) The authors claim that Fmp52 is a t-2-hex-detoxifying enzyme, but do not show evidence. They should rewrite this sentence and be more cautious, or they should show that increased Fmp52 levels indeed deplete t-2-hex from mitochondria.  

      We show that loss of Fmp52 function leads to strong t-2-hex sensitivity. Fmp52 belongs to the NAD-binding short-chain dehydrogenase/reductase (SDR) family and localizes to highly purified mitochondrial outer membranes (Zahedi et al, 2006). These observations suggest that Fmp52 participates in the enzymatic detoxification of t-2-hex in addition to Hfd1. The Results section has been modified accordingly.

      Minor points: 

      (6) Aim17 was recently identified as a characteristic constituent of cytosolic protein aggregates named MitoStores (Krämer et al., 2023, EMBO J). The authors might test whether the cytosolic Aim17 protein colocalizes with the Hsp104-GFP granules that accumulate upon t-2-hex exposure as shown in Fig. 4A. 

      We agree that determining the fate of unimported mitochondrial precursors upon t-2-hex stress would be interesting. We have made some attempts to co-visualize Aim17-dsRed and Hsp104-GFP upon t-2-hex treatment, but some technical issues remain. Aim17 clearly accumulates in cytoplasmic foci upon prolonged t-2-hex exposure. However, we cannot determine its colocalization with Hsp104. This is largely because t-2-hex causes mitochondrial fragmentation, which produces Aim17-stained foci in the cytosol independently of protein aggregates. Because we cannot yet unambiguously localize Aim17 to Hsp104-containing aggregates (MitoStores) upon lipid stress, we would prefer to proceed with the manuscript without these experiments.

      (7) In Fig. 1A, the figures of the different lines are difficult to distinguish. Lines of one color with different intensities would be better suited. 

      We have previously worked with dose-response profiles generated by the destabilized luciferase system and found that a color-coded representation of the plots is the most effective way to present the data; see, for example, Fita-Torró et al. Mol Ecol. 2023 32(13):3557-3574, Pascual-Ahuir et al. BBA 2019 1862(4):457-471, Rienzo et al., Mol Cell Biol. 2015 35(21):3669-83, and several other publications. Therefore, we would like to keep the format of the figure.

      (8) A title page should be added to each of the supplemental data files with short descriptions of the information that is provided in the columns of the tables.

      Explanatory title pages have now been added to the supplemental data files.

      Reviewer #3 (Recommendations For The Authors): 

      Figure 5A: The authors aim to assess protein import, however, their experimental set-up is not suited and does not allow conclusions about protein translocation into mitochondria. The authors monitor protein steady state levels, which does not reflect import capacity. For this e.g. pulse-chase experiments coupled to coIP or in organello import assays with radiolabeled substrate proteins would be required. In addition, the authors lack a non-treated control to show that no precursor accumulates in the absence of CCCP and t-2-hex. At the moment, the conclusion of blocked import cannot be made, as there are many other explanations for the observed steady state levels, e.g. the TAP tag interfered with the import competence of the precursor or t-2-hex could impact on MPP function (in particular as Figure 8B shows that also intra-mitochondrial proteins undergo modification by t-2-hex). 

      We agree that our study does not investigate a direct effect of t-2-hex on the import capacity of purified mitochondria. However, we measure the in vivo accumulation of several mitochondrial precursor proteins, a widely used readout of the efficiency of mitochondrial protein import. For example, the recent landmark paper that discovered the mitoCPR protein import surveillance pathway relies exclusively on epitope-tagged mitochondrial precursors to determine the regulation of mitochondrial protein import (Weidberg and Amon, Science 2018 360(6385)). Figure 5 contains several non-treated control experiments. They show that precursors of Tap-tagged Aim17, Cox5a, Ilv6, and Sdh4 do not accumulate in the absence of CCCP or t-2-hex (or, in the case of Ilv6, accumulate less). These controls are untreated cells in Figure 5A and solvent (DMSO) treated cells in Figure 5B and new Figure 5G. This demonstrates that the Tap-tag does not interfere with the import of the respective precursors. Additionally, our new result that mutants in the accessory TOM subunits Tom20 and Tom70 are hyperresistant to t-2-hex (Figure 8D) identifies the TOM complex, and hence protein import at the outer mitochondrial membrane, as a physiologically important t-2-hex target.

      Figure 8: The conclusion that Tom40 is directly lipidated comes from an in vitro assay. Tom40 is proposed as the main target because it is the only Tom protein with a cysteine. Tom70 is excluded as not being part of the Tom core; however, lack of Tom70 function would also have detrimental consequences for mitochondrial protein import. There is no experiment showing a modification of Tom40 and a consequence for protein import. The proposed model is therefore very far-fetched, and several aspects are speculation not supported by experimental data. To propose such a model, the authors need to provide experimental evidence. For example, they could generate a yeast strain in which the cysteines in Tom40 are replaced by serine residues, and then assess whether protein import (e.g., in pulse-chase assays) is no longer affected upon addition of t-2-hex. 

      Removing all four cysteine residues of Tom40 is not sufficient to confer resistance to t-2-hex stress. This result was included in the original manuscript but was not sufficiently visible in the Discussion. The revised manuscript now includes t-2-hex tolerance assays for the cysteine-free Tom40 mutant in new Figure 8D. These assays show that cysteine lipidation of Tom40 alone is not sufficient to explain t-2-hex toxicity. This most likely implies additional lipidation targets within the TOM and TIM complexes, as indicated by our in vitro lipidation studies. We therefore included the non-essential adaptor proteins Tom70 and Tom20 of the TOM complex and tested the respective deletion mutants in t-2-hex tolerance assays. As shown in new Figure 8D, the absence of Tom70 or Tom20 function significantly increases tolerance to t-2-hex. This indicates that the TOM complex is a physiologically important target of the pro-apoptotic lipid, which most likely acts via lipidation of more subunits than the Tom40 import channel.

      Figure 8A: The pulldown experiments lack positive (other Tom subunits) and negative controls and were performed with (large) tags on all proteins, which can easily result in false positive interactions. The conclusion that Hfd1 interacts with Tom70 and Tom22 cannot be made. Also, the conclusion if an interaction is robust or not cannot be made as the pull-down lacks control fractions, it is also not clear how much of the eluate was loaded. Finally, Hfd1-HA was not expressed from its endogenous promoter, likely resulting in over-expression, which again strongly hampers conclusions about bona fide interaction partners. 

      We agree that our pulldown studies are done in an artificial context. For example, Hfd1 had to be overexpressed to reach protein levels sufficient for detection, and Tap-fusion proteins were used. However, the following observations support the conclusion that Tom70 is a potential interactor of Hfd1. Hfd1-HA is preferentially pulled down from total protein extracts containing Tom70-Tap. It is not pulled down from extracts containing no Tap protein, and significantly less is pulled down from extracts containing Tom22-Tap, another TOM-associated subunit. We have now repeated the pulldown assay several times and quantified and statistically analyzed the efficiency of Hfd1 pulldown relative to the quantity of purified Tom protein, as shown in modified Figure 8A. 

      Figure 4A and C: Depletion of proteasomal activity results in larger aggregates in Figure 4A. However, the addition of t-2-hex blocks proteasomal activity (Figure 4C). How can proteasome inhibition result in bigger aggregates if the proteasomal activity is lost upon t-2-hex addition?

      The negative effect of t-2-hex on proteasomal activity is most likely an indirect effect caused by protein aggregation (Bence et al., Science 2001 292:1552). It occurs both in wild type cells and in rpn4 mutant cells, which have reduced proteasomal activity (Fig. 4C). t-2-hex causes cytosolic protein aggregation in wild type cells. This aggregation is aggravated (more and larger protein aggregates) in rpn4 mutants because of their lower levels of active proteasome (Fig. 4A). The resulting protein aggregates further diminish proteasomal activity, as confirmed in Fig. 4C. 

      Figure 1B: The authors use a reporter to determine HFD1 expression that consists of the promoter region of HFD1 fused to luciferase. These fusion constructs have been shown to often not reflect the bona fide expression levels of genes (Yoneda et al., J Cell Sci 2004). qPCR analysis of transcript levels should be included to support the induction of HFD1. 

      We agree that the live cell luciferase reporters used here are not suitable for determining absolute mRNA levels. However, the aim of these reporter experiments is to quantify the inducibility of different genes (HFD1, GRE2) as a function of increasing stress doses. qPCR analysis cannot provide these dose-response profiles. Destabilized reporters are an excellent tool for this purpose and have been used to accurately describe numerous dynamic stress responses (for example: Dolz-Edo et al. 2013 MCB 33:2228-40, Rienzo et al. 2015 MCB 35:3669-83, Pascual-Ahuir et al. 2019 BBA 1862:457-471). Additionally, the induction of HFD1 mRNA levels by salt (NaCl) and oxidative (menadione) stress has been determined by qPCR and published before (Manzanares-Estreder et al. 2017 Oxid Med Cell Longevity 2017:2708345).

      The authors conclude from Figure 1 that efficient t-2-hex detoxification modulates entry into apoptotic cell death. However, this conclusion is based on growth curves, and no analysis of apoptotic cell death is performed. The data show that the addition of hexadecenal results in a growth arrest, which is likely overcome upon degradation of t-2-hex (depending on the amount of Hfd1). 

      We agree that our experiments measure growth inhibition and not specifically apoptotic cell death. The text has been changed accordingly.  

      Figure 4A: Microscopy images show between 1-2 yeast cells. Either more cells need to be shown or quantifications of the aggregates are required. In addition, it is not clear if the control received the same DMSO concentration as the treated cells and also the time point for the control is not specified. 

      We have now quantified the number of aggregates across cell populations in new Figure 4A in DMSO, t-2-hex and t-2-hex-H2 treated wt and rpn4 mutants. These data show specific aggregate induction by t-2-hex and not by DMSO or the saturated t-2-hex-H2 control, which is aggravated in rpn4 mutants and avoided by CHX pretreatment.

      Figure 5: Western blots in Figure 5A, B, D, E and F lack a loading control. Without this, conclusions about increases in protein abundance cannot be made.

      We have now included additional panels with loading controls for the Western blots in new Figure 5. The exception is Figure 5B, where the presence or absence of the pre-protein can be compared to the amount of mature protein in the same blot.

      Figure 2B: Complex II assembly factors SDH5,6,9 are described here as ETC complexes. As the proteins are not part of the mature complex II, the heading should be modified into ETC complexes and ETC assembly.

      Figure 2B has been revised and the classification of ETC proteins changed accordingly.

    1. Author response

      Reviewer #1 (Public Review):

      The authors use neural recordings from three different brain areas to assess whether the type of evidence accumulation dynamics in those regions are (1) similar to one another, and (2) similar to best-fitting evidence accumulation dynamics to behavioral choice alone. This is an important theoretical question because it relates to the 'linking hypothesis' that relates neurophysiological data to psychological phenomena. Although the standard evidence accumulation dynamic in describing choice has been the gradual accumulation of evidence, the authors find that those dynamics are not represented equally in all brain regions. Such results suggest that more nuanced computational models are needed to explain how brain areas interact to produce decisions, and the focus of theoretical development should shift away from explaining behavioral patterns alone and more toward explaining both brain and behavioral interactions. Given that the authors simply test the assumption that the same dynamics that best explain behavior should also explain neural data, they accomplish their objective using a sophisticated methodology and find evidence *against* this assumption: they find that each region was best described by a distinct accumulation model, which all differed from the model that best described the rat's choices.

      I thought this was an excellent paper with a clear scientific objective, direct analysis to achieve that objective, and a very strong methodological approach to leave little doubt that the conclusions they drew from their analyses were as reasonable and accurate as possible.

      We thank the reviewer for their time and appreciate their generous comments.

      Reviewer #2 (Public Review):

      The neural dynamics underlying decision-making have long been studied across different species (e.g., primates and rodents) and brain areas (e.g., parietal cortex, frontal eye fields, striatum). The key question is to what extent neural firing rates covary with evidence accumulation processes as proposed by evidence accumulation models. It is often assumed that the evidence-accumulation process at the neural level should mirror the evidence-accumulation process at the behavioral level. The current paper shows that the neural dynamics of three rat brain regions (the FOF, ADS, and PCC) all show signatures of evidence accumulation, but in distinct ways. Especially the role of the FOF appears to be distinct, due to its dependence on early evidence compared to the other regions. This sheds new light and a new interpretation of the role of the FOF in decision-making - previously, it has been described as a region encoding the choice that is currently being committed to; this new analysis suggests it is instead strongly influenced by early evidence.

      A major strength of the paper is that the results are achieved through joint modelling of the behavioral and neural data, combined with information on the physical stimulus at hand. Joint models were shown to provide more information on the underlying processes compared to behavioral or neural models alone. Especially the inclusion of the neural data seemed to have greatly improved the quality of inferences. This is a key contribution that illustrates that the sophisticated modelling of multiple sources of data at the same time, pays off in terms of the quality of inferences. Yet, it should be added here, that due to the nature of the task, the behavioral data contained only choices, and not response times, which tend to contain more information regarding the evidence accumulation process than choice alone. It would be interesting to additionally discuss how choice decision times can be modeled with the proposed modelling framework.

      We thank the reviewer for their generous views on our work. We agree that adding decision times, which could readily be added to our framework, will likely further constrain the inference of the latent model. We are currently pursuing such topics using this framework and appropriate data. We have altered a passage in our Discussion, where we note the various extensions of our model one could pursue, to include response time within the set of behavioral measurements one might include.

      A main limitation of the paper is that it does not appear to address a seemingly logical follow-up question: If these three brain regions individually accumulate evidence in distinct manners, how do these multiple brain regions then each contribute to a final choice? The joint models fit each region's data separately, so how well does each region individually 'explain' or 'predict' behavior, and how does the combined neural activity of the regions lead to manifest behavior? I would be very interested in the authors' perspectives on these questions.

      We could not share the reviewer's view and interest in this question with any more excitement than we already do! Unfortunately, the experiments needed to answer this question satisfactorily (e.g., simultaneous multi-region population recordings) have not yet been performed. Additionally, our analysis approach, as currently presented, would require some technical alterations to handle data at that scale. Both efforts are underway. Meanwhile, the current manuscript describes the basic modeling framework needed to address these questions if and when such data exist. We have added some text to the Discussion to highlight these exciting future directions:

      “An exciting future application of our modeling framework is to model multiple, independent accumulators in several brain regions which collectively give rise to the animal’s behavior. Such a model would provide incredible insight into how the brain collectively gives rise to behavioral choices.”

      There are some remaining questions regarding the specific models used, that I was hoping the authors could clarify. Specifically, in equations 10-11, I was wondering to what extent there might be a collinearity issue. Equation 10 proposes that the firing rates of neurons can vary across time due to two mechanisms: (1) The dependence of the firing rate on the accumulated evidence, and (2) a time-varying trial average (as detailed in Equation 11). If firing rates of the neuron indeed covary with the accumulated evidence and therefore increase across time, how can the effects of mechanisms 1 and 2 be disentangled? Relatedly, the independent noise models model each neuron separately and thereby include many more parameters, each informed by less data. Is it possible that the relatively poor cross-validation of the independent noise model may be a consequence of the overfitting of the independent noise model?

      Thank you for this important observation. Please see our response to the essential revisions above which addresses this issue. In short, although it is true that firing rates increase with time (with accumulating evidence) they do so in a way that depends on the stimulus, and so just as often as they increase with time, they decrease.

      Regarding the poor cross-validation of the independent noise model, we apologize for the confusion here: the shared and independent noise models have exactly the same number of parameters. They differ only in the noise instantiation of the latent process. For the independent noise model, each trial's latent process contains a unique noise instantiation, whereas the shared model uses the same instantiation. See above for our response to this issue and for how the manuscript was modified in light of this confusion.

      Another related question is how robust the parameter recovery properties of these models are under a wider range of data-generating parameter settings. I greatly appreciate the inclusion of a parameter recovery study (Figure S1C) using a single synthetic dataset, but it could be made even stronger by simulating multiple datasets with a wider range of parameter settings. Such a simulation study would help understand how robust and reliable the estimated parameters of all models are. Similarly, it would be helpful if also the \theta_{y} parameters are shown, which aren't shown in Figure S1C.

      We agree that understanding the model fitting behavior under a wider set of parameter settings is valuable. We fit our model to additional sets of parameter settings and included an additional supplemental figure (Figure 1 — figure supplement 2) to illustrate these results. In short, we found that parameter recovery was robust across the different parameter settings we tested. We also updated Figure S1C with the neural parameters. We included the following in the Results to note that parameter recovery was robust:

      “We verified that our method was able to recover the parameters that generated synthetic physiologically-relevant spiking and choices data (Figure 1 — figure supplement 1), and that parameter recovery was robust across a range of parameter values (Figure 1 — figure supplement 2).” 

      One aspect of the paper initially confused me. The models fit on the choice data and stimulus information alone make different predictions for the evidence accumulation dynamics in different regions (e.g., Figure 5A, 6A) and also lead to different best-fitting parameters in different regions (Figure S9A). It took me a while to realize that this is because the data are pooled across different rats and sessions. As a result, the behavioral choice data are not the same across regions, and neither are the resulting fitted models. This could easily be clarified by adding a few notes to the captions of the relevant figures.

      Thanks for pointing this out. We agree that this tends to be a point of confusion, and we have added clarification prior to Fig 3, where the choice model is first introduced:

      “We stress that because of this, each fitted choice model uses different behavioral choice data, and thus the fitted parameters vary from fitted model to fitted model.”

      Combined, this manuscript represents an interesting and welcome contribution to an ongoing debate on the neural dynamics of decision-making across different brain regions. It also introduced new joint modelling techniques that can be used in the field and raised new questions on how the concurrent activity of neurons across different brain regions combined leads to behavior.

      We appreciate the very generous views on our work!

    1. Author response:

      eLife assessment

      This useful study reports on the discovery of an antimicrobial agent that kills Neisseria gonorrhoeae. Sensitivity is attributed to a combination of DedA-assisted uptake of oxydifficidin into the cytoplasm and the presence of an oxydifficidin-sensitive RplL ribosomal protein. Due to the narrow scope, the broader antibacterial spectrum remains unclear. The evidence supporting the conclusions is therefore incomplete, with key methods and data lacking. This work will be of interest to microbiologists and synthetic biologists.

      General comment about narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The main focus of this study is on its previously unreported potent anti-gonococcal activity and mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      We are troubled by the statement that our paper is narrow in scope and that evidence supporting our conclusions is incomplete. We do not feel the reviews as presented substantiate drawing this conclusion about our work.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kan et al. report the serendipitous discovery of a Bacillus amyloliquefaciens strain that kills N. gonorrhoeae. They use TnSeq to identify that the anti-gonococcal agent is oxydifficidin and show that it acts at the ribosome and that one of the dedA gene products in N. gonorrhoeae MS11 is important for moving the oxydifficidin across the membrane.

      Strengths:

      This is an impressive amount of work, moving from a serendipitous observation through TnSeq to characterize the mechanism by which oxydifficidin works.

      Weaknesses:

      (1) There are important gaps in the manuscript's methods.

      The requested additions to the Methods section describing bacterial sequencing and anti-gonococcal activity screening will be made. However, we do not think the absence of these generic methods reduces the significance of our findings.

      (2) The work should evaluate antibiotics relevant to N. gonorrhoeae.

      (1) It is not clear to us why reevaluating the activity of well-characterized antibiotics against known N. gonorrhoeae clinical strains would add value to this manuscript. The activity of clinically relevant antibiotics against antibiotic-resistant N. gonorrhoeae clinical isolates is well described in the literature. Our use of antibiotics in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      (2) If the reviewer insists, we would be happy to include MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone).

      (3) The genetic diversity of dedA and rplL in N. gonorrhoeae is not clear, neither is it clear whether oxydifficidin is active against more relevant strains and species than tested so far.

      (1) We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. This included changes at A64, G25 and S82 in 4 strains with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas NCBI deposited genomes, we did not identify changes in RplL at position R76 or K84.

      (2) While the usefulness of screening more clinically relevant antibiotics against clinical isolates, as suggested in comment 2, was not clear to us, we agree that screening these strains for oxydifficidin activity would be beneficial. We have ordered Neisseria gonorrhoeae strains AR1280 and AR1281 (CDC) and Neisseria meningitidis ATCC 13090, which will be tested when they arrive.

      Reviewer #2 (Public Review):

      Summary:

      Kan et al. present the discovery of oxydifficidin as a potential antimicrobial against N. gonorrhoeae, including multi-drug resistant strains. The authors show the role of DedA flippase-assisted uptake and the specificity of RplL in the mechanism of action for oxydifficidin. This novel mode of action could potentially offer a new therapeutic avenue, providing a critical addition to the limited arsenal of antibiotics effective against gonorrhea.

      Strengths:

      This study underscores the potential of revisiting natural products for the discovery of antibiotics against pathogens of modern-day concern and highlights a new target mechanism that could inform future drug development. Indeed, there is a growing body of recent research utilizing AI and predictive computational informatics to revisit potential antimicrobial agents and metabolites from cultured bacterial species. The discovery of oxydifficidin’s interaction with RplL and its DedA-assisted uptake mechanism opens new research directions in understanding and combating antibiotic-resistant N. gonorrhoeae. Methodologically, the study is rigorous, employing various experimental techniques such as genome sequencing, bioassay-guided fractionation, LCMS, NMR, and Tn-mutagenesis.

      Weaknesses:

      The scope is somewhat narrow, focusing primarily on N. gonorrhoeae. This limits the generalizability of the findings and leaves questions about its broader antibacterial spectrum. Moreover, while the study demonstrates the in vitro effectiveness of oxydifficidin, there is a lack of in vivo validation (i.e., animal models) for assessing its pre-clinical potential. Potential SNPs within dedA or rplL raise concerns about how quickly resistance could emerge in clinical settings.

      (1) Spectrum/narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The focus of this study is on its previously unreported potent anti-gonococcal activity and its mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      (2) Animal models: We acknowledge the reviewer’s insight regarding the importance of in vivo validation to enhance oxydifficidin’s pre-clinical potential. However, due to the labor-intensive process needed to isolate oxydifficidin, obtaining a sufficient quantity for animal studies is beyond the scope of this study. Our future work will focus on optimizing the yield of oxydifficidin and developing a topical mouse model for subsequent investigations.

      (3) Potential SNPs: Please see our response to Reviewer #1’s comment 3. We acknowledge that potential SNPs within dedA and rplL raise concerns regarding clinical resistance, which is a common issue for protein-targeting antibiotics. Yet, as pointed out in the manuscript, obtaining resistant mutants in the lab was a very low-yield endeavor.

      Reviewer #3 (Public Review):

      Summary:

      The authors have shown that oxydifficidin is a potent inhibitor of Neisseria gonorrhoeae. They were able to identify RplL as the target and showed that resistance could occur via mutations in the DedA flippase and RplL.

      Strengths:

      This was a very thorough and clearly argued set of experiments that supported their conclusions.

      Weaknesses:

      There was no obvious weakness in the experimental design. Although it is promising that the DedA mutations resulted in attenuated fitness, it remains an open question whether secondary rounds of mutation could overcome this selective disadvantage, which was not tested in this study.

      We thank the reviewer for the positive comment. We agree that investigating factors that could compensate for the fitness attenuation caused by DedA mutation would enhance our understanding of the role of DedA.

    1. Author response:

      We thank you for the opportunity to provide a concise response. The criticisms are accurately summarized in the eLife assessment:

      the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      The essence of our study is to propose the adoption of the Haldane model of genetic drift, based on the branching process, in lieu of the Wright-Fisher (WF) model, based on sampling (usually binomial). In addition to some extensions of the Haldane model, we present four paradoxes that cannot be resolved by the WF model. The reviews suggest that some of the paradoxes could be resolved by the WF model if we engaged the prior literature sufficiently.

      We certainly could not review all the literature on genetic drift, as there must be thousands of studies. Nevertheless, the literature we do not cover is based on the WF model, and all modifications of the WF model share its general properties. (We should note that all such modifications share the sampling aspect of the WF model. To model such sampling, N is imposed from outside of the model, rather than generated within it. Most importantly, these modifications are mathematically valid but biologically untenable, as elaborated below. Thus, in concept, the WF and Haldane models are fundamentally different.)

      In short, our proposal is general, and its key point is that the WF model cannot resolve these (and many other) paradoxes. The reviewers disagree (apparently only partially), and we shall be specific in our response below.

      We shall first present the fourth paradox, which concerns multi-copy gene systems (such as rRNA genes and viruses; see the companion paper). Viruses evolve both within and between hosts, and both stages involve severe bottlenecks. How does one address genetic drift in viral evolution? How can we model the effective population sizes both within and between hosts? The inability of the WF model to deal with such multi-copy gene systems may explain the difficulties in accounting for SARS-CoV-2 evolution. Given the small number of virions transmitted between hosts, drift is strong, as we have shown using the Haldane model (Ruan, Luo, et al. 2021; Ruan, Wen, et al. 2021; Hou, et al. 2023).

      As the reviewers suggest, it is possible to modify the WF model to account for some of these paradoxes. However, the modifications are often mathematically convenient but biologically dubious. Much of the debate concerns the progeny number, K. (We use a haploid model for this purpose, but diploidy does not pose a problem, as stated in the main text.) The modifications relax the constraint V(k) = E(k) inherent in WF sampling. One would then ask: how can V(k) differ from E(k) under WF sampling, even though this is mathematically feasible (but biologically dubious)? Kimura and Crow (1963) may have been the first to offer a biological explanation. Read carefully, Kimura’s modification makes the WF model behave like the Haldane model. Why, then, not use the Haldane model in the first place, with its two parameters, E(k) and V(k), instead of the one-parameter WF model?

      The Haldane model is conceptually simpler. It allows the variation in population size, N, to be generated from within the model, rather than artificially imposed from outside it. This brings us to the first paradox, addressed by the density-dependent Haldane model. When N increases exponentially, as in bacterial or yeast cultures, there is almost no drift while N is very low, and drift becomes intense as N approaches the carrying capacity. We do not see how the WF model can resolve this paradox, which the Haldane model does resolve.

      The second and third paradoxes concern how far mathematical models of population genetics can be detached from biological mechanisms. The second paradox, about sex chromosomes, is rooted in the realization that V(k) ≠ E(k). Since E(k) is the same between the sexes but V(k) differs, how does WF sampling give rise to V(k) ≠ E(k)? We are asking the biological question that troubled Kimura and Crow (1963), alluded to above. The third paradox is acknowledged by two reviewers. Genetic drift, as manifested in the fixation probability of an advantageous mutation, is 2s/V(k). It is thus strange that the fundamental parameter of drift in the WF model, N (or Ne), is missing. In the Haldane model, drift is determined by V(k), with N being a scaling factor; hence 2s/V(k) makes perfect biological sense.
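      To make the 2s/V(k) result concrete, here is a minimal Python sketch (our illustration, not code from the manuscript). It computes the exact survival probability of a single mutant lineage in a Galton-Watson branching process by iterating the offspring probability generating function to its smallest fixed point, and compares it with 2s/V(k). The two offspring distributions are hypothetical, both with mean 1 + s, and the approximation is only expected to hold for small s.

```python
def survival_probability(pmf, tol=1e-14, max_iter=1_000_000):
    """Survival probability of a Galton-Watson lineage started by one individual.

    pmf[k] is the probability of leaving k offspring. The extinction probability
    is the smallest fixed point of f(x) = sum_k pmf[k] * x**k, reached by
    iterating x <- f(x) from x = 0.
    """
    x = 0.0
    for _ in range(max_iter):
        x_new = sum(p * x**k for k, p in enumerate(pmf))
        if abs(x_new - x) < tol:
            break
        x = x_new
    return 1.0 - x_new


def mean_var(pmf):
    """Mean E(k) and variance V(k) of an offspring distribution."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum(k * k * p for k, p in enumerate(pmf)) - mean**2
    return mean, var


s = 0.01
# Hypothetical offspring distributions, both with mean 1 + s:
# "binary" leaves 0 or 2 offspring; "burst" leaves 0 or 4 (about 3x larger V(k)).
binary = [(1 - s) / 2, 0.0, (1 + s) / 2]
burst = [1 - (1 + s) / 4, 0.0, 0.0, 0.0, (1 + s) / 4]

for name, pmf in [("binary", binary), ("burst", burst)]:
    _, var = mean_var(pmf)
    exact = survival_probability(pmf)
    print(f"{name}: V(k)={var:.4f}  exact={exact:.5f}  2s/V(k)={2 * s / var:.5f}")
```

      In both cases the exact value is close to 2s/V(k), and tripling V(k) at a fixed mean cuts the survival probability to roughly a third, which is the sense in which V(k), rather than N, governs drift here.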

      We now answer the obvious question: if the model is fundamentally the Haldane model, why do we call it the WF-Haldane model? The reason is that most results obtained with the WF model are good approximations, so the branching process need not constantly re-derive them. At the least, one can use the WF results to see how well they fit the Haldane model. In our earlier study (Chen, et al. 2017, Fig. 3), we showed that the approximations can be very good in many (or most) settings.

      We would like to use the modern analogy of gas-engine vs. electric-motor cars. The Haldane model and the WF model are as fundamentally different as the driving mechanisms of gas-powered and electric cars. The old model now faces many problems, and fixes are often not possible. Some fixes are so complicated that one starts thinking about simpler solutions. The reservation is that we have invested so much in the old models, which might be wasted by the switch. However, we are suggesting the integration of the WF and Haldane models. In this sense, the WF model has made many contributions, which the new model gratefully inherits, just as EVs inherit the legacy of gas-engine cars.

      The editors also issue the instruction: while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      We are thankful to the editors and reviewers for the thoughtful comments and constructive criticisms. We also appreciate the publishing philosophy of eLife that allows exchanges, debates and improvements, which embody the true spirit of science publishing.

      References for the provisional author responses

      Chen Y, Tong D, Wu CI. 2017. A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations. Mol. Biol. Evol. 34:2057-2064.

      Hou M, Shi J, Gong Z, Wen H, Lan Y, Deng X, Fan Q, Li J, Jiang M, Tang X, et al. 2023. Intra- vs. Interhost Evolution of SARS-CoV-2 Driven by Uncorrelated Selection-The Evolution Thwarted. Mol. Biol. Evol. 40.

      Kimura M, Crow JF. 1963. The measurement of effective population number. Evolution 17:279-288.

      Ruan Y, Luo Z, Tang X, Li G, Wen H, He X, Lu X, Lu J, Wu CI. 2021. On the founder effect in COVID-19 outbreaks: how many infected travelers may have started them all? Natl. Sci. Rev. 8:nwaa246.

      Ruan Y, Wen H, He X, Wu CI. 2021. A theoretical exploration of the origin and early evolution of a pandemic. Sci Bull (Beijing) 66:1022-1029.

      Review comments

      eLife assessment 

      This study presents a useful modification of a standard model of genetic drift by incorporating variance in offspring numbers, claiming to address several paradoxes in molecular evolution.

      It is unfortunate that the study fails to engage prior literature that has extensively examined the impact of variance in offspring number, implying that some of the paradoxes presented might be resolved within existing frameworks.

      We do not believe that the paradoxes can be resolved within existing frameworks.

      In addition, while the modified model yields intriguing theoretical predictions, the simulations and empirical analyses are incomplete to support the authors' claims. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors present a theoretical treatment of what they term the "Wright-Fisher-Haldane" model, a claimed modification of the standard model of genetic drift that accounts for variability in offspring number, and argue that it resolves a number of paradoxes in molecular evolution. Ultimately, I found this manuscript quite strange.

      The notion of effective population size as inversely related to the variance in offspring number is well known in the literature, and not exclusive to Haldane's branching process treatment. However, I found the authors' point about variance in offspring changing over the course of, e.g. exponential growth fairly interesting, and I'm not sure I'd seen that pointed out before.

      Nonetheless, I don't think the authors' modeling, simulations, or empirical data analysis are sufficient to justify their claims. 

      Weaknesses: 

      I have several outstanding issues. First of all, the authors really do not engage with the literature regarding different notions of an effective population. Most strikingly, the authors don't talk about Cannings models at all, which are a broad class of models with non-Poisson offspring distributions that nonetheless converge to the standard Wright-Fisher diffusion under many circumstances, and to "jumpy" diffusions/coalescents otherwise (see e.g. Mohle 1998, Sagitov (2003), Der et al (2011), etc.). Moreover, there is extensive literature on effective population sizes in populations whose sizes vary with time, such as Sano et al (2004) and Sjodin et al (2005).

      Of course in many cases here the discussion is under neutrality, but it seems like the authors really need to engage with this literature more. 

      The most interesting part of the manuscript, I think, is the discussion of the Density Dependent Haldane model (DDH). However, I feel like I did not fully understand some of the derivation presented in this section, which might be my own fault. For instance, I can't tell if Equation 5 is a result or an assumption; when I attempted a naive derivation of Equation 5, I obtained E(K_t) = 1 + r/c*(c-n)*dt. It's unclear where the parameter z comes from, for example. Similarly, is equation 6 a derivation or an assumption? Finally, I'm not 100% sure how to interpret equation 7. Is that a variance effective size at time t? Is it possible to obtain something like a coalescent Ne or an expected number of segregating sites or something from this?
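      For reference, the naive derivation mentioned above can be written out as follows. This is a sketch assuming deterministic logistic growth with intrinsic rate r and carrying capacity c over a short interval dt; it is a reconstruction of the reviewer's calculation, not the manuscript's Equation 5 (whose parameter z does not appear here).

```latex
\frac{dn}{dt} = r\,n\left(1 - \frac{n}{c}\right)
\;\Longrightarrow\;
n(t+dt) \approx n(t) + r\,n(t)\left(1 - \frac{n(t)}{c}\right)dt
\;\Longrightarrow\;
E(K_t) = \frac{n(t+dt)}{n(t)} \approx 1 + \frac{r}{c}\,\bigl(c - n(t)\bigr)\,dt .
```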

      Similarly, I don't understand their simulations. I expected that the authors would do individual-based simulations under a stochastic model of logistic growth, and show that you naturally get variance in offspring number that changes over time. But it seems that they simply used their equations 5 and 6 to fix those values. Moreover, I don't understand how they enforce population regulation in their simulations: is N_t random and determined by the (independent) draws from K_t for each individual? In that case, there's no "interaction" between individuals (except abstractly, since logistic growth arises from a model that assumes interactions between individuals). This seems problematic for their model, which is essentially motivated by the fact that early during logistic growth, there are basically no interactions, and later there are, which increases variance in reproduction. But their simulations assume no interactions throughout!

      The authors also attempt to show that changing variance in reproductive success occurs naturally during exponential growth using a yeast experiment. However, the authors are not counting the offspring of individual yeast during growth (which I'm sure is quite hard). Instead, they use an equation that estimates the variance in offspring number based on the observed population size, as shown in the section "Estimation of V(K) and E(K) in yeast cells". This is fairly clever; however, I am not sure it is right, because the authors neglect covariance in offspring between individuals. My attempt at this derivation assumes that I_t | I_{t-1} = \sum_{i=1}^{I_{t-1}} K_{i,t-1}, where K_{i,t-1} is the number of offspring of individual i at time t-1. Then, for example, E(V(I_t | I_{t-1})) = E(V(\sum_{i=1}^{I_{t-1}} K_{i,t-1} | I_{t-1})) = E(I_{t-1})V(K_{t-1}) + E(I_{t-1}(I_{t-1}-1))Cov(K_{i,t-1},K_{j,t-1}). The authors have the first term, but not the second, and I'm not sure the second can be neglected (in fact, I believe it's the second term that's actually important, as early on during growth there is very little covariance because resources aren't constrained, but at carrying capacity, an individual having offspring means that another individual has to have fewer offspring; this is the whole notion of exchangeability, also neglected in this manuscript). As such, I don't believe that their analysis of the empirical data supports their claim.
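      The role of the covariance term can be checked numerically. The sketch below (an illustration, not the authors' or the reviewer's code) assumes a hypothetical population at carrying capacity in which N parents leave exactly N offspring, allocated multinomially (an exchangeable, Wright-Fisher-like generation). The total I_t then never varies, yet individual offspring numbers do, so keeping only the E(I_{t-1})V(K_{t-1}) term would infer variance that is not there; the negative covariance term cancels it.

```python
import numpy as np


def variance_decomposition(n_parents, reps=10_000, seed=1):
    """Split Var(I_t | I_{t-1} = N) into its two terms for a fixed-size generation.

    Hypothetical model: N parents leave exactly N offspring in total, allocated
    multinomially (exchangeable), so E(K) = 1.
    Returns (var_total, independent_term, covariance_term), where
    independent_term = N * V(K) and covariance_term = N * (N - 1) * Cov(K_i, K_j).
    """
    rng = np.random.default_rng(seed)
    # Rows are replicates; columns are the offspring counts K_i of the N parents.
    counts = rng.multinomial(n_parents, [1.0 / n_parents] * n_parents, size=reps)
    centered = counts - 1.0  # E(K) = 1 because the offspring total equals the parent number
    v_k = (centered**2).mean()  # V(K), pooled over parents and replicates
    # Per replicate: sum over ordered pairs i != j of (K_i - 1)(K_j - 1).
    pair_sums = centered.sum(axis=1) ** 2 - (centered**2).sum(axis=1)
    cov_k = pair_sums.mean() / (n_parents * (n_parents - 1))
    var_total = counts.sum(axis=1).var()
    return var_total, n_parents * v_k, n_parents * (n_parents - 1) * cov_k


var_total, independent_term, covariance_term = variance_decomposition(50)
print(f"Var(I_t)        = {var_total:.3f}")
print(f"N * V(K)        = {independent_term:.3f}  (theory: N - 1 = 49)")
print(f"N(N-1) * Cov(K) = {covariance_term:.3f}  (theory: -(N - 1) = -49)")
```

      Analytically, the multinomial allocation gives V(K) = 1 - 1/N and Cov(K_i, K_j) = -1/N, so the two terms are N - 1 and -(N - 1) and sum to zero, matching the constant total.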

      Thus, while I think there are some interesting ideas in this manuscript, I believe it has some fundamental issues:

      first, it fails to engage thoroughly with the literature on a very important topic that has been studied extensively. Second, I do not believe their simulations are appropriate to show what they want to show. And finally, I don't think their empirical analysis shows what they want to show. 

      References: 

      Möhle M. 1998. Robustness results for the coalescent. J. Appl. Probab. 35(2):438-447. doi:10.1239/jap/1032192859

      Sagitov S. 2003. Convergence to the coalescent with simultaneous multiple mergers. J. Appl. Probab. 40(4):839-854. doi:10.1239/jap/1067436085

      Der R, Epstein CL, Plotkin JB. 2011. Generalized population models and the nature of genetic drift. Theor. Popul. Biol. 80(2):80-99.

      Sano A, Shimizu A, Iizuka M. 2004. Coalescent process with fluctuating population size and its effective size. Theor. Popul. Biol. 65(1):39-48.

      Sjödin P, et al. 2005. On the meaning and existence of an effective population size. Genetics 169(2):1061-1070.

      Reviewer #2 (Public Review): 

      Summary: 

      This theoretical paper examines genetic drift in scenarios deviating from the standard Wright-Fisher model. The authors discuss Haldane's branching process model, highlighting that the variance in reproductive success equates to genetic drift. By integrating the Wright-Fisher model with the Haldane model, the authors derive theoretical results that resolve paradoxes related to effective population size. 

      Strengths: 

      The most significant and compelling result from this paper is perhaps that the probability of fixing a new beneficial mutation is 2s/V(K). This is an intriguing and potentially generalizable discovery that could be applied to many different study systems. 

      The authors also made a lot of effort to connect theory with various real-world examples, such as genetic diversity in sex chromosomes and reproductive variance across different species. 

      Weaknesses: 

      One way to define effective population size is by the inverse of the coalescent rate. This is where the geometric mean of Ne comes from. If Ne is defined this way, many of the paradoxes mentioned seem to resolve naturally. If we take this approach, one could easily show that a large N population can still have a low coalescent rate depending on the reproduction model. However, the authors did not discuss Ne in light of the coalescent theory. This is surprising given that Eldon and Wakeley's 2006 paper is cited in the introduction, and the multiple mergers coalescent was introduced to explain the discrepancy between census size and effective population size, superspreaders, and reproduction variance; that said, there is no explicit discussion or introduction of the multiple mergers coalescent.

      The Wright-Fisher model is often treated as a special case of the Cannings 1974 model, which incorporates the variance in reproductive success. This model should be discussed. It is unclear to me whether the results here have to be explained by the newly introduced WFH model, or could have been explained by the existing Cannings model. 

      The abstract makes it difficult to discern the main focus of the paper. It spends most of the space introducing "paradoxes". 

      The standard Wright-Fisher model makes several assumptions, including hermaphroditism, non-overlapping generations, random mating, and no selection. It will be more helpful to clarify which assumptions are being violated in each tested scenario, as V(K) is often not the only assumption being violated. For example, the logistic growth model assumes no cell death at the exponential growth phase, so it also violates the assumption about non-overlapping generations. 

      The theory and data regarding sex chromosomes do not align. The fact that \hat{alpha'} can be negative does not make sense. The authors claim that a negative \hat{alpha'} is equivalent to infinity, but why is that? It is also unclear how theta is defined. It seems to me that one should take a first-principles approach, e.g., define theta as pairwise genetic diversity and start by deriving the expected pairwise coalescence time under the MMC model, rather than starting by assuming theta = 4Neu. Overall, the theory in this section is not well supported by the data, and the explanation is insufficient.

      {Alpha and alpha' can both be negative: x^2 = 0.47 has the roots x ≈ ±0.69, so x ≈ -0.7 is an admissible solution.}

      Reviewer #3 (Public Review): 

      Summary: 

      Ruan and colleagues consider a branching process model (in their terminology the "Haldane model") and the most basic Wright-Fisher model. They convincingly show that offspring distributions are usually non-Poissonian (as opposed to what's assumed in the Wright-Fisher model), and can depend on short-term ecological dynamics (e.g., variance in offspring number may be smaller during exponential growth). The authors discuss branching processes and the Wright-Fisher model in the context of 3 "paradoxes": (1) how the dependence of Ne on N might be shaped by population dynamics; (2) how Ne differs among the X chromosome, the Y chromosome, and the autosomes, and how these differences do not match the expectations based on simple counts of the number of chromosomes in the population; (3) how genetic drift interacts with selection. The authors provide some theoretical explanations for the role of variance in the offspring distribution in each of these three paradoxes. They also perform some experiments to directly measure the variance in offspring number, as well as perform some analyses of published data.

      Strengths: 

      (1) The theoretical results are well-described and easy to follow. 

      (2) The analyses of different variances in offspring number (both experimentally and analyzing public data) are convincing that non-Poissonian offspring distributions are the norm. 

      (3) The point that this variance can change as the population size (or population dynamics) change is also very interesting and important to keep in mind. 

      (4) I enjoyed the Density-Dependent Haldane model. It was a nice example of the decoupling of census size and effective size. 

      Weaknesses: 

      (1) I am not convinced that these types of effects cannot just be absorbed into some time-varying Ne and still be well-modeled by the Wright-Fisher process. 

      (2) Along these lines, there is well-established literature showing that a broad class of processes (a large subset of Cannings' Exchangeable Models) converge to the Wright-Fisher diffusion, even those with non-Poissonian offspring distributions (e.g., Mohle and Sagitov 2001). E.g., equation (4) in Mohle and Sagitov 2001 shows that in such cases the "coalescent Ne" should be (N-1) / Var(K), essentially matching equation (3) in the present paper. 
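      As a minimal illustration of the formula cited above (our sketch, with a function name chosen for this example): in an exchangeable generation where N parents leave N offspring and E(K) = 1, the probability that two lineages share a parent one generation back is c_N = E[K(K-1)]/(N-1) = Var(K)/(N-1), and the coalescent Ne is its inverse. The check below confirms that multinomial Wright-Fisher sampling, with Var(K) = 1 - 1/N, recovers Ne = N exactly.

```python
from fractions import Fraction


def coalescent_ne(n, var_k):
    """Coalescent effective size (N - 1) / Var(K) for an exchangeable generation with E(K) = 1."""
    pair_coalescence_prob = var_k / (n - 1)  # c_N = E[K(K - 1)] / (N - 1) = Var(K) / (N - 1)
    return 1 / pair_coalescence_prob


n = 1000
wf_var = Fraction(1) - Fraction(1, n)  # multinomial (Wright-Fisher) offspring variance
print(coalescent_ne(n, wf_var))        # Wright-Fisher: Ne equals N
print(coalescent_ne(n, Fraction(10)))  # hypothetical Var(K) = 10: Ne is about N / 10
```

      Exact fractions are used so that the Wright-Fisher case returns N without rounding error.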

      (3) Beyond this, I would imagine that branching processes with heavy-tailed offspring distributions could result in deviations that are not well captured by the authors' WFH model. In this case, the processes are known to converge (backward-in-time) to Lambda or Xi coalescents (e.g., Eldon and Wakeley 2006 or again in Mohle and Sagitov 2001 and subsequent papers), which have well-defined forward-in-time processes.

      (4) These results that Ne in the Wright-Fisher process might not be related to N in any straightforward (or even one-to-one) way are well-known (e.g., Neher and Hallatschek 2012; Spence, Kamm, and Song 2016; Matuszewski, Hildebrandt, Achaz, and Jensen 2018; Rice, Novembre, and Desai 2018; the work of Lounès Chikhi on how Ne can be affected by population structure; etc...) 

      (5) I was also missing some discussion of the relationship between the branching process and the Wright-Fisher model (or more generally Cannings' Exchangeable Models) when conditioning on the total population size. In particular, if the offspring distribution is Poisson, then conditioned on the total population size, the branching process is identical to the Wright-Fisher model. 
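      The identity in point (5) can be verified directly. If n individuals leave independent Poisson(λ) offspring numbers, then, conditioned on the total being N, one individual's count is Binomial(N, 1/n), the marginal of Wright-Fisher multinomial sampling. The sketch below checks this marginal numerically for arbitrary, hypothetical values of n, N and λ; λ drops out after conditioning.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)


def conditional_pmf(k, n, total, lam):
    """P(K_1 = k | K_1 + ... + K_n = total) for i.i.d. Poisson(lam) offspring counts."""
    # The other n - 1 counts sum to Poisson((n - 1) * lam); the full sum is Poisson(n * lam).
    joint = poisson_pmf(k, lam) * poisson_pmf(total - k, (n - 1) * lam)
    return joint / poisson_pmf(total, n * lam)


def binomial_pmf(k, total, p):
    return math.comb(total, k) * p**k * (1 - p) ** (total - k)


n, total = 20, 25  # hypothetical: 20 parents, 25 offspring in total
for lam in (0.5, 1.0, 3.0):  # the Poisson mean does not matter after conditioning
    max_diff = max(
        abs(conditional_pmf(k, n, total, lam) - binomial_pmf(k, total, 1 / n))
        for k in range(total + 1)
    )
    print(f"lambda={lam}: max |conditional - binomial| = {max_diff:.2e}")
```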

      (6) In the discussion, it is claimed that the last glacial maximum could have caused the bottleneck observed in human populations currently residing outside of Africa. Compelling evidence has been amassed that this bottleneck is due to serial founder events associated with the out-of-Africa migration (see e.g., Henn, Cavalli-Sforza, and Feldman 2012 for an older review - subsequent work has only strengthened this view). For me, a more compelling example of changes in carrying capacity would be the advent of agriculture ~11kya and other more recent technological advances. 

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      The reviewers recognize the value of this model and some of the findings, particularly results from the density-dependent Haldane model. However, they expressed considerable concerns with the model and overall framing of this manuscript.

      First, all reviewers pointed out that the manuscript does not sufficiently engage with the extensive literature on various models of effective population size and genetic drift, notably lacking discussion on Cannings models and related works.

      Second, there is a disproportionate discussion on the paradoxes, yet some of the paradoxes might already be resolved within current theoretical frameworks. All three reviewers found the modeling and simulation of the yeast growth experiment hard to follow or lacking justification for certain choices. The analysis approach of sex chromosomes is also questioned. 

      The reviewers recommend a more thorough review of relevant prior literature to better contextualize their findings. The authors need to clarify and/or modify their derivations and simulations of the yeast growth experiment to address the identified caveats and ensure robustness. Additionally, the empirical analysis of the sex chromosome should be revisited, considering alternative scenarios rather than relying solely on the MSE, which only provides a superficial solution. Furthermore, the manuscript's overall framing should be adjusted to emphasize the conclusions drawn from the WFH model, rather than focusing on the "unresolved paradoxes", as some of these may be more readily explained by existing frameworks. Please see the reviewers' overall assessment and specific comments. 

      Reviewer #2 (Recommendations For The Authors): 

      In the introduction -- "Genetic drift is simply V(K)" -- this is a very strong statement. You can say it is inversely proportional to V(K), but drift is often defined based on changes in allele frequency. 

      Page 3 line 86. "sexes is a sufficient explanation."--> "sex could be a sufficient explanation" 

      The strongest line of new results is about 2s/V(K). Perhaps, the paper could put more emphasis on this part and demonstrate the generality of this result with a different example. 

      The math notations in the supplement are not intuitive, e.g., using i_k and j_k as probabilities. I also recommend using E[X] and V[X] for expectation and variance rather than \italic{E(X)} to improve the readability of many equations.

      Eq A6, A7, While I manage to follow, P_{10}(t) and P_{10} are not defined anywhere in the text. 

      Supplement page 7, the term "probability of fixation" is confusing in a branching model. 

      Eq. A28: it is unclear whether Eq. A1 can be used here directly. Some justification would be nice.

      Supplement page 17. "the biological meaning of negative..". There is no clear justification for this claim. As a reader, I don't have any intuition as to why that is the case.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment:

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions is incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and a clearer discussion of the limitations of the data presented.

      We thank the reviewers for appreciating our manuscript. We have rewritten the conclusions of the paper to be more conservative and now more explicitly focus on color processing in mouse V1, rather than comparing V1 to the retina. Additionally, we discuss the limitations of our approach in detail in the Discussion section. Finally, we have addressed all comments from the reviewers below.

      Referee 1 (Remarks to the Author):

      In this study, Franke et al. explore and characterize color response properties across primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake 2P imaging to define the spectral response properties of visual interneurons in layer 2/3. They find that opponent responses are more pronounced at photopic light levels, and that diversity in color opponent responses exists across the visual field, with green ON/UV OFF responses more strongly represented in the upper visual field. This is argued to be relevant for the detection of certain features that are more salient when using chromatic space, possibly due to noise reduction. In the revised version, Franke et al. have addressed the potential pitfalls in the discussion, which is an important point for the non-expert reader. Thus, this study provides a solid characterization of the color properties of V1 and is a valuable addition to visual neuroscience research.

      My remaining concerns are based more on the interpretation. I’m still not convinced by the statement "This type of color-opponency in the receptive field center of V1 neurons was not present in the receptive field center of retinal ganglion cells and, therefore, is likely computed by integrating center and surround information downstream of the retina." and I would suggest rewording it in the abstract.

      As discussed previously and now nicely added to the discussion, it is difficult to make a direct comparison given the different stimulus types used to characterize the retina and V1 recordings and the different levels of adaptation in both tissues. I will leave this point to the discussion, which allows for a more nuanced description of the phenomenon. Why do I think this is important? In the introduction, the authors argue that "the discrepancy [of previous studies] may be due to differences in stimulus design or light levels." However, while different light levels can be tested in V1, this cannot be done properly in the retina with 2P experiments. To address this, one would have to examine color-opponency in RGC terminals in vivo, which is beyond the scope of this study. Addressing these latter points directly in the discussion would, in my opinion, only strengthen the study.

      We thank the reviewer for the feedback. We removed the sentence mentioned by the reviewer from the abstract, as well as from the summary of our results in the Introduction. Additionally, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Minor:

      In the abstract, the second sentence says that we already know the mechanisms in primates.

      Unfortunately, I do not think this is true. First, primates refers to an order with several species, which might have adaptations to their color-processing. Second, I’m aware of several characterizations in "primates" that have led to convincing models (as referenced), but in my opinion, this is far from a true understanding of the mechanisms, especially since very little is known about foveal color processing due to the difficulties of these experiments. Similarly, in the introduction, "Primates" is indirectly defined as a species. Perhaps some rewording is needed here as well, since we know how different cone distributions can be in rodents (see Peichl’s work).

      Thanks. We have reworded the Abstract and Introduction to indicate that many studies have been performed in primate species, without suggesting that the mechanisms are described.

      The legend in Fig. 2 has a "Fig. ???"

      Fixed.

      Referee 2 (Remarks to the Author):

      Franke et al. characterize the representation of color in the primary visual cortex of mice, highlighting how this changes across the visual field. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet colors were presented in random combinations. Clustering of responses revealed a set of functional cell-types based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have different spatial distributions across V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths: The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our study.

      Weaknesses: While the study presents convincing evidence about the asymmetric distribution of color-opponent neurons in V1, the paper would greatly benefit from a more in-depth discussion of the caveats related to the conclusions drawn about their origin. This is particularly relevant regarding the conclusion drawn about the contribution of color opponent neurons in the retina. The mismatch between retinal color opponency and V1 color opponency could imply that this feature is not solely inherited from the retina, however, there are other plausible explanations that are not discussed here. Direct evidence for this statement remains weak.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      In addition, the paper would benefit from adding explicit neuron counts or percentages to the quadrants of each of the density plots in Figures 2-5. The variance explained by the principal components does not capture the percentage of color opponent cells. Additionally, there appear to be some remaining errors in the figure legend and labels that have not been addressed (e.g. ’??’ in Fig 2 legend).

      Thank you for this suggestion. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels. Additionally, we have fixed the broken reference in the legend of Fig. 2.
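      The link between variance along the luminance (diagonal) and color (off-diagonal) axes and the quadrant counts can be shown with a short sketch. It makes three assumptions: each neuron is summarized by a pair of center ETA amplitudes (green, UV); "variance explained" means the mean-centered variance of the projections onto the two diagonal axes; and the color quadrants are those where the green and UV amplitudes have opposite signs. The helper `axis_variance_and_quadrants` is hypothetical and is not the authors' analysis code.

```python
import numpy as np

def axis_variance_and_quadrants(green, uv):
    """Hypothetical helper: per-neuron center ETA amplitudes -> axis variances and quadrant fractions."""
    pts = np.column_stack([green, uv]).astype(float)
    centered = pts - pts.mean(axis=0)
    lum_axis = np.array([1.0, 1.0]) / np.sqrt(2)   # diagonal: same-sign green and UV
    col_axis = np.array([1.0, -1.0]) / np.sqrt(2)  # off-diagonal: opposite-sign green and UV
    var_lum = np.var(centered @ lum_axis)
    var_col = np.var(centered @ col_axis)
    total = var_lum + var_col  # orthonormal axes, so this equals the total variance
    prod = pts[:, 0] * pts[:, 1]
    frac_lum = np.mean(prod > 0)       # top-right and bottom-left quadrants
    frac_opponent = np.mean(prod < 0)  # top-left and bottom-right quadrants
    return var_lum / total, var_col / total, frac_lum, frac_opponent
```

      Since the sign of the product does not depend on which color is plotted on which axis, the quadrant fractions are the same for either plotting convention. When most neurons sit in the color quadrants, most of the variance also falls along the off-diagonal, which is the correlation the response relies on.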

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      General Suggestions:

      - Please add possible caveats of using the ETA method to the discussion section. For example, it is unclear to what extent ON/OFF cells are being overlooked by using the ETA method.

      We now discuss the limitations of the ETA approach in the Discussion section.

      - The caveats of using the percentage of variance explained in the retina as evidence against V1 solely inheriting color-opponency from retinal output neurons are not adequately addressed. For example, could the mismatch in explained variance of the color axis between V1 and RGCs be explained by a subset of non-color opponent RGCs projecting elsewhere (not dLGN-V1) or that color opponent cells project to a larger number of neurons in V1 than non-color opponent cells? We suggest adding a paragraph to the discussion to address this issue.

      We have removed these conclusions from the paper, more carefully interpret the retinal results and mention that comparing ex-vivo retina data with in-vivo cortical data is challenging.

      - Please clarify how the different response types shown in Figure 5e-f lead to differences in noise detection and thereby differences in predator discriminability. For example, why does Gon/UVoff not respond to the noise scene while Goff/UVoff does?

      We added this to the Results section.

      - Please clarify the relationship between ETA amplitude, neural response probability, and neural response amplitude. For example, do color-opponent cells have equal absolute neural response amplitudes to the different colors?

      Thank you for bringing up this point. The ETA is obtained by summing the stimulus sequences that elicit an event (i.e., response), weighted by the amplitude of the response. Consequently, the absolute amplitude of the ETA correlates with the calcium amplitude. Importantly, the ETA amplitudes of different stimulus conditions are comparable because they were estimated on the same normalized calcium trace. Therefore, comparing the absolute amplitudes of ETAs of color-opponent neurons reveals the response magnitude of the cells to different colors. We have now included this information in the Results section.
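      The description above can be illustrated with a minimal Python/NumPy sketch of an amplitude-weighted event-triggered average. The following are assumptions made for illustration: the array layout (one column per stimulus condition), the window length, and the choice not to divide by the summed event amplitudes. That last choice mirrors the point that the absolute ETA amplitude tracks the calcium amplitude. The helper `event_triggered_average` is hypothetical and is not the authors' analysis code.

```python
import numpy as np

def event_triggered_average(stimulus, events, window):
    """Amplitude-weighted sum of the stimulus frames preceding each event.

    stimulus: (T, C) array, one row per frame, one column per stimulus
              condition (e.g. green/UV in center/surround).
    events:   (T,) non-negative event amplitudes inferred from a
              normalized calcium trace (0 where there is no event).
    window:   number of frames up to and including the event frame.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    events = np.asarray(events, dtype=float)
    n_frames, n_conditions = stimulus.shape
    eta = np.zeros((window, n_conditions))
    # Events too close to the start lack a full stimulus history and are skipped.
    for t in range(window - 1, n_frames):
        if events[t] > 0:
            # Weight the stimulus sequence that preceded this event by its amplitude.
            eta += events[t] * stimulus[t - window + 1 : t + 1]
    # No division by the summed weights, so the ETA scale tracks response amplitude.
    return eta
```

      Every column accumulates the same event amplitudes. So, for a single neuron, comparing the magnitudes of the green and UV columns compares its response to the two colors, which is the comparison the response describes.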

      Abstract: - "more than a third of neurons in mouse V1 are color-opponent in their receptive field center". It is unclear what data supports this statement. Can you please provide a statement in the manuscript that supports this directly using the number of neurons?

      We added the following sentence to the Results section: Nevertheless, a substantial fraction of neurons (33.1%) preferred color-opponent stimuli and scattered along the off-diagonal in the upper left and lower right quadrants, especially for the RF center.

      Figure 2: - There is a ?? in the figure legend. Which figure should this refer to? - please provide explicit neuron counts/percentages for each quadrant in b.

      We fixed the figure reference. We believe that adding the numbers or percentages to the figure panels would make them too crowded. Instead, we have now mentioned in the Results section and the legends that the percentages of variance explained by the color (off-diagonal) and luminance axis (diagonal) correlate with the number of neurons located in the color (top left and bottom right) and luminance contrast quadrants (top right and bottom left), respectively. Together with the number of neurons in each plot stated in the legends and the scale bar indicating the number of neurons per gray level, we hope this approach provides clarity for the reader to interpret the panels.

      Figure 3: - The color scheme makes it very difficult to differentiate the different conditions, especially when printed.

      Thanks, we changed the color scheme.

      - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 4: - Add explicit neuron counts/percentages for each quadrant in b.

      See above.

      Figure 5: - Add explicit neuron counts/percentages for each quadrant in c.

      See above.

      Methods: - "we modeled each response type to have a square RF with 10 degrees visual angle in diameter". There appears to be a mismatch between this statement and Figure 5e where 18 degrees is reported.

      Thanks, we fixed that.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties, specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. The results are interesting and many aspects of the experiments and conclusions are well done; several technical concerns, however, limit the support for several main conclusions.

      Limitations of stimulus choice: The paper relies on responses to a large (37.5 degree diameter) modulated spot and surround region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells (it is twice the area of the average V1 receptive field). As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs. the surround region). Most importantly, the surrounds of most of the recorded cells will be strongly activated by the central spot. This brings into question statements in the paper about selective activation of center and surround (e.g. page 2, right column). This in turn raises questions about several subsequent analyses that rely on selective center and surround activation.

      Thank you for this comment. A similar point was raised by a reviewer in the first round of revision. We agree with the reviewers that it is critical to discuss both the rationale behind our stimulus design and its limitations to facilitate better interpretation by the reader.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degrees of visual angle in diameter, which is slightly larger than the center RFs of single V1 neurons (between 20 and 30 degrees of visual angle depending on the stimulus, see here). The disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we used the following steps:

      - For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      - We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      - For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.
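      The alignment check in the second step asks what fraction of each neuron's center RF falls inside the 37.5-degree center spot. Below is a minimal NumPy sketch. It assumes the RF is available as a boolean mask on a regular grid of visual-field coordinates, for example a thresholded sparse-noise map. That is our assumption, not a description of the authors' pipeline, and `center_overlap_fraction` is a hypothetical name.

```python
import numpy as np

def center_overlap_fraction(rf_mask, azimuth, elevation, spot_center, spot_diameter=37.5):
    """Fraction of a neuron's center-RF pixels lying inside a circular center spot.

    rf_mask:     boolean array of shape (len(elevation), len(azimuth)).
    azimuth:     1D array of pixel-center azimuths in degrees.
    elevation:   1D array of pixel-center elevations in degrees.
    spot_center: (azimuth, elevation) of the spot center in degrees.
    """
    az, el = np.meshgrid(azimuth, elevation)  # both of shape (len(elevation), len(azimuth))
    radius = spot_diameter / 2.0
    in_spot = (az - spot_center[0]) ** 2 + (el - spot_center[1]) ** 2 <= radius ** 2
    rf_area = rf_mask.sum()
    if rf_area == 0:
        return 0.0  # no RF pixels: treat as no overlap
    return float((rf_mask & in_spot).sum() / rf_area)
```

      Counting the neurons whose overlap exceeds 2/3 then gives a percentage of the kind reported above.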

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.

      Nevertheless, we cannot exclude the possibility that the stimulus was misaligned for a subset of the recorded neurons used in our analysis. We agree with the reviewer that such misalignment might have caused the center stimulus to partially activate the surround. To further address this issue beyond the controls we have already implemented, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is beyond the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we have added the steps we performed to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion in a separate section. With this, we believe our manuscript explains both the rationale behind our stimulus design as well as important limitations of the approach.

      Comparison with retina: A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. For example, the stimulus used for the V1 experiments almost certainly strongly stimulates both center and surround of retinal ganglion cells. The text focuses on color opponency in the receptive field centers of retinal ganglion cells, but center-surround opponency seems at least as relevant for such large spots. This issue needs to be described more clearly and earlier in the paper.

      Thanks for this comment. We removed the retinal findings from the abstract, as well as from the summary of our results in the Introduction. In addition, we now phrase the interpretation of the retinal results more conservatively and specifically highlight in the Discussion that comparing ex-vivo retinal to in-vivo cortical data is challenging. With these changes, we believe that the focus of the paper is explicitly defined to be on the neuronal representation of color in mouse visual cortex, rather than on the comparison of retinal and cortical color processing.

      Limitations associated with ETA analysis: One of the reviewers in the previous round of reviews raised the concern that the ETA analysis may not accurately capture responses of cells with nonlinear receptive field properties such as On/Off cells. This possibility and whether it is a concern should be discussed.

      Thanks for this comment. We now discuss the limitation of using an ETA analysis in the Discussion section.

      Poor discrimination performance: Discriminability of color or luminance is used as a measure of population coding. The discrimination performance appears to be quite poor, with 500-1000 neurons needed to reliably distinguish light from dark or green from UV. Intuitively, I would expect that a single cell would provide such discrimination. Is this intuition wrong? If not, how do we interpret the discrimination analyses?

      Thank you for raising this point. The plots in Fig. 2c (and Figs. 3-5) show discriminability in bits, with the discrimination accuracy in % highlighted by the dotted horizontal lines. For 500 neurons, the discriminability is approx. 0.8 bits, corresponding to 95% accuracy. Even for 50 neurons, the accuracy is significantly above chance level. We now mention in the legends that the dotted lines indicate decoding accuracy in %.

    1. Author response:

      The following is the authors’ response to the current reviews.

      (1) Though we cannot survey all mutants, our observation that 774 genetically diverse adaptive mutants converge at the level of phenotype is important. It adds to growing evidence (see PMID33263280, PMID37437111, PMID22282810, PMID25806684) that the genetic basis of adaptation is not as diverse as the phenotypic basis. This convergence could make evolution more predictable.

      (2) Previous fitness competitions using this specific barcode system have been run for more than 25 generations (PMID33263280, PMID27594428, PMID37861305). We measure fitness per cycle, rather than per generation, so our fitness advantages are comparable to those in the aforementioned studies, including Venkataram and Dunn et al. (PMID27594428).

      (3) Our results remain the same upon removing the ~150 lineages with the noisiest fitness inferences, including those the reviewer mentions (see Figure S7).

      (4) We agree that there are likely more than the 6 clusters that we validated with follow-up studies (see Discussion). The important point is that we see a great deal of convergence in the behavior of diverse adaptive mutants.

      (5) The growth curves requested by the reviewer were included in our original manuscript; several more were added in the revision (see Figures 5D, 5E, 7D, S11B, S11C).


      The following is the authors’ response to the original reviews.

      Public Reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In their manuscript, Schmidlin, Apodaca, et al. try to answer fundamental questions about the evolution of new phenotypes and the trade-offs associated with this process. As a model, they use yeast resistance to two drugs, fluconazole and radicicol. They use barcoded libraries of isogenic yeasts to evolve thousands of strains in 12 different environments. They then measure the fitness of evolved strains in all environments and use these measurements to examine patterns in fitness trade-offs. They identify only six major clusters corresponding to different trade-off profiles, suggesting the vast genotypic landscape of evolved mutants translates to a highly constrained phenotypic space. They sequence over a hundred evolved strains and find that mutations in the same gene can result in different phenotypic profiles. 

      Overall, the authors deploy innovative methods to scale up experimental evolution experiments, and in many aspects of their approach tried to minimize experimental variation. 

      We thank the reviewer for this positive assessment of our work. We are happy that the reviewer noted what we feel is a unique strength of our approach: we scaled up experimental evolution by using DNA barcodes and by exploring 12 related selection pressures. Despite this scaling up, we still see phenotypic convergence among the 774 adaptive mutants we study. 

      Weaknesses: 

      (1) One of the objectives of the authors is to characterize the extent of phenotypic diversity in terms of resistance trade-offs between fluconazole and radicicol. To minimize noise in the measurement of relative fitness, the authors only included strains with at least 500 barcode counts across all time points in all 12 experimental conditions, resulting in a set of 774 lineages passing this threshold. This corresponds to a very small fraction of the starting set of ~21,000 lineages that were combined after experimental evolution for fitness measurements. 

      This is a misunderstanding that we clarified in this revision. Our starting set did not include 21,000 adaptive lineages. The total number of unique adaptive lineages in this starting set is much lower than 21,000 for two reasons. 

      First, ~21,000 represents the number of single colonies we isolated in total from our evolution experiments. Many of these isolates possess the same barcode, meaning they are duplicates. Second, and perhaps more importantly, most evolved lineages do not acquire adaptive mutations, meaning that many of the 21,000 isolates are genetically identical to their ancestor. In our revised manuscript, we explicitly stated that these 21,000 isolated lineages do not all represent unique, adaptive lineages. We changed the word “lineages” to “isolates” where relevant in Figure 2 and the accompanying legend. And we have added the following sentence to the figure 2 legend (line 212), “These ~21,000 isolates do not represent as many unique, adaptive lineages because many either have the same barcode or do not possess adaptive mutations.”

      More broadly speaking, several previous studies have demonstrated that diverse genetic mutations converge at the level of phenotype and have suggested that this convergence makes adaptation more predictable (PMID33263280, PMID37437111, PMID22282810, PMID25806684). Most of these studies survey fewer than 774 mutants. Further, our study captures mutants that are overlooked in previous studies, such as those that emerge across subtly different selection pressures (e.g., 4 𝜇g/ml vs. 8 𝜇g/ml flu) and those that are undetectable in evolutions lacking DNA barcodes. Thus, while our experimental design misses some mutants (see next comment), it captures many others. Thus, we feel that “our work – showing that 774 mutants fall into a much smaller number of groups” is important because it “contributes to growing literature suggesting that the phenotypic basis of adaptation is not as diverse as the genetic basis (lines 176 - 178).”

      As the authors briefly remark, this will bias their datasets for lineages with high fitness in all 12 environments, as all these strains must be fit enough to maintain a high abundance. 

      We now devote 19 lines of text to discussing this bias (on lines 160 - 162, 278-284, and in more detail on 758 - 767).

      We walk through an example of a class of mutants that our study misses. On lines 759-763, we say, “our study is underpowered to detect adaptive lineages that have low fitness in any of the 12 environments. This is bound to exclude large numbers of adaptive mutants. For example, previous work has shown some FLU resistant mutants have strong tradeoffs in RAD (Cowen and Lindquist 2005). Perhaps we are unable to detect these mutants because their barcodes are at too low a frequency in RAD environments, thus they are excluded from our collection of 774.”

      In our revised version, we added more text earlier in the manuscript that explicitly discusses this bias. Lines 278 – 283 now read, “The 774 lineages we focus on are biased towards those that are reproducibly adaptive in multiple environments we study. This is because lineages that have low fitness in a particular environment are rarely observed >500 times in that environment (Figure S4). By requiring lineages to have high-coverage fitness measurements in all 12 conditions, we may be excluding adaptive mutants that have severe tradeoffs in one or more environments, consequently blinding ourselves to mutants that act via unique underlying mechanisms.”
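      The inclusion threshold described above amounts to a simple filter. This sketch assumes the 500-read criterion applies to the barcode counts summed over all time points within each condition, and that it must hold in every condition. Both the data layout (lineage, then condition, then per-time-point counts) and the helper name `high_coverage_lineages` are hypothetical.

```python
from typing import Dict, List

def high_coverage_lineages(
    counts: Dict[str, Dict[str, List[int]]],
    conditions: List[str],
    min_reads: int = 500,
) -> List[str]:
    """Keep lineages whose summed barcode counts reach min_reads in every condition."""
    kept = []
    for lineage, per_condition in counts.items():
        # A lineage missing from a condition contributes zero reads there and is dropped.
        if all(sum(per_condition.get(c, [])) >= min_reads for c in conditions):
            kept.append(lineage)
    return kept
```

      The filter makes the bias visible. A lineage with a strong tradeoff in even one condition drops to low frequency there, fails the threshold, and disappears from every downstream analysis, whatever its fitness in the other 11 conditions.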

      Note that while we “miss” some classes of mutants, we “catch” other classes that may have been missed in previous studies of convergence. For example, we observe a unique class of FLU-resistant mutants that primarily emerged in evolution experiments that lack FLU (Figure 3). Thus, we think that the unique design of our study, surveying 12 environments, allows us to make a novel contribution to the study of phenotypic convergence.

      One of the main observations of the authors is that phenotypic space is constrained to a few clusters of roughly similar relative fitness patterns, giving hope that such clusters could be enumerated and considered to design antimicrobial treatment strategies. However, by excluding all lineages that are fit in only one or a few environments, they conceal much of the diversity that might exist in terms of trade-offs and set up an inclusion threshold that might present only a small fraction of phenotypic space with characteristics consistent with generalist resistance mechanisms or broadly increased fitness. This has important implications regarding the general conclusions of the authors regarding the evolution of trade-offs. 

      We agree and discussed exactly the reviewer’s point about our inclusion threshold in the 19 lines of text mentioned previously (lines 160 - 162, 278-284, and 758 - 767). To add to this discussion, and avoid the misunderstanding the reviewer mentions, we added the following strongly-worded sentence to the end of the paragraph on lines 749 – 767 in our revised manuscript: “This could complicate (or even make impossible) endeavors to design antimicrobial treatment strategies that thwart resistance”. 

      More generally speaking, we set up our study around Figure 1, which depicts a treatment strategy that works best if there exists but a single type of adaptive mutant. Despite our inclusion threshold, we find there are at least 6 types of mutants. This diminishes hopes of designing simple multidrug strategies like Figure 1. Our goal is to present a tempered and nuanced discussion of whether and how to move forward with designing multidrug strategies, given our observations. On one hand, we point out how the phenotypic convergence we observe is promising. But on the other hand, we also point out how there may be less convergence than meets the eye for various reasons including the inclusion threshold the reviewer mentions (lines 749 - 767).

      We have made several minor edits to the text with the goal of providing a more balanced discussion of both sides. For example, we added the words, “may yet” to the following sentences on lines 32 – 36 of the abstract: “These findings, on one hand, demonstrate the difficulty in relying on consistent or intuitive tradeoffs when designing multidrug treatments. On the other hand, by demonstrating that hundreds of adaptive mutations can be reduced to a few groups with characteristic tradeoffs, our findings may yet empower multidrug strategies that leverage tradeoffs to combat resistance.”

      (2) Most large-scale pooled competition assays using barcodes are usually stopped after ~25 generations to avoid noise due to the emergence of secondary mutations. 

      The rate at which new mutations enter a population is driven by various factors such as the mutation rate and population size, so choosing an arbitrary threshold like 25 generations is difficult. 

      We conducted our fitness competition following previous work using the Levy/Blundell yeast barcode system, in which the number of generations reported varies from 32 to 40 (PMID33263280, PMID27594428, PMID37861305, see PMID27594428 for detailed calculation of the fraction of lineages biased by secondary mutations in this system). 

      The authors measure fitness across ~40 generations, which is almost the same number of generations as in the evolution experiment. This raises the possibility of secondary mutations biasing abundance values, which would not have been detected by the whole genome sequencing as it was performed before the competition assay. 

      Previous work has demonstrated that in this evolution platform, most mutations occur during the transformation that introduces the DNA barcodes (Levy et al. 2015). In other words, these mutations are already present and do not accumulate during the 40 generations of evolution. Therefore, the observation that we collect a genetically diverse pool of adaptive mutants after 40 generations of evolution is not evidence that 40 generations is enough time for secondary mutations to bias abundance values.

      We have added the following sentence to the main text to highlight this issue (lines 247 - 249): “This happens because the barcoding process is slightly mutagenic, thus there is less need to wait for DNA replication errors to introduce mutations (Levy et al. 2015; Venkataram et al. 2016).”

      We also elaborate on this in the method section entitled, “Performing barcoded fitness competition experiments,” where we added a full paragraph to clarify this issue (lines 972 - 980).

      (3) The approach used by the authors to identify and visualize clusters of phenotypes among lineages does not seem to consider the uncertainty in the measurement of their relative fitness. As can be seen from Figure S4, the inter-replicate difference in measured fitness can often be quite large. From these graphs, it is also possible to see that some of the fitness measurements do not correlate linearly (ex.: Med Flu, Hi Rad Low Flu), meaning that taking the average of both replicates might not be the best approach. Because the clustering approach used does not seem to take this variability into account, it becomes difficult to evaluate the strength of the clustering, especially because the UMAP projection does not include any representation of uncertainty around the position of lineages. This might paint a misleading picture where clusters appear well separated and well defined but are in fact much fuzzier, which would impact the conclusion that the phenotypic space is constricted. 

      Our noisiest fitness measurements correspond to barcodes that are the least abundant and thus suffer the most from stochastic sampling noise. These are also the barcodes that introduce the nonlinearity the reviewer mentions. We removed these from our dataset by increasing our coverage threshold from 500 reads to 5,000 reads. The clusters did not collapse, which suggests that they were not capturing this noise (Figure S7B).

      More importantly, we devoted 4 figures and 200 lines of text to demonstrating that the clusters we identified capture biologically meaningful differences between mutants (and not noise). We have modified the main text to point readers to figures 5 through 8 earlier, such that it is more apparent that the clustering analysis is just the first piece of our data demonstrating convergence at the level of phenotype.

      (4) The authors make the decision to use UMAP and a Gaussian mixture model to cluster and represent the different fitness landscapes of their lineages of interest. Their approach has many caveats. First, compared to PCA, the UMAP axes do not provide any information about the actual dissimilarities between clusters. Using PCA would have allowed a better understanding of the amount of variance explained by components that separate clusters, as well as more interpretable components. 

      The components derived from PCA are often not interpretable. It’s not obvious that each one, or even the first one, will represent an intuitive phenotype, like resistance to fluconazole. Moreover, we see many nonlinearities in our data. For example, fitness in a double drug environment is not predicted by adding up fitness in the relevant single drug environments. Also, there are mutants that have high fitness when fluconazole is absent or abundant, but low fitness when mild concentrations are present. These types of nonlinearities can make the axes in PCA very difficult to interpret, and PCA can miss them altogether, so we prefer other clustering methods. 

      Still, we agree that confirming our clusters are robust to different clustering methods is helpful. We have included PCA in the revised manuscript, plotting PC1 vs PC2 as Figure S9 with points colored according to the cluster assignment in figure 4 (i.e. using a gaussian mixture model). It appears the clusters are largely preserved.
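      The reviewer's point about variance explained can be illustrated with a minimal sketch, which is not the code behind Figure S9: principal components of a lineages × environments fitness matrix, computed with NumPy's SVD after centering each environment, together with the fraction of total variance each component explains. The function name `pca_scores` and the toy matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pca_scores(fitness, n_components=2):
    """Project a (lineages x environments) fitness matrix onto its leading
    principal components (hypothetical helper, for illustration only).

    Returns (scores, explained): per-lineage component scores and the
    fraction of total variance captured by each returned component.
    """
    X = np.asarray(fitness, dtype=float)
    # Center each environment (column) so components describe variation
    # around the mean fitness in that environment.
    Xc = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal axes, S the singular values.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance along each axis is proportional to its squared singular value.
    explained = S**2 / np.sum(S**2)
    # Scores equal Xc @ Vt.T restricted to the leading components.
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


# Toy example: fitness in environments 2 and 3 is proportional to fitness
# in environment 1, so after centering all variation lies on one axis.
toy = [[0, 0, 0], [1, 2, 3], [2, 4, 6]]
scores, explained = pca_scores(toy)
```

      PCA finds linear combinations of environments, so structure that is nonlinear across lineages (for example, clusters lying on a curved surface) may be spread over several components, which is the interpretability concern the authors raise.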

      Second, the advantages of dimensional reduction are not clear. In the competition experiment, 11/12 conditions (all but the no drug, no DMSO conditions) can be mapped to only three dimensions: concentration of fluconazole, concentration of radicicol, and relative fitness. Each lineage would have its own fitness landscape as defined by the plane formed by relative fitness values in this space, which can then be examined and compared between lineages. 

      We worry that the idea stems from a priori notions of what the important dimensions should be. The biology of our system is unfortunately not intuitive. For example, it seems like this idea would miss important nonlinearities such as our observation that low fluconazole behaves more like a novel selection pressure than a dialed down version of high fluconazole. 

      Third, the choice of 7 clusters as the cutoff for the Gaussian mixture model is not well explained. Based on Figure S6A, BIC starts leveling off at 6 clusters, not 7, and going to 8 clusters would provide the same reduction as going from 6 to 7. This choice also appears arbitrary in Figure S6B, where BIC levels off at 9 clusters when only highly abundant lineages are considered. 

      We agree. We did not rely on the results of BIC alone to make final decisions about how many clusters to include. Another factor we considered was follow-up genotyping and phenotyping studies that confirm biologically meaningful differences between the mutants in each cluster (Figures 5 – 8). We now state this explicitly. Here is the modified paragraph where we describe how we chose a model with 7 clusters, from lines 436 – 446 of the revised manuscript:

      “Beyond the obvious divide between the top and bottom clusters of mutants on the UMAP, we used a gaussian mixture model (GMM) (Fraley and Raftery, 2003) to identify clusters. A common problem in this type of analysis is the risk of dividing the data into clusters based on variation that represents measurement noise rather than reproducible differences between mutants (Mirkin, 2011; Zhao et al., 2008). One way we avoided this was by using a GMM quality control metric (BIC score) to establish how splitting out additional clusters affected model performance (Figure S6). Another factor we considered were follow-up genotyping and phenotyping studies that demonstrate biologically meaningful differences between mutants in different clusters (Figures 5 – 8). Using this information, we identified seven clusters of distinct mutants, including one pertaining to the control strains, and six others pertaining to presumed different classes of adaptive mutant (Figure 4D). It is possible that there exist additional clusters, beyond those we are able to tease apart in this study.”

      This directly contradicts the statement in the main text that clusters are robust to noise, as a more stringent inclusion threshold appears to increase, not decrease, the optimal number of clusters. Criteria in addition to BIC could have been used to help choose the optimal number of clusters, or even to decide whether Gaussian mixture modeling is appropriate for this dataset. 

      We are under the following impression: if our clustering method was overfitting, i.e. capturing noise, the optimal number of clusters should decrease when we eliminate noise. It increased. In other words, the observation that our clusters did not collapse (i.e. merge) when we removed noise suggests these clusters were not capturing noise. 

      Most importantly, our validation experiments, described below, provide additional evidence that our clusters capture meaningful differences between mutants (and not noise).  
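      The BIC trade-off discussed above can be made concrete with a short calculation. As a sketch, assuming full covariance matrices (mclust and scikit-learn also offer more constrained covariance structures, which change the count), a K-component Gaussian mixture in d dimensions has K·d means, K·d(d+1)/2 covariance entries and K − 1 free mixing weights, and BIC = p·ln(n) − 2·ln(L̂), where p is that parameter count, n the number of lineages and L̂ the maximized likelihood. This uses the lower-is-better sign convention of scikit-learn; mclust reports the negated quantity, where higher is better. The function names and the log-likelihood values in the example are hypothetical, not taken from Figure S6.

```python
import math


def gmm_num_params(k, d):
    """Free parameters of a k-component, full-covariance Gaussian mixture in d dims."""
    means = k * d
    covariances = k * d * (d + 1) // 2  # symmetric d x d matrix per component
    weights = k - 1  # mixing weights sum to 1, so one is determined by the rest
    return means + covariances + weights


def gmm_bic(log_likelihood, k, d, n):
    """BIC in the lower-is-better convention: p * ln(n) - 2 * ln(L)."""
    return gmm_num_params(k, d) * math.log(n) - 2.0 * log_likelihood


def pick_k(log_likelihoods, d, n):
    """Return the component count (dict key) whose fitted model has the lowest BIC."""
    return min(log_likelihoods, key=lambda k: gmm_bic(log_likelihoods[k], k, d, n))


# Hypothetical maximized log-likelihoods for 5, 6 and 7 components fitted to
# n = 100 points in d = 2 dimensions (made-up numbers, for illustration only).
made_up = {5: -300.0, 6: -280.0, 7: -278.0}
best_k = pick_k(made_up, d=2, n=100)
```

      With these made-up numbers, going from 6 to 7 components raises the log-likelihood by 2 but adds 6 parameters, so BIC favors 6. As the response notes, the authors did not rely on BIC alone to settle on seven clusters.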

      (5) Large-scale barcode sequencing assays can often be noisy and are generally validated using growth curves or competition assays. 

      Some types of bar-seq methods, in particular those that look at fold change across two time points, are noisier than others that look at how frequency changes across multiple timepoints (PMID30391162). Here, we use the less noisy method. We also reduce noise by using a stricter coverage threshold than previous work (e.g., PMID33263280), and by excluding batch effects by performing all experiments simultaneously, since we found this to be effective in our previous work (PMID37237236). 
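      To illustrate why estimates that use many timepoints are less noisy than two-timepoint fold changes, the sketch below estimates a lineage's relative fitness as the ordinary least-squares slope of ln(lineage count / reference count) against generations, using every sampled timepoint. This is a simplified stand-in, not the estimator used in the manuscript or in the cited Levy/Blundell pipeline; the function name `fitness_slope` and the use of a single neutral reference count are assumptions.

```python
import math


def fitness_slope(generations, lineage_counts, reference_counts):
    """Relative fitness per generation as the OLS slope of
    ln(lineage / reference) across all sampled timepoints.

    Counts must be positive (zero counts would need a pseudocount). Taking
    the ratio to a neutral reference cancels differences in total
    sequencing depth between timepoints.
    """
    if not (len(generations) == len(lineage_counts) == len(reference_counts)):
        raise ValueError("inputs must have equal length")
    if len(generations) < 2:
        raise ValueError("need at least two timepoints")
    y = [math.log(c / r) for c, r in zip(lineage_counts, reference_counts)]
    t_mean = sum(generations) / len(generations)
    y_mean = sum(y) / len(y)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(generations, y))
    sxx = sum((t - t_mean) ** 2 for t in generations)
    return sxy / sxx


# Synthetic lineage growing 5% per generation faster than the reference.
gens = [0, 8, 16, 24, 32]
lineage = [1000 * math.exp(0.05 * g) for g in gens]
s = fitness_slope(gens, lineage, [1000] * len(gens))
```

      With only two timepoints the slope reduces to the log fold change divided by the elapsed generations, so sampling noise at either timepoint passes directly into the estimate; regressing over more timepoints averages that noise down.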

      Perhaps also relevant is that the main assay we use to measure fitness has been previously validated (PMID27594428) and no subsequent study using this assay validates using the methods suggested above (see PMID37861305, PMID33263280, PMID31611676, PMID29429618, PMID37192196, PMID34465770, PMID33493203). Similarly, bar-seq has been used, without the suggested validation, to demonstrate that the way some mutants’ fitness changes across environments is different from that of other mutants (PMID33263280, PMID37861305, PMID31611676, PMID33493203, PMID34596043). This is the same thing that we use bar-seq to demonstrate. 

      For all of the reasons above, we are hesitant to perform experiments confirming that bar-seq itself is a valid way to infer fitness. It seems this is already accepted as a standard in our field. However, please see below.

      Having these types of results would help support the accuracy of the main assay in the manuscript and thus better support the claims of the authors. 

      While we don’t agree that fitness measurements obtained from this bar-seq assay generally require validation, we do agree that it is important to validate whether the mutants in each of our 6 clusters indeed are different from one another in meaningful ways.

      Our manuscript has 4 figures (5 - 8) and over 200 lines of text dedicated to validating whether our clusters capture reproducible and biologically meaningful differences between mutants. In the revised manuscript, we added additional validation experiments, such that three figures (Figures 5, 7 and S11) now involve growth curves, as the reviewer requested. 

      Below, we walk through the different types of validation experiments that are present in our manuscript, including those that were added in this revision.

      (1) Mutants from different clusters have different growth curves: In our original manuscript, we measured growth curves corresponding to a fitness tradeoff that we thought was surprising. Mutants in clusters 4 and 5 both have fitness advantages in single drug conditions. While mutants from cluster 4 also are advantageous in the relevant double drug conditions, mutants from cluster 5 are not! We validated these different behaviors by studying growth curves for a mutant from each cluster (Figures 7 and S11), finding that mutants from different clusters have different growth curves. In the revised manuscript, we added growth curves for 6 additional mutants (3 from cluster 1 and 3 from cluster 3), demonstrating that only the cluster 1 mutants have a tradeoff in high concentrations of fluconazole (see Figure 5D & 5E). In sum, this work demonstrates that mutants from different clusters have predictable differences in their growth phenotypes.

      (2) Mutants from different clusters have different evolutionary origins: In our original manuscript, we came up with a novel way to ask whether the clusters capture different types of adaptive mutants. We asked whether the mutants in each cluster originate from different evolution experiments. They often do (see pie charts in Figures 5, 6, 7, 8). In the revised manuscript, we extended this analysis to include mutants from cluster 1. Cluster 1 is defined by high fitness in low fluconazole that declines with increasing fluconazole. In our revised manuscript, we show that cluster 1 lineages were overwhelmingly sampled from evolutions conducted in our lowest concentration of fluconazole (see pie chart in new Figure 5A). No other cluster’s evolutionary history shows this pattern (compare to pie charts in figures 6, 7, and 8).

      These pie charts also provide independent confirmation supporting the fitness tradeoffs observed for each cluster in figure 4E. For example, mutants in cluster 5 appear to have a tradeoff in a particular double drug condition (HRLF), and the pie charts confirm that they rarely originate from that evolution condition. This differs from cluster 4 mutants, which do not have a fitness tradeoff in HRLF, and are more likely to originate from that environment (see purple pie slice in figure 7). Additional cases where results of evolution experiments (pie charts) confirm observed fitness tradeoffs are discussed in the manuscript on lines 320 – 326, 594 – 598, 681 – 685.

      (3) Mutants from each cluster often fall into different genes: We sequenced many of these mutants and show that mutants in the same gene are often found in the same cluster. For example, all 3 IRA1 mutants are in cluster 6 (Fig 8), both GPB2 mutants are in cluster 4 (Figs 7 & 8), and 35/36 PDR mutants are in either cluster 2 or 3 (Figs 5 & 6). 

      (4) Mutants from each cluster have behaviors previously observed in the literature: We compared our sequencing results to the literature and found congruence. For example, PDR mutants are known to provide a fitness benefit in fluconazole and are found in clusters that have high fitness in fluconazole (lines 485 - 491). Previous work suggests that some mutations to PDR have different tradeoffs than others, which corresponds to our finding that PDR mutants fall into two separate clusters (lines 610 - 612). IRA1 mutants were previously observed to have high fitness in our “no drug” condition and are found in the cluster that has the highest fitness in the “no drug” condition (lines 691 - 696). Previous work even confirms the unusual fitness tradeoff we observe where IRA1 and other cluster 6 mutants have low fitness only in low concentrations of fluconazole (lines 702 - 704).

      (5) Mutants largely remain in their clusters when we use alternate clustering methods:  In our original manuscript, we performed various different re-clustering and/or normalization approaches on our data (Fig 6, S5, S7, S8, S10). The clusters of mutants that we observe in figure 4 do not change substantially when we re-cluster the data. In our revised manuscript, we added another clustering method: principal component analysis (PCA) (Fig S9).  Again, we found that our clusters are largely preserved.

      While these experiments demonstrate meaningful differences between the mutants in each cluster, important questions remain. For example, a long-standing question in biology centers on the extent to which every mutation has unique phenotypic effects versus the extent to which scientists can predict the effects of some mutations from other similar mutations. Additional studies on the clusters of mutants discovered here will be useful in deepening our understanding of this topic and more generally of the degree of pleiotropy in the genotype-phenotype map.

      Reviewer #2 (Public Review): 

      Summary: 

      Schmidlin & Apodaca et al. aim to distinguish mutants that resist drugs via different mechanisms by examining fitness tradeoffs across hundreds of fluconazole-resistant yeast strains. They barcoded a collection of fluconazole-resistant isolates and evolved them in different environments with a view to having relevance for evolutionary theory, medicine, and genotype-phenotype mapping. 

      Strengths: 

      There are multiple strengths to this paper, the first of which is pointing out how much work has gone into it; the quality of the experiments (the thought process, the data, the figures) is excellent. Here, the authors seek to induce mutations in multiple environments, which is a really large-scale task. I particularly like the attention paid to isolates that are resistant to low concentrations of FLU. So often these are overlooked in favour of those conferring MIC values >64/128 etc. What was seen were different genotype and fitness profiles. I think there's a wealth of information here that will actually be of interest to more than just the fields mentioned (evolutionary medicine/theory). 

      We are grateful for this positive review. This was indeed a lot of work! We are happy that the reviewer noted what we feel is a unique strength of our manuscript: that we survey adaptive isolates across multiple environments, including low drug concentrations.  

      Weaknesses: 

      Not picking up low fitness lineages - which the authors discuss and provide a rationale as to why. I can completely see how this has occurred during this research, and whilst it is a shame I do not think this takes away from the findings of this paper. Maybe in the next one! 

      We thank the reviewer for these words of encouragement and will work towards catching more low fitness lineages in our next project.

      In the abstract the authors focus on 'tradeoffs' yet in the discussion they say the purpose of the study is to see how many different mechanisms of FLU resistance may exist (lines 679-680), followed up by "We distinguish mutants that likely act via different mechanisms by identifying those with different fitness tradeoffs across 12 environments". Whilst I do see their point, and this is entirely feasible, I would like a bit more explanation around this (perhaps in the intro) to help lay-readers make this jump. The remainder of my comments on 'weaknesses' are relatively fixable, I think: 

      We have expanded the introduction, in particular lines 129 – 157 of the revised manuscript, to walk readers through the connection between fitness tradeoffs and molecular mechanisms. For example, here is one relevant section of new text from lines 131 - 136: “The intuition here is as follows. If two groups of drug resistant mutants have different fitness tradeoffs, it could mean that they provide resistance through different underlying mechanisms. Alternatively, both could provide drug resistance via the same mechanism, but some mutations might also affect fitness via additional mechanisms (i.e. they might have unique “side-effects” at the molecular level) resulting in unique fitness tradeoffs in some environments.”

      In the introduction I struggle to see how this body of research fits in with the current literature, as the literature cited is a hodge-podge of bacterial and fungal evolution studies, which are very different! For example, the authors state "previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms" (lines 129-131) and then cite three papers, only one of which is a fungal research output. However, the next sentence focuses solely on literature from fungal research. Citing bacterial work as a foundation is fine, but as you're using yeast for this I think tailoring the introduction more to what is and isn't known in fungi would be more appropriate. It would also be great to then circle back around and mention monotherapy vs combination drug therapy for fungal infections as a rationale for this study. The study seems to be focused on FLU-resistant mutants, which is the first-line drug of choice, but many (yeast) infections have acquired resistance to this and combination therapy is the norm. 

      We ourselves are broadly interested in the structure of the genotype-phenotype-fitness map (PMID33263280, PMID32804946). For example, we are interested in whether diverse mutations converge at the level of phenotype and fitness. Figure 1A depicts a scenario with a lot of convergence in that all adaptive mutations have the same fitness tradeoffs.

      The reason we cite papers from yeast, as well as bacteria and cancer, is that we believe general conclusions about the structure of the genotype-phenotype-fitness map apply broadly. For example, the sentence the reviewer highlights, “previous work suggests that mutants with different fitness tradeoffs may affect fitness through different molecular mechanisms” is a general observation about the way genotype maps to fitness. So, we cited papers from across the tree of life to support this sentence.  And in the next sentence, where we cite 3 papers focusing solely on fungal research, we cite them because they are studies about the complexity of this map. Their conclusions, in theory, should also apply broadly, beyond yeast.

      On the other hand, because we study drug resistant mutations, we hope that our dataset and observations are of use to scientists studying the evolution of resistance. We use our introduction to explain how the structure of the genotype-phenotype-fitness map might influence whether a multidrug strategy is successful (Figure 1).

      We are hesitant to rework our introduction to focus more specifically on fungal infections as this is not our primary area of expertise.

      Methods: Line 769 - which yeast? I haven't even seen mention of which species is being used in this study; different yeast employ different mechanisms of adaptation for resistance, so could greatly impact the results seen. This could help with some background context if the species is mentioned (although I assume S. cerevisiae). 

      In the revised manuscript, we have edited several lines (lines 95, 186, 822) to state that the organism used in this work is Saccharomyces cerevisiae. 

      In which case, should aneuploidy be considered as a mechanism? This is mentioned briefly on line 556, but with all the sequencing data acquired this could be checked quickly? 

      We like this idea and we are working on it, but it is not straightforward. The reviewer is correct in that we can use the sequencing data that we already have. But calling aneuploidy with certainty is tough because its signal can be masked by noise. In other words, some regions of the genome may be sequenced more than others by chance.

      Given this is not straightforward, at least not for us, this analysis will likely have to wait for a subsequent paper. 

      I think the authors could be bolder and try and link this to other (pathogenic) yeasts. What are the implications of this work on say, Candida infections? 

      Perhaps because our background lies in general study of the genotype-phenotype map, we are hesitant about making bold assertions about how our work might apply to pathogenic yeasts. We are hopeful that our work will serve as a stepping-stone such that scientists from that community can perhaps make (and test) such statements.   

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I found the ideas and the questions asked in this manuscript to be interesting and ambitious. The setup of the evolution and fitness competition experiments was well poised to answer them, but the analysis of the data is not currently enough to properly support the claims made. I would suggest revising the analysis to address the weaknesses raised in the public review and if possible, adding some more experimental validations. As you already have genome sequencing data showing the causal mutation for many mutants across the different clusters, it should be possible for you to reconstruct some of the strains and test and validate their phenotypes and cluster identity. 

      Yes, this is possible. We added more validation experiments (see figure 5). We already had quite a few validation experiments (figures 5 - 8 and lines 479 - 718), but we did not clearly highlight the significance of these analyses in our original manuscript. Therefore, we modified the text in our revised manuscript in various places to do so. For example, we now make clearer that we jointly use BIC scores as well as validation experiments to decide how many clusters to describe (lines 436 - 446). We also make clearer that our clustering analysis is only the first step towards identifying groups of mutants with similar tradeoffs by using words and phrases like, “we start by” (line 411) and “preliminarily” (line 448) when discussing the clustering analysis.  We also point readers to all the figures describing our validation experiments earlier (line 443), and list these experiments out in the discussion (lines 738 - 741).

      Also, please deposit your genome sequencing data in a public database (I am not sure I saw it mentioned anywhere). 

      We have updated line 1088 of the methods section to include this sentence: “Whole genome sequences were deposited in GenBank under SRA reference PRJNA1023288.”

      Reviewer #2 (Recommendations For The Authors):

      I don't think the figures or experiments can be improved upon, they are excellent. There are a few times I feel things are written in a rather confusing way and could be explained better, but also I feel there are places the authors jump from one thing to another really quickly and the reader (who might not be an expert in this area) will struggle to keep up. For example: 

      Explaining what RAD is: it is introduced in the methods, but what it is is not really explained. 

      Since the introduction is already very long, we chose not to explain radicicol’s mechanism of action here. Instead, we bring this up later on lines 614 – 621 when it becomes relevant.

      More generally, in response to this advice and that from reviewer 1, we also added text to various places in the manuscript to help explain our work more clearly. In particular, we clarified the significance of our validation experiments and various important methodological details (see above). We also better explained the connection between fitness tradeoffs and mechanisms (see above) and added more details about the potential use cases of our approach (lines 142 – 150).

      The abstract states "some of the groupings we find are surprising. For example, we find some mutants that resist single drugs do not resist their combination, and some mutants to the same gene have different tradeoffs than others". Firstly, this sentence is a bit confusing to read but if I've read it as intended, then is it really surprising? It's difficult for organisms (bacteria and fungi) to develop multiple beneficial mutations conferring drug resistance on the same background, hence why combination antifungal drug therapy is often used to treat infections. 

      This is a place where brevity got in the way of clarity. We added a bit of text to make clear why we were surprised. Specifically, we were surprised because not all mutants behave the same. Some resist single drugs AND their combination. Some resist single drugs but not their combination. The sentence in the abstract now reads, “For example, we find some mutants that resist single drugs do not resist their combination, while others do. And some mutants to the same gene have different tradeoffs than others.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Responses to recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors):

      The manuscript would be strengthened with the following key revisions mostly having to do with image quality: 

      (1) It is very difficult in Figure 4B to see which nuclei actually have evidence of mitochondrial transcripts. It might be helpful to provide arrows to specific cells and also to provide some estimate of the percentage of cells with nuclear mt-transcripts as measured by ISH compared to the 3-6% of cortex cell estimate seen in the snRNAseq analysis. 

      As suggested, we have now added arrows to help readers see the signals in nuclei. The detection thresholds of ISH and single-nucleus RNA-seq are expected to differ, and therefore estimating the proportion of PT-Mito cells by ISH would not be reliable.

      (2) The phospho-PKR images provided as evidence of C16 activity (Supplemental Figure 1) are too dim to be very useful. Could brighter images be provided? 

      We have now adjusted the LUTs of images in Supplemental Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study is convincing because they performed time-resolved X-ray crystallography under different pH conditions using active/inactive metal ions and PpoI mutants, as with the activity measurements in solution in conventional enzymatic studies. Although the reaction mechanism is simple and may be a little predictable, the strength of this study is that they were able to validate that PpoI catalyzes DNA hydrolysis through "a single divalent cation" because time-resolved X-ray study often observes transient metal ions which are important for catalysis but are not predictable in previous studies with static structures such as enzyme-substrate analog-metal ion complexes. The discussion of this study is well supported by their data. This study visualized the catalytic process and mutational effects on catalysis, providing new insight into the catalytic mechanism of I-PpoI through a single divalent cation. The authors found that His98, a candidate of proton acceptor in the previous experiments, also affects the Mg2+ binding for catalysis without the direct interaction between His98 and the Mg2+ ion, suggesting that "Without a proper proton acceptor, the metal ion may be prone for dissociation without the reaction proceeding, and thus stable Mg2+ binding was not observed in crystallo without His98". In future, this interesting feature observed in I-PpoI should be investigated by biochemical, structural, and computational analyses using other metal-ion dependent nucleases. 

      We appreciate the reviewer for the positive assessment as well as all the comments and suggestions.

      Reviewer #2 (Public Review): 

      Summary: 

      Most polymerases and nucleases use two or three divalent metal ions in their catalytic functions. The family of His-Me nucleases, however, use only one divalent metal ion, along with a conserved histidine, to catalyze DNA hydrolysis. The mechanism has been studied previously but, according to the authors, it remained unclear. By use of time-resolved X-ray crystallography, this work convincingly demonstrated that only one M2+ ion is involved in the catalysis of the His-Me I-PpoI nuclease, and proposed concerted functions of the metal and the histidine. 

      Strengths: 

      This work performs mechanistic studies, including the number and roles of metal ion, pH dependence, and activation mechanism, all by structural analyses, coupled with some kinetics and mutagenesis. Overall, it is a highly rigorous work. This approach was first developed in Science (2016) for a DNA polymerase, in which Yang Cao was the first author. It has subsequently been applied to just 5 to 10 enzymes by different labs, mainly to clarify two versus three metal ion mechanisms. The present study is the first one to demonstrate a single metal ion mechanism by this approach. 

      Furthermore, on the basis of the quantitative correlation between the fraction of metal ion binding and the formation of product, as well as the pH dependence, and the data from site-specific mutants, the authors concluded that the functions of Mg2+ and His are a concerted process. A detailed mechanism is proposed in Figure 6. 

      Even though there are no major surprises in the results and conclusions, the time-resolved structural approach and the overall quality of the results represent a significant step forward for the His-Me family of nucleases. In addition, since the mechanism is unique among different classes of nucleases and polymerases, the work should be of interest to readers in DNA enzymology, or even mechanistic enzymology in general. 

      Thank you very much for your comments and suggestions.

      Weaknesses: 

      Two relatively minor issues are raised here for consideration: 

      p. 4, last para, lines 1-2: "we next visualized the entire reaction process by soaking I-PpoI crystals in buffer....". This is a little over-stated. The structures being observed are not reaction intermediates. They are mixtures of substrates and products in the enzyme-bound state. The progress of the reaction is limited by the progress of the soaking of the metal ion. Crystallography has just been used as a tool to monitor the reaction (and provide structural information about the product). It would be more accurate to say that "we next monitored the reaction progress by soaking....". 

      We appreciate the clarification regarding the description of our experimental approach. We agree that our structures do not represent reaction intermediates but rather mixtures of substrate and product states within the enzyme-bound environment. We have revised the text accordingly to more accurately reflect our methodology.

      p. 5, the beginning of the section. The authors on one hand emphasized the quantitative correlation between Mg ion density and the product density. On the other hand, they raised the uncertainty in the quantitation of Mg2+ density versus Na+ density, thus they repeated the study with Mn2+ which has distinct anomalous signals. This is a very good approach. However, there is still no metal ion density shown in the key Figure 2A. It will be clearer to show the progress of metal ion density in a figure (in addition to just plots), whether it is Mg or Mn. 

      Thank you for your insightful comments. We recognize the importance of visualizing metal ion density alongside product density. To address this, we added Figure S4, which presents Mg2+/Mn2+ and product densities side by side.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 6. I understand that pre-reaction state (left panel) and Metal-binding state (two middle panels) are in equilibrium. But can we state that the Metal-binding state (two middle panels) and the product state (right panel) are in equilibrium and connected by two arrows? 

      Thank you for your comments. We agree that the DNA hydrolysis reaction may not be reversible within the I-PpoI active site. To clarify, we removed the backward arrows between the metal-binding state and the product state. In addition, we thank the reviewer for suggesting a name for the middle state and agree that it should be labeled. We added the metal-binding state label in the revised Figure 6 and also added “on the other hand, optimal alignment of a deprotonated water and Mg2+ within the active site, labeled as metal-binding state, leads to irreversible bond breakage (Fig. 6a)” to the text.

      (2) The section on DNA hydrolysis assay (Materials and Methods) is not well described. In this section, the authors should summarize the methods for the experiments in Figure 4 AC, Figure 5BC, Figure S3C, Figure S4EF, and Figure S6AB. The authors presented some graphs for the reactions. For clarity, the author should state in the legends which experiments the results are from (in crystallo or in solution). Please check and modify them. 

      Thank you for the suggestion. We have added four paragraphs detailing the experimental procedures for the experiments in these figures. In addition, we have checked all figure legends and labeled each experiment as “in crystallo” or “in solution.” For clarity, we also added “in crystallo” or “in solution” to the corresponding panels.

      (3) The authors showed the anomalous signals of Mn2+ and Tl+. The authors should mention which wavelength of X-rays was used in the data collections to calculate the anomalous signals. 

      Thank you for the suggestion. We have added the X-ray wavelength to the legends of all figures that include anomalous maps; all anomalous data were collected at an X-ray wavelength of 0.9765 Å.

      (4) The full names of "His-Me" and "HNH" are necessary for a wide range of readers. 

      Thank you for the suggestion. We have included the full nomenclature for His-Me (histidine-metal) nucleases and HNH (histidine-asparagine-histidine) nucleases.

      (5) The authors should add the side chain of Arg61 in Figure 1E because it is mentioned in the main text. 

      Thank you for the suggestion. We have added Arg61 to Figure 1E.

      (6) Figure 5D. For clarity, the electron densities should cover the Na+ ion. The same request applies to WatN in Figure S3B.

      Thank you for catching this detail. We have added the electron density for the Na+ ion in Figure 5D and WatN in Figure S3B.

      (7) At line 269 on page 8, what is "previous H98A I-PpoI structure with Mn2+"? Is the structure 1CYQ? If so, it is a complex with Mg2+. 

      Thank you for catching this detail. We have edited the text to “previous H98A I-PpoI structure with Mg2+.”

      (8) At line 294 on page 9, "and substrate alignment or rotation in MutT (66)." I think "alignment of the substrate and nucleophilic water" is preferred rather than "substrate alignment or rotation". 

      Thank you for the suggestion. We have edited the text to “alignment of the substrate and nucleophilic water.”

      (9) At line 305 on page 9, "Second, (58, 69-71) single metal ion binding is strictly correlated with product formation in all conditions, at different pH and with different mutants (Figure 3a and Supplementary Figure 4a-c) (58)". The references should be cited in the correct positions. 

      Thank you for catching this typo. We have removed the references.

      (10) At line 347 on page 10, "Grown in a buffer that contained (50 g/L glucose, 200 g/L α-lactose, 10% glycerol) for 24 hrs." Is this sentence correct? 

      Thank you for catching this detail. We have corrected the sentence.

      (11) At line 395 on page 11, "The His98Ala I-PpoI crystals of first transferred and incubated in a pre-reaction buffer containing 0.1M MES (pH 6.0), 0.2 M NaCl, 1 mM MgCl2 or MnCl2, and 20% (w/v) PEG3350 for 30 min." In the experiments using this mutant, does a pre-reaction buffer contain MgCl2 or MnCl2? 

      Thank you for bringing this to our attention. We performed two sets of experiments: 1) metal ion soaking in 1 mM Mn2+, which was performed as for WT, without Mn2+ in the pre-reaction buffer; 2) imidazole soaking, for which 1 mM Mn2+ was included in the pre-reaction buffer. We reasoned that Mn2+ would not bind or promote the reaction with His98Ala I-PpoI, but that pre-incubation might help populate Mn2+ within the lattice for better imidazole binding. However, neither Mn2+ nor imidazole was observed. We have added experimental details for both experiments with His98Ala I-PpoI.

      (12) In the figure legends of Figure 1, is the Fo-Fc omit map shown in yellow not in green? Please remove (F) in the legends. 

      We have changed the Fo-Fc map to be shown in violet. We have also removed (f) from the figure legends.

      (13) I found descriptions of "MgCl". Please modify them to "MgCl2". 

      Thank you for catching these details. We have modified all “MgCl” to “MgCl2.”

      (14) References 72 and 73 are duplicated. 

      We have removed the duplicated reference.

      Reviewer #2 (Recommendations For The Authors): 

      p. 9, first paragraph, last three lines: "Thus, we suspect that the metal ion may play a crucial role in the chemistry step to stabilize the transition state and reduce the electronegative buildup of DNA, similar to the third metal ion in DNA polymerases and RNaseH." This point is significant but the statement seems a little uncertain. You are saying that the single metal plays the role of two metals in polymerase, in both the ground state and the transition state. I believe the sentence can be stronger and more explicit. 

      Thank you for raising this point. We suspect the single metal ion in I-PpoI is different from the A-site or B-site metal ion in DNA polymerases and RNaseH, but similar to the third metal ion in DNA polymerases and nucleases. As we stated in the text,

      (1) the metal ion in I-PpoI is not required for substrate alignment. The water molecule and substrate can be observed in place even in the absence of the metal ion. In contrast, the A-site and B-site metal ions in DNA polymerases and RNaseH are required for aligning the substrates.

      (2) Moreover, the appearance of the metal ion is strictly correlated with product formation, similar to the third metal ion in DNA polymerases and RNaseH.

      To emphasize our point, we have revised the sentence as

      “Thus, similar to the third metal ion in DNA polymerases and RNaseH, the metal ion in I-PpoI is not required for substrate alignment but is essential for catalysis. We suspect that the single metal ion helps stabilize the transition state and reduce the electronegative buildup of DNA, thereby promoting DNA hydrolysis.”

      Minor typos: 

      p. 2, line 4 from bottom: due to the relatively low resolution... 

      Thank you for catching this. We have edited the text to “due to the relatively low resolution.”

      Figure 4F: What is represented by the pink color? 

      The structures are color-coded as 320 s at pH 6 (violet), 160 s at pH 7 (yellow), and 20 s at pH 8 (green). We have included the color information in the figure legend and made the labeling clearer in the panel.

      p. 9, first paragraph, last line: ...similar to the third... 

      Thank you for catching this. We have edited the text.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The study answers the important question of whether the conformational dynamics of proteins are slaved by the motion of solvent water or are intrinsic to the polypeptide. The results from neutron scattering experiments, involving isotopic labelling, carried out on a set of four structurally different proteins are convincing, showing that protein motions are not coupled to the solvent. A strength of this work is the study of a set of proteins using spectroscopy covering a range of resolutions. A minor weakness is the limited description of computational methods and analysis of data. The work is of broad interest to researchers in the fields of protein biophysics and biochemistry.

      We thank the editors and reviewers for the positive and encouraging comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zheng et al. study the 'glass' transition that occurs in proteins at ca. 200 K using neutron scattering and differential isotopic labeling (hydrogen/deuterium) of the protein and solvent. To overcome limitations of previous studies, this work was conducted in parallel on 4 proteins (myoglobin, cytochrome P450, lysozyme and green fluorescent protein), and experiments were performed at a range of instrument time resolutions (1 ns to 10 ps). The authors' data look compelling and suggest that transitions in protein and solvent behavior are not coupled and that, contrary to some previous reports, the apparent water transition temperature is a 'resolution effect', i.e., limited by the instrument response. This is likely to be important in the field, as solvent 'slaving' and the role of the hydration shell in protein dynamics should be reassessed in light of these findings.

      Strengths:

      The use of multiple proteins and instruments with a range of energy resolutions/timescales.

      We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The paper could be organised to better allow the comparison of the complete dataset collected. The extent of hydration clearly influences the protein transition temperature. The authors suggest that "water can be considered here as lubricant or plasticizer which facilitates the motion of the biomolecule." This may be the case, but the extent of hydration may also alter the protein structure.

      Following the reviewer’s suggestion, we studied the secondary structure content and tertiary structure of CYP protein at different hydration levels (h = 0.2 and 0.4) through molecular dynamics simulation. As shown in Table S2 and Fig. S6, the extent of hydration does not alter the protein secondary structure content and overall packing. Thus, this result also suggests that water molecules have more influence on protein dynamics than on protein structure. We added the above results in the revised SI.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "Decoupling of the Onset of Anharmonicity between a Protein and Its Surface Water around 200 K" by Zheng et al. presents a neutron scattering study trying to elucidate if at the dynamical transition temperature water and protein motions are coupled. The origin of the dynamical transition temperature is highly debated since decades and specifically its relation to hydration.

      Strengths:

      The study is rather well conducted, with a lot of efforts to acquire the perdeuterated proteins, and some results are interesting.

      We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The MD data presented appears to be missing description of the methods used.

      If these data support the authors claim that different levels of hydration do not affect the protein structure, careful analysis of the MD simulation data should be presented that show the systems are properly equilibrated under each condition. Additionally, methods are needed to describe the MD parameters and methods used, and for how long the simulations were run.

      We have now added the methods of MD simulation into the revised SI.

      “The initial structure of cytochrome P450 (CYP) for the simulations was taken from the PDB crystal structure 2ZAX. Two protein monomers were placed in a cubic box. 1013 and 2025 water molecules were inserted into the box randomly to reach mass ratios of 0.2 and 0.4 g water per g protein, respectively, which mimics the experimental conditions. Then 34 sodium counter-ions were added to neutralize the system. The CHARMM 27 force field in the GROMACS package was used for CYP, whereas the TIP4P/Ew model was chosen for water. The simulations were carried out over a broad range of temperatures from 360 K to 100 K in steps of 5 K. At each temperature, a 5000-step energy minimization was followed by a 10 ns NVT simulation and then a 30 ns NPT simulation at 1 atm with periodic boundary conditions. As shown in Fig. S7, 30 ns is sufficient to equilibrate the system. Temperature and pressure were controlled with the velocity-rescaling method and the Parrinello-Rahman method, respectively. All water bonds were constrained with the LINCS algorithm to maintain their equilibrium lengths. In all simulations, the system was propagated with the leap-frog integration algorithm using a time step of 2 fs. Electrostatic interactions were calculated with the particle mesh Ewald (PME) method. A non-bonded pair-list cutoff of 1 nm was used, and the pair list was updated every 20 fs. All MD simulations were performed with GROMACS 4.5.1.”
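      As a quick consistency check on the water counts quoted above: the hydration level h is a mass ratio (g water per g protein), so the number of water molecules is N = h · M_protein / M_water. The protein mass is not stated in the text, so the minimal sketch below assumes that only water mass enters h (counter-ions excluded) and back-calculates the protein mass implied by each setup. Both settings imply roughly the same total mass for the two monomers (about 91 kDa, i.e. about 45.6 kDa per monomer), so the two hydration levels are mutually consistent. The function names are illustrative only.

```python
# Relation between hydration level h (g water / g protein) and water count.
# Assumption: h counts water mass only (counter-ions excluded).
WATER_MOLAR_MASS = 18.015  # g/mol, i.e. Da per water molecule


def waters_for_hydration(h: float, protein_mass_da: float) -> int:
    """Number of water molecules giving mass ratio h for a given protein mass (Da)."""
    return round(h * protein_mass_da / WATER_MOLAR_MASS)


def implied_protein_mass(n_waters: int, h: float) -> float:
    """Protein mass (Da) implied by n_waters at hydration level h."""
    return n_waters * WATER_MOLAR_MASS / h


if __name__ == "__main__":
    # Water counts and h values taken from the quoted methods (two CYP monomers per box).
    for n, h in [(1013, 0.2), (2025, 0.4)]:
        total = implied_protein_mass(n, h)
        print(f"h = {h}: {n} waters -> {total / 1000:.1f} kDa total, "
              f"{total / 2000:.1f} kDa per monomer")
```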

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Response to author's changes:

      See public review: The MD data presented appears to be missing description of the methods used.

      If these data support the authors claim that different levels of hydration do not affect the protein structure, careful analysis of the MD simulation data should be presented that show the systems are properly equilibrated under each condition. Additionally, methods are needed to describe the MD parameters and methods used, and for how long the simulations were run.

      We have now added the methods of MD simulation into the revised SI. Please see Reply 5.

      Reviewer #2 (Recommendations For The Authors):

      The authors answered my questions and substantially improved the manuscript.

      We thank the reviewer for the encouraging comments.

    1. Author response:

      We thank the reviewers for their helpful comments and criticisms of our manuscript and are pleased by the overall positive nature of the comments. For the eLife Version of Record, we plan to carry out the following experiments to address reviewer comments:

      - We will use genetic approaches (e.g., driving p35 in glia to block apoptosis) and molecular markers, such as phospho-Histone H3, to assess whether reduced glial proliferation or increased glial apoptosis contributes to reduced glial cell number.

      - We will assess the ability of glial-specific expression of the Drosophila or human ifc/DEGS1 transgenes to rescue the ifc lethal phenotype to adulthood.

      - We will replicate key phenotypic findings with additional ifc alleles.

      - We will enhance our characterization of 3xP3 RFP transgenes with respect to glial subtypes both for the insert we used in our study and at least one independent insert.

      - We will edit the text of the manuscript to clarify additional points raised by the reviewers.

      Once we complete the above approaches, we will modify our manuscript accordingly and submit a full response to the reviews to eLife along with the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to consider the effects of phonotactics on the effectiveness of memory reactivation during sleep. They have created artificial words that are either typical or atypical and showed that reactivation improves memory for the latter but not the former.

      Comment 1:

      Strengths:

      This is an interesting design and a creative way of manipulating memory strength and typicality. In addition, the spectral analysis of both the wakefulness data and the sleep data is well done. The article is clearly written and provides a relevant and comprehensive overview of the literature and of how the results contribute to it.

      We thank the reviewer for his/her positive evaluation of our manuscript. 

      Comment 2:

      Weaknesses:

      (1) Unlike most research involving artificial language or language in general, the task engaged in this manuscript did not require (or test) learning of meaning or translation. Instead, the artificial words were arbitrarily categorised and memory was tested for that categorisation. This somewhat limits the interpretation of the results as they pertain to language science, and qualifies comparisons with other language-related sleep studies that the manuscript builds on.

      We thank the reviewer for this comment. We agree that we did not test for meaning or translation; instead, we used a categorization task in which we trained subjects to discriminate artificial words according to their reward associations (rewarded vs. non-rewarded). Previous language studies (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967) used artificial words to investigate implicit learning of hidden grammar rules. These studies examined generalization of the previously learned grammar knowledge by testing subjects' ability to correctly categorize a novel set of artificial words as rule-congruent or rule-incongruent. These differences from our study design might limit the comparability between the results of previous studies of artificial grammar learning and our findings. We now discuss this aspect as a limitation of our novel paradigm. 

      We added the following sentences to the discussion on p.14, ll. 481-488:

      Based on our paradigm, we investigated categorization learning of artificial words according to their reward associations (rewarded vs. unrewarded) and did not study generalization learning of artificial grammar rules (Batterink et al., 2014; Batterink and Paller, 2017; Reber, 1967). This difference might limit the comparability between these previous language-related studies and our findings. However, the use of artificial words with distinct phonotactic properties provided a successful way to manipulate learning difficulty and to investigate the influence of word properties on TMR, and our reward categorization learning paradigm had the advantage of increasing the relevance of word learning through incentives.    

      Comment 3:

      (2) The details of the behavioural task are hard to understand as described in the manuscript. Specifically, I wasn't able to understand when words were to be responded to with the left or right button. What were the instructions? Were half of the words randomly paired with left and half with right and then half of each rewarded and half unrewarded? Or was the task to know if a word was rewarded or not and right/left responses reflected the participants' guesses as to the reward (yes/no)? Please explain this fully in the methods, but also briefly in the caption to Figure 1 (e.g., panel C) and in the Results section.

      We thank the reviewer for this comment and have added sentences to the manuscript to clarify the task. We instructed the participants to respond to each word by a left- or right-hand button press, where one button indicates that the word is rewarded and the other that it is unrewarded. The assignment of left- and right-hand button presses to their meanings (rewarded versus unrewarded) was counterbalanced across subjects. At the beginning, they had to guess. Over repeated trials with feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words.        

      We added the following sentences to the results section on p.5, ll. 161-168: 

      In this two-alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by a left- or right-hand button press, where one button indicates that the word is rewarded (gain of money points) and the other button indicates that the word is unrewarded (avoiding the loss of money points). At the beginning, they had to guess. Across three presentations of each word in randomized order, with feedback at the end of each trial, they learned to respond correctly according to the rewarded/unrewarded associations of the words (Fig. 1c). 

      We added the following sentences to the caption of Figure 1 on p.6, ll. 188-194:

      In this two-alternative forced-choice task, left- and right-hand button presses were assigned to the rewarded and the unrewarded word category, counterbalanced across subjects. The participants were instructed to respond to each word by a left- or right-hand button press, where one button indicates that the word is rewarded (gain of money points) and the other button indicates that the word is unrewarded (avoiding the loss of money points). d) Feedback matrix with the four answer types (hits: rewarded and correct; CR, correct rejections: unrewarded and correct; misses: rewarded and incorrect; FA, false alarms: unrewarded and incorrect) according to the response and the reward assignment of the word.

      We added the following sentences to the methods on p.19, ll. 687-692:  

      In this two-alternative forced-choice task, we assigned left- and right-hand button presses to the rewarded and the unrewarded word category, counterbalanced across subjects. We instructed the participants to respond to each word by a left- or right-hand button press, where one button indicates that the word is rewarded (gain of money points) and the other button indicates that the word is unrewarded (avoiding the loss of money points).

      Comment 4:  

      (3) Relatedly, it is unclear how reward or lack thereof would translate cleanly into a categorisation of hits/misses/correct rejections/false alarms, as explained in the text and shown in Figure 1D. If the item was of the non-rewarded class and the participant got it correct, they avoided loss. Why would that be considered a correct rejection, as the text suggests? It is no less of a hit than the rewarded-correct, it's just the trial was set up in a way that limits gains. This seems to mix together signal detection nomenclature (in which reward is uniform and there are two options, one of which is correct and one isn't) and loss-aversion types of studies (in which reward is different for two types of stimuli, but for each type you can have H/M/CR/FA separably). Again, it might all stem from me not understanding the task, but at the very least this required extended explanations. Once the authors address this, they should also update Fig 1D. This complexity makes the results relatively hard to interpret and the merit of the manuscript hard to assess. Unless there are strong hypotheses about reward's impact on memory (which, as far as I can see, are not at the core of the paper), there should be no difference in the manner in which the currently labelled "hits" and "CR" are deemed - both are correct memories. Treating them differently may have implications on the d', which is the main memory measure in the paper, and possibly on measures of decision bias that are used as well.

      We thank the reviewer for this comment, which gives us the opportunity to clarify. As explained in our response to the previous comment, in our two-alternative forced-choice task we instructed the participants to press one button when they thought the presented word was rewarded and the other button when they thought the word was unrewarded. Based on this instruction, we applied signal detection theory (SDT), because the subjects' task was to detect when reward was present or to reject when reward was absent. Therefore, we considered correct responses to words of the rewarded category as hits and correct responses to words of the unrewarded category as correct rejections (see table below). However, the reviewer is correct that our design differs from a pure SDT setting: besides being classified as false alarms or misses, incorrect responses were penalized by subtracting money points, to control for task strategies other than learning the reward associations of the words. We agree that further explanation of our nomenclature is necessary.  

      Author response table 1.

      We adjusted the results section on p.5, ll. 169-177:

      To obtain a measure of discrimination memory that accounts for potential response bias, we applied signal detection theory (Green and Swets, 1966). Because we instructed the participants to respond to each word by a left- or right-hand button press, with one button indicating that reward is present and the other that reward is absent, we considered correct responses to words of the rewarded category as hits and correct responses to words of the unrewarded category as correct rejections. Accordingly, based on the reward associations of the words, we assigned the responses to the following four response types: hits (rewarded, correct); correct rejections (unrewarded, correct); misses (rewarded, incorrect); and false alarms (unrewarded, incorrect). Depending on their responses, subjects received money points (Fig. 1d). 
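      To make this coding concrete, here is a minimal Python sketch that maps each trial (reward category and response) onto the four response types and computes discrimination (d') and the decision criterion (c) from the counts, using the standard SDT formulas d' = z(H) - z(FA) and c = -[z(H) + z(FA)]/2. The text does not say how hit or false-alarm rates of 0 or 1 were handled; the log-linear correction used here (adding 0.5 to each count and 1 to each denominator) is an assumption, and the function names and trial data are hypothetical.

```python
from statistics import NormalDist


def response_type(rewarded: bool, responded_rewarded: bool) -> str:
    """Map one trial onto the four SDT response types (cf. Fig. 1d)."""
    if rewarded:
        return "hit" if responded_rewarded else "miss"
    return "false_alarm" if responded_rewarded else "correct_rejection"


def sdt_measures(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Return (d', c). Assumption: log-linear correction to avoid rates of 0 or 1."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical trials: (word is rewarded?, participant pressed the "rewarded" button?)
    trials = [(True, True), (True, False), (False, False), (False, False), (False, True)]
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    for rewarded, answer in trials:
        counts[response_type(rewarded, answer)] += 1
    d, c = sdt_measures(counts["hit"], counts["miss"],
                        counts["false_alarm"], counts["correct_rejection"])
    print(counts, f"d' = {d:.2f}, c = {c:.2f}")
```

Under this coding, a positive c indicates a tendency to answer "unrewarded", and d' is unaffected by which correct category is labeled the hit, since both correct response types enter the rates symmetrically.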

      Comment 5:

      (4) The study starts off with a sample size of N=39 but excludes 17 participants for some crucial analyses. This is a high number, and it's not entirely clear from the text whether exclusion criteria were pre-registered or decided upon before looking at the data. Having said that, some criteria seem very reasonable (e.g., excluding participants who were not fully exposed to words during sleep). It would still be helpful to see that the trend remains when including all participants who had sufficient exposure during sleep. Also, please carefully mention for each analysis what the N was.

      Our study was not pre-registered. Including all subjects regardless of low pre-sleep memory performance, but requiring a sufficient number of reactivations (> 160 reactivations, i.e., every word at least 2 times), resulted in a new dataset of 15 and 13 participants in the high- and low-PP cueing conditions, respectively. In this dataset, the overnight change in memory performance in the high-PP cueing condition was no longer significant (Δ memory (d'): t(14) = 1.67, p = 0.12), whereas the increase in decision bias towards risk avoidance remained significant (Δ bias (c-criterion): t(14) = 3.36, p = 0.005).

      We modified and added the following sentences to the discussion on p.13, ll. 456-458:

      Our study has limitations due to a small sample size and between-subject comparisons. The criteria of data analyses were not pre-registered and the p-values of our behavior analyses were not corrected for multiple comparisons.

      Comment 6:             

      (5) Relatedly, the final N is low for a between-subjects study (N=11 per group). This is adequately mentioned as a limitation, but since it does qualify the results, it seemed important to mention it in the public review.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail, adding alternative explanations and suggestions for future research to overcome them.        

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-488: 

      To control for potential confounders other than the influence of word-learning difficulty on TMR, we compared sleep parameters, pre-sleep memory performance and vigilance shortly before the post-sleep memory test, which revealed no significant group differences (see Table S1 and S2). Nevertheless, we cannot rule out that other individual trait factors, such as individual susceptibility to TMR, differed between the groups. To rule out these alternative explanations based on individual factors, we suggest that future research replicate our study using a within-subject design, cueing subsets of previously learned low- and high-PP words so that all conditions are present within the same individuals, as in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 7:

      (6) The linguistic statistics used for establishing the artificial words are all based on American English, and are therefore in misalignment with the spoken language of the participants (which was German). The authors should address this limitation and discuss possible differences between the languages. Also, if the authors checked whether participants were fluent in English they should report these results and possibly consider them in their analyses. In all fairness, the behavioural effects presented in Figure 2A are convincing, providing a valuable manipulation test.

      We thank the reviewer for pointing out the misalignment between the German-speaking participants and the artificial words based on American English. We did not assess the participants' English language proficiency and therefore could not control for it as a potential confounder; however, control analyses revealed no significant differences in pre-sleep memory performance between the two cueing groups (see Table S1). 

      We now discuss these points as limitations on p.14, ll. 473-481: 

      Further, we used artificial words based on American English with German-speaking participants, and differences between the languages in pronunciation and phoneme structure might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages belong to the same language family (Eberhard et al., 2019), and the phonological distance between English and German is quite short compared, for example, to Korean (Luef and Resnik, 2023). Thus, major phonological characteristics are shared across both languages. In addition, our behavioral analyses revealed robust word discrimination learning and distinct memory performance according to different levels of phonotactic probability, providing evidence of successful experimental manipulation. 

      Comment 8:

      (7) With regard to the higher probability of nested spindles for the high- vs low-PP cueing conditions, the authors should try and explore whether what the results show is a general increase for spindles altogether (as has been reported in the past to be correlated with TMR benefit and sleep more generally) or a specific increase in nested spindles (with no significant change in the absolute numbers of post-cue spindles). In both cases, the results would be interesting, but differentiating the two is necessary in order to make the claim that nesting is what increased rather than spindle density altogether, regardless of the SW phase.

      To address this question, we conducted additional analyses based on detected sleep spindles.

      We added the following section to the supplementary data on pp. 31-32, ll. 1007-1045:  

      After detecting sleep spindles (frequency range of 12-16 Hz; see Methods for details), we compared sleep spindle density between the high- and low-PP TMR conditions and found no significant difference (see Fig. S8a and Table S9). Next, we subdivided the detected sleep spindles into spindles coupled and uncoupled to the previously detected slow waves (SWs; see analyses in Fig. 4). Sleep spindles were defined as coupled when their amplitude peak occurred during the SW up-state (0.3 to 0.8 s after the SW trough). A two-way mixed-design ANOVA on sleep spindle amplitude with cueing group as a between-subject factor (high-PP-cued vs. low-PP-cued) and SW coupling as a within-subject factor (coupled vs. uncoupled) showed a significant interaction (cueing group × SW coupling: F(1,20) = 4.51, p = 0.046, η² = 0.18), a significant main effect of SW coupling (F(1,20) = 85.02, p < 0.001, η² = 0.81), and a trend toward significance for the main effect of cueing group (F(1,20) = 3.54, p = 0.08). Post-hoc unpaired t-tests revealed significantly higher amplitudes of coupled sleep spindles in the high-PP than in the low-PP cueing group (t(20) = 2.13, p = 0.046, Cohen’s d = 0.91; Fig. S8b) and no significant group difference for uncoupled sleep spindles (t(20) = 1.62, p = 0.12). An additional comparison of the number of coupled sleep spindles between the cueing groups revealed no significant difference (see Table S9).

      Thus, detected sleep spindles coupled to the SW up-state occurred with higher amplitude after TMR presentation of high-PP words than after presentation of low-PP words, whereas sleep spindle density and the number of sleep spindles coupled to the SW up-state did not differ between the cueing conditions.
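      The post-hoc comparison above pairs an unpaired t-test with Cohen's d. The following is a minimal, stdlib-only sketch, assuming Student's t-test with pooled variance and Cohen's d based on the pooled standard deviation; it is not the analysis code used in the study, and the function name `unpaired_t_and_d` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t_and_d(group_a, group_b):
    """Student's unpaired t-test (pooled variance) plus Cohen's d.

    Returns (t, df, d). Hypothetical helper for illustration only.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_a) - mean(group_b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    pooled_sd = sqrt(pooled_var)
    t = diff / (pooled_sd * sqrt(1 / n_a + 1 / n_b))
    d = diff / pooled_sd
    return t, n_a + n_b - 2, d


# Illustrative (made-up) amplitude values for two groups
t, df, d = unpaired_t_and_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(round(t, 3), df, round(d, 3))
```

      With equal group sizes n, these definitions imply d = t·sqrt(2/n); for example, t(20) = 2.13 with 11 participants per group gives d ≈ 0.91, consistent with the value reported above.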

      We added the following sentences to the methods on pp. 22-23, ll. 822-839:  

      Sleep spindle analyses 

      We detected fast sleep spindles by band-pass filtering (12-16 Hz) the Pz electrode signal during the auditory cueing trials in time windows from -2 to 8 s relative to stimulus onset. The detection threshold was calculated individually for each subject as 1.25 standard deviations (SDs) above the mean amplitude. The start and end of each sleep spindle were defined as the points before and after the detection at which the amplitude fell below 0.75 SDs. Only sleep spindles with a duration of 0.5-3 s were included in subsequent analyses.
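      The threshold rule described above can be sketched as follows. This is a minimal illustration, not the study's code; it assumes that the amplitude envelope of the 12-16 Hz band-passed Pz signal has already been computed (for example, with a Hilbert transform), that both thresholds are defined as mean + k·SD of that envelope, and that `detect_spindles` is a hypothetical name.

```python
from statistics import mean, pstdev


def detect_spindles(envelope, fs, detect_sd=1.25, bound_sd=0.75,
                    min_dur=0.5, max_dur=3.0):
    """Return (start, end) sample indices of detected spindles.

    envelope: amplitude envelope of the band-passed signal (assumed precomputed).
    fs: sampling rate in Hz.
    """
    mu, sd = mean(envelope), pstdev(envelope)
    detect_thr = mu + detect_sd * sd   # amplitude must exceed this to trigger a detection
    bound_thr = mu + bound_sd * sd     # spindle boundaries: where amplitude drops below this
    n = len(envelope)
    spindles = []
    i = 0
    while i < n:
        if envelope[i] > detect_thr:
            start = i
            while start > 0 and envelope[start - 1] >= bound_thr:
                start -= 1             # extend backwards until below the boundary threshold
            end = i
            while end < n - 1 and envelope[end + 1] >= bound_thr:
                end += 1               # extend forwards until below the boundary threshold
            duration = (end - start + 1) / fs
            if min_dur <= duration <= max_dur:
                spindles.append((start, end))
            i = end + 1                # continue searching after this event
        else:
            i += 1
    return spindles
```

      Using a lower boundary threshold than the detection threshold lets each event extend beyond the samples that crossed the stricter criterion, which matches the two-threshold description in the text.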

      To compare sleep spindle densities between the high- and low-PP cueing conditions, we computed, for each condition, the grand-average distribution of sleep spindle density (number per trial) in 0.5-s bins from -0.5 to 6 s relative to stimulus onset (see Fig. S8a and Table S9).
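      A minimal sketch of the binning step, assuming that each detected spindle is represented by its onset time in seconds relative to stimulus onset and that density is the mean number of spindles per trial in each 0.5-s bin; the function name `spindle_density` is hypothetical.

```python
def spindle_density(spindle_times_per_trial, t_min=-0.5, t_max=6.0, bin_size=0.5):
    """Mean number of spindles per trial in each time bin [t_min, t_max)."""
    n_bins = round((t_max - t_min) / bin_size)
    counts = [0] * n_bins
    for times in spindle_times_per_trial:
        for t in times:
            if t_min <= t < t_max:
                idx = int((t - t_min) // bin_size)
                counts[min(idx, n_bins - 1)] += 1   # guard against float rounding at t_max
    n_trials = len(spindle_times_per_trial)
    return [c / n_trials for c in counts]


# Three hypothetical trials; times outside -0.5 to 6 s are ignored
density = spindle_density([[0.1, 1.2, 7.0], [0.3, -1.0], []])
```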

      Based on the detected slow waves and sleep spindles, we defined a coupling event as a detected sleep spindle whose positive amplitude peak occurred during the slow-wave up-state, i.e., within 0.3 to 0.8 s after the slow-wave trough.
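      The coupling criterion reduces to a time-window test relative to each slow-wave trough. In this hypothetical sketch, spindle peak times and SW trough times are given in seconds on the same time axis, and `classify_coupling` is an illustrative name.

```python
def classify_coupling(spindle_peaks, sw_troughs, window=(0.3, 0.8)):
    """Split spindle peak times into (coupled, uncoupled) lists.

    A spindle counts as coupled if its peak falls 0.3 to 0.8 s after any SW trough.
    """
    lo, hi = window
    coupled, uncoupled = [], []
    for peak in spindle_peaks:
        if any(lo <= peak - trough <= hi for trough in sw_troughs):
            coupled.append(peak)
        else:
            uncoupled.append(peak)
    return coupled, uncoupled


result = classify_coupling([1.5, 2.0, 5.4, 4.9], [1.0, 5.0])
```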

      We computed the mean amplitude of each detected sleep spindle as the mean of the absolute amplitude values of all negative and positive peaks within that spindle (see Fig. S8b).
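      One way to implement this amplitude measure is to locate the local maxima and minima of the band-passed spindle segment and average their absolute values. The sketch below assumes a plain list of filtered samples; the peak-finding rule (a sample larger or smaller than its neighbours) and the name `mean_peak_amplitude` are assumptions, not necessarily what the study used.

```python
def mean_peak_amplitude(segment):
    """Mean absolute value of all local maxima and minima in a filtered segment."""
    peaks = []
    for i in range(1, len(segment) - 1):
        prev, cur, nxt = segment[i - 1], segment[i], segment[i + 1]
        is_max = cur > prev and cur >= nxt   # positive peak (first sample of a plateau)
        is_min = cur < prev and cur <= nxt   # negative peak (first sample of a plateau)
        if is_max or is_min:
            peaks.append(abs(cur))
    return sum(peaks) / len(peaks) if peaks else 0.0


amp = mean_peak_amplitude([0, 2, 0, -4, 0, 2, 0])
```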

      We added the following sentences to the results on p.10, ll. 338-343:  

      An additional analysis based on the detection of fast sleep spindles (12-16 Hz; see Methods) confirmed that fast sleep spindles during SW up-states (0.3 to 0.8 s after the SW trough) had significantly higher amplitudes after cueing with high- compared with low-PP words, whereas sleep spindle density and the number of sleep spindles coupled to the SW up-state did not differ between the cueing conditions (see Fig. S8 and Table S9).

      Reviewer #2 (Public Review):

      Summary:

      The work by Klaassen & Rasch investigates the influence of word learning difficulty on sleep-associated consolidation and reactivation. They elicited reactivation during sleep by applying targeted memory reactivation (TMR) and manipulated word learning difficulty by creating words more similar (easy) or more dissimilar (difficult) to our language. In one group of participants, they applied TMR of easy words and in another group of participants, they applied TMR of difficult words (between-subjects design). They showed that TMR leads to higher memory benefits in the easy compared to the difficult word group. On a neural level, they showed an increase in spindle power (in the up-state of an evoked response) when easy words were presented during sleep.

      Comment 9:

      Strengths:

      The authors investigate a research question relevant to the field, that is, which experiences are actually consolidated during sleep. To address this question, they developed an innovative task and manipulated difficulty in an elegant way.

      Overall, the paper is clearly structured, and results and methods are described in an understandable way. The analysis approach is solid.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 10:

      (1) Sample size

      For a between-subjects design, the sample size is too small (N = 22). The main finding (also found in the title "Difficulty in artificial word learning impacts targeted memory reactivation") is based on an independent samples t-test with 11 participants/group.

      The authors explicitly mention the small sample size and the between-subjects design as a limitation in their discussion. Nevertheless, making meaningful inferences based on studies with such a small sample size is difficult, if not impossible.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. Accordingly, we now discuss these limitations in more detail, adding alternative explanations and suggestions for how future research could overcome them.

      We added the following sentences to the discussion about the limitations on p.14, ll. 465-473: 

      To control for potential confounders other than the influence of word-learning difficulty on TMR, we compared sleep parameters, pre-sleep memory performance, and vigilance shortly before the post-sleep memory test, and found no significant group differences (see Tables S1 and S2). Nevertheless, we cannot rule out that other individual trait factors, such as individual susceptibility to TMR, differed between the groups. To rule out such alternative explanations based on individual factors, we suggest that future research replicate our study with a within-subject design in which subsets of previously learned low- and high-PP words are cued, so that all conditions are tested within the same individuals, as in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 11:

      (2) Choice of task

      Though the task itself is innovative, there would have been tasks better suited to address the research question. The main disadvantage the task and the operationalisation of memory performance (d') have is that single-trial performance cannot be calculated. Consequently, choosing individual items for TMR is not possible.

      Additionally, TMR of low vs. high difficulty is conducted between subjects (and independently of pre-sleep memory performance) which is a consequence of the task design.

      The motivation for why this task has been used is missing in the paper.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase participants' motivation, as they could receive additional monetary compensation according to their learning and memory task performance. Furthermore, we designed the task so that it could potentially be translated to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To make use of the beneficial effect of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words, in order to gain money points and to avoid losing them.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors investigated the effects of targeted memory reactivation (TMR) during sleep on memory retention for artificial words with varying levels of phonotactical similarity to real words. The authors report that the high phonotactic probability (PP) words showed a more pronounced EEG alpha decrease during encoding and were more easily learned than the low PP words. Following TMR during sleep, participants who had been cued with the high PP TMR, remembered those words better than 0, whilst no such difference was found in the other conditions. Accordingly, the authors report higher EEG spindle band power during slow-wave up-states for the high PP as compared to low PP TMR trials. Overall, the authors conclude that artificial words that are easier to learn, benefit more from TMR than those which are difficult to learn.

      Comment 12 & 13:

      Strengths:

      (1) The authors have carefully designed the artificial stimuli to investigate the effectiveness of TMR on words that are easy to learn and difficult to learn due to their levels of similarity with prior word-sound knowledge. Their approach of varying the level of phonotactic probability enables them to have better control over phonotactical familiarity than in a natural language and thus to disentangle which properties of word learning contribute to TMR success.

      (2) The use of EEG during wakeful encoding and sleep TMR sheds new light on the neural correlates of high PP vs. low PP both during wakeful encoding and cue-induced retrieval during sleep.

      We thank the reviewer for his/her positive evaluation of our manuscript.

      Weaknesses:

      Comment 14:

      (1) The present analyses are based on a small sample and comparisons between participants. Considering that the TMR benefits are based on changes in memory categorization between participants, it could be argued that the individuals in the high PP group were more susceptible to TMR than those in the low PP group for reasons other than the phonotactic probabilities of the stimuli (e.g., these individuals might be more attentive to sounds in the environment during sleep). While the authors acknowledge the small sample size and between-subjects comparison as a limitation, a discussion of an alternative interpretation of the data is missing.

      We agree with the reviewer that the small sample size and the between-subject comparisons represent major limitations of our study. We thank the reviewer for this helpful comment and now discuss these limitations in more detail, adding alternative explanations and suggestions for how future research could overcome them.

      We added the following sentences to the discussion on p.14, ll. 465-473: 

      To control for potential confounders other than the influence of word-learning difficulty on TMR, we compared sleep parameters, pre-sleep memory performance, and vigilance shortly before the post-sleep memory test, and found no significant group differences (see Tables S1 and S2). Nevertheless, we cannot rule out that other individual trait factors, such as individual susceptibility to TMR, differed between the groups. To rule out such alternative explanations based on individual factors, we suggest that future research replicate our study with a within-subject design in which subsets of previously learned low- and high-PP words are cued, so that all conditions are tested within the same individuals, as in other TMR studies (Cairney et al., 2018; Schreiner and Rasch, 2015).

      Comment 15:

      (2) While the one-tailed comparison between the high PP condition and 0 is significant, the ANOVA comparing the four conditions (between subjects: cued/non-cued, within-subjects: high/low PP) does not show a significant effect. With a non-significant interaction, I would consider it statistically inappropriate to conduct post-hoc tests comparing the conditions against each other. Furthermore, it is unclear whether the p-values reported for the t-tests have been corrected for multiple comparisons. Thus, these findings should be interpreted with caution.

      We thank the reviewer for this comment, which gave us the opportunity to correct our analyses and describe them more clearly. We first investigated overnight changes in behavioral performance within the four conditions by conducting t-tests of the Δ-values of d' and c-criterion against 0. Although the significance level for all our statistical analyses was set at p < 0.05 (two-tailed), we did not correct the p-values of our behavioral analyses for multiple comparisons. To subsequently investigate differences between conditions, we conducted additional ANOVAs. We agree with the reviewer that post-hoc analyses should not be conducted without a significant ANOVA result. Taking into account the recommendation of Reviewer 1 as well, we now include post-hoc pairwise comparisons only when the interaction effect of the ANOVA revealed at least a trend toward significance (p < 0.1).

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued conditions (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference from the other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Further, we mentioned the lack of correction for multiple comparisons as a limitation of our results in the discussion on p.13, ll. 456-458:  

      The criteria of data analyses were not pre-registered, and the p-values of our behavioral analyses were not corrected for multiple comparisons.

      We added the following sentences to the methods p.23, ll. 842-849:

      To analyze overnight changes in behavioral data within the TMR conditions, we first conducted dependent-sample t-tests of the Δ-values (post-sleep test minus pre-sleep test) of d' and c-criterion against 0 (see Fig. 3). Two-way mixed-design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend toward significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons using independent- and dependent-sample t-tests. For all behavioral statistical analyses, the significance level was set at p < 0.05 (two-tailed). P-values between 0.05 and 0.1 were reported as trends toward significance.
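      The decision rules above can be summarized in a short sketch. It assumes Δ = post minus pre per participant, computes the t statistic of a one-sample t-test against 0 (equivalent to a dependent-sample test of post vs. pre), and encodes the p < 0.1 gate for post-hoc tests and the trend label. The p-values themselves would come from a statistics package and are simply passed in here; all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def deltas(pre, post):
    """Overnight change per participant: post-sleep minus pre-sleep."""
    return [b - a for a, b in zip(pre, post)]


def t_against_zero(values):
    """One-sample t statistic against 0 and its degrees of freedom."""
    n = len(values)
    return mean(values) / (stdev(values) / sqrt(n)), n - 1


def posthoc_allowed(p_interaction, gate=0.1):
    """Run post-hoc pairwise tests only if the interaction shows at least a trend."""
    return p_interaction < gate


def label(p):
    """'significant' below 0.05, 'trend' from 0.05 up to 0.1, otherwise 'n.s.'."""
    if p < 0.05:
        return "significant"
    if p < 0.1:
        return "trend"
    return "n.s."


t, df = t_against_zero(deltas([1, 1, 1, 1], [2, 3, 2, 3]))
```

      With the values reported in this response, the gate permits post-hoc tests for the d' ANOVA (interaction p = 0.08) but not for the c-criterion ANOVA (interaction p = 0.12).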

      Comment 16:

      (3) With the assumption that the artificial words in the study have different levels of phonotactic similarity to prior word-sound knowledge, it was surprising to find that the phonotactic probabilities were calculated based on an American English lexicon whilst the participants were German speakers. While it may be the case that the between-language lexicons overlap, it would be reassuring to see some evidence of this, as the level of phonotactic probability is a key manipulation in the study.

      We thank the reviewer for pointing out the mismatch between the German-speaking participants and the artificial words, which were based on American English. In line with this recommendation, we expanded our argument in the manuscript for the assumption of our study that the major phonetic characteristics are shared across both languages.

      We now discuss these aspects on p.14, ll. 473-481:

      Further, we used artificial words based on American English with German-speaking participants, although differences between the languages in pronunciation and phoneme structure might affect word perception and memory processing (Bohn and Best, 2012). On the other hand, both languages belong to the same language family (Eberhard et al., 2019), and the phonological distance between English and German is small compared, for example, with their distance to Korean (Luef and Resnik, 2023). Thus, we assume that the major phonological characteristics are shared across both languages. In addition, our behavioral analyses revealed robust word-discrimination learning and distinct memory performance for the different levels of phonotactic probability, providing evidence of a successful experimental manipulation.

      Comment 17:

      (4) Another manipulation in the study is that participants learn whether the words are linked to a monetary reward or not, however, the rationale for this manipulation is unclear. For instance, it is unclear whether the authors expect the reward to interact with the TMR effects.

      We used a reward task combined with TMR because previous studies revealed beneficial effects of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021). In addition, we wanted to increase participants' motivation, as they could receive additional monetary compensation according to their learning and memory task performance. Furthermore, we designed the task so that it could potentially be translated to operant conditioning in rats (see research proposal: https://data.snf.ch/grants/grant/168602). However, the task turned out to be too difficult to translate to rats, so we developed a different learning paradigm for the animal study (Klaassen et al., 2021) of this cross-species research project.

      We added the following sentence to the introduction on p.4, ll. 134-137:

      To make use of the beneficial effect of reward-related information on sleep-dependent memory consolidation and reactivation (Asfestani et al., 2020; Fischer and Born, 2009; Lansink et al., 2009; Sterpenich et al., 2021), we trained healthy young participants to categorize these words into rewarded and unrewarded words, in order to gain money points and to avoid losing them.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comment 18:

      (1) Please clearly define all linguistics terms - and most importantly the term "phonotactics" - at first use.

      We thank the reviewer for this recommendation. We have added a definition of phonotactics and reduced the variety of linguistic terms to improve readability.

      We added the following sentences to the beginning of the introduction on p.3, ll. 72-76:

      One critical characteristic of similarity to pre-existing knowledge in auditory word processing is the speech-sound (phoneme) pattern. In phonology, the study of language-specific phoneme structures, phonotactics describes the constraints on how phonemes can be combined into words in a specific language.

      Comment 19:

      (2) Some critical details about the methods should be included in the Results section to make it comprehensible. For example, the way the crucial differences between G1-4 words should be addressed in the Results, not only in Figure 1.

      Following this recommendation, we added this information to the Results section on p.4, ll. 145-154:

      To study the impact of difficulty in word learning on TMR, we developed a novel learning paradigm. We formed four sets of artificial words (40 words per set; see Tables S3 and S4), each consisting of a different sequence of two vowels and two consonants. We subdivided the alphabet into two groups of consonants (C1: b, c, d, f, g, h, j, k, l, m; C2: n, p, q, r, s, t, v, w, x, z) and two groups of vowels (V1: a, e, i; V2: o, u, y). Four-letter words were created by selecting letters from the vowel and consonant groups according to four different sequences (G1: C1, V1, V2, C2; G2: C1, V1, C2, V2; G3: V1, C1, C2, V2; G4: V1, C1, V2, C2; Fig. 1a; see Methods for further details). Comparisons between the sets revealed significant differences in phonotactic probability (PP; Fig. 1b; unpaired t-tests: G1 / G2 > G3 / G4, p < 0.005, Cohen’s d values > 0.71).
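      The construction of the four word sets can be illustrated by enumerating every letter combination that fits each sequence (10 × 3 × 3 × 10 = 900 candidates per sequence). This sketch assumes that the 40 words per set were drawn at random from these candidates; the actual selection procedure, including any screening of candidates and the computation of phonotactic probability, is described in the Methods and is not reproduced here. The seed and the function names are hypothetical.

```python
import itertools
import random

C1 = list("bcdfghjklm")
C2 = list("npqrstvwxz")
V1 = list("aei")
V2 = list("ouy")

# Letter-group sequence for each word set, as listed in the text
PATTERNS = {
    "G1": (C1, V1, V2, C2),
    "G2": (C1, V1, C2, V2),
    "G3": (V1, C1, C2, V2),
    "G4": (V1, C1, V2, C2),
}


def candidate_words(pattern):
    """All four-letter strings that follow the given letter-group sequence."""
    return ["".join(letters) for letters in itertools.product(*pattern)]


def sample_set(name, k=40, seed=0):
    """Hypothetical random draw of k distinct words for one set."""
    rng = random.Random(seed)
    return rng.sample(candidate_words(PATTERNS[name]), k)


words_g3 = sample_set("G3")
```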

      Comment 20:

      (3) Was scoring done both online and then verified offline? If so, please note that.

      We have now included this information.

      We adjusted the method section on p.21, ll. 765-769:   

      The sleep stages NREM 1 to 3 (N1 to N3), wake, and REM sleep were scored offline and manually according to the criteria of the American Academy of Sleep Medicine (AASM) by visual inspection of the signals of the frontal, central, and occipital electrodes in 30-s epochs (Iber et al., 2007). Based on offline scoring, we confirmed that TMR was applied during N2 and N3 and that sleep parameters did not differ significantly (p-values > 0.05) between the cueing groups (see Table S2).

      Comment 21:

      (4) In Figure 2, please arrange the panel letters in an easier-to-read way (e.g., label upper right panel b with a different letter).

      We have now rearranged the panel letters as recommended.

      We adjusted Figure 2 on p.8, ll. 242-258:     

      Comment 22:

      (5) In the first paragraph on TMR effects, please note which memory measure you are comparing (i.e., d').

      We added this information according to the recommendation.  

      We adjusted the sentence of the results on p.8, ll. 260-263:

      To examine whether TMR during sleep impacts memory consolidation of discrimination learning with respect to learning difficulty, we calculated the overnight changes by subtracting the pre- from the post-sleep memory performance based on d'-values of the reactivated sequences (cued) and non-reactivated sequences (uncued).
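      For readers less familiar with signal detection measures, the sketch below shows how d' and the c-criterion, and their overnight changes, can be computed from response counts. It assumes that rewarded words are the signal trials (a "rewarded" response to a rewarded word is a hit, to an unrewarded word a false alarm) and applies a log-linear correction to avoid infinite z-scores; both choices, the counts, and the function names are assumptions for illustration, not necessarily those of the study.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_measures(hits, n_signal, false_alarms, n_noise):
    """Return (d_prime, c) using a log-linear correction (assumption)."""
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    d_prime = _z(hit_rate) - _z(fa_rate)
    c = -0.5 * (_z(hit_rate) + _z(fa_rate))
    return d_prime, c


def overnight_delta(pre_counts, post_counts):
    """Delta = post-sleep minus pre-sleep, for both d' and c."""
    d_pre, c_pre = sdt_measures(*pre_counts)
    d_post, c_post = sdt_measures(*post_counts)
    return d_post - d_pre, c_post - c_pre


# Hypothetical counts: (hits, rewarded words, false alarms, unrewarded words)
delta_d, delta_c = overnight_delta((12, 20, 8, 20), (15, 20, 5, 20))
```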

      Comment 23:

      (6) Please show the pre-sleep and post-sleep test scores for both word categories (not only the delta). It may be best to show this as another data point in Fig 2a, but it may be helpful to also see this split between cued and uncued.

      We added the pre-sleep and post-sleep test scores with the individual data points as an additional figure. 

      We added the following figure to the supplementary data on p.28, ll. 936-940:  

      Comment 24:

      (7) In the sentence "An additional two-way mixed design ANOVA on the same values with cueing as a between-subject factor (cued vs. uncued) ...", a more exact phrasing for the last parentheses would probably be "(high-PP-Cued vs Low-PP-Cued)". Both groups were cued.

      We thank the reviewer for pointing this out. Following this recommendation, we corrected the descriptions of the two-way mixed-design ANOVAs. In addition, we detected that conditions had been incorrectly assigned in the ANOVAs and corrected the reported values.

      We adjusted the sentences and corrected the values on p.9, ll. 271-275 and ll. 289-291: 

      An additional two-way mixed-design ANOVA on the same values, with cueing (cued vs. uncued) as a within-subject factor and group as a between-subject factor, revealed trends toward significance (p < 0.1) for the interaction (cueing × group: F(1,20) = 3.47, p = 0.08) and for the main effect of group (F(1,20) = 3.28, p = 0.09). The main effect of cueing was not significant (F(1,20) = 0.58, p = 0.46).

      An ANOVA on c-criterion changes showed no significant effects (interaction cueing × group: F(1,20) = 2.66, p = 0.12; main effect of cueing: F(1,20) = 2.08, p = 0.17; main effect of group: F(1,20) = 0.38, p = 0.55).

      Comment 25:

      (8) In the same ANOVA, please mention that there is a trend toward an interaction effect. If there wasn't one, the post-hoc comparison would be unwarranted. Please consider noting other p<0.1 p-values as a trend as well, for consistency.

      Following this recommendation, we now include post-hoc pairwise comparisons only after confirming at least a trend toward an interaction effect in the corresponding ANOVA, and we consistently report p-values between 0.05 and 0.1 as trends toward significance.

      We added the following sentences to the methods p.23, ll. 844-849:

      Two-way mixed-design ANOVAs were computed to compare Δ-values between TMR conditions. After confirming at least a trend toward significance (p < 0.1) for the interaction effect, we conducted post-hoc pairwise comparisons using independent- and dependent-sample t-tests. For all behavioral statistical analyses, the significance level was set at p < 0.05 (two-tailed). P-values between 0.05 and 0.1 were reported as trends toward significance.

      We removed the following post-hoc analyses from the results section on p.9, ll. 291-295: 

      Additional post-hoc pairwise comparisons revealed a significant difference between the high-PP cued and low-PP uncued conditions (high-PP cued vs. low-PP uncued: t(10) = 2.43, p = 0.04), and no difference from the other conditions (high-PP cued vs.: high-PP uncued t(20) = 1.28, p = 0.22; low-PP cued t(20) = 1.57, p = 0.13).

      Comment 26:      

      (9) Please consider adding an analysis correlating spindle power with memory benefit across participants. Even if it is non-significant, it is important to report given that some studies have found such a relationship.

      Following this recommendation, we conducted an additional correlation analysis.

      We added the following sentences to the manuscript into the results (pp. 10-11, ll. 346-349), the discussion (p.12, ll. 413-417), and the methods (p.23, ll. 864-867):   

      Although we found a significant group difference in spindle power nested in SW up-states, whole-sample (n = 22) correlation analyses between the individual spindle power values of the significant cluster and the overnight changes in behavioral measures revealed no significant correlations (Δ d': r = 0.16, p = 0.48; Δ c-criterion: r = 0.19, p = 0.40).

      Despite the significant group difference, we did not find significant correlations between SW-nested spindle power values and overnight changes in behavioral measures, although previous studies reported associations of SW and spindle activity during sleep with the integration of new memories into pre-existing knowledge networks (Tamminen et al., 2013, 2010).

      Using the same extracted power values (0.3 to 0.8 s; 11-14 Hz; Pz, P3, P4, O2, P7) per subject, we performed whole-sample (n = 22) Pearson correlation analyses between these power values and the overnight changes in behavioral measures in the cued condition (Δ d' and Δ c-criterion).
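      The correlation step is a standard Pearson correlation across participants. A stdlib-only sketch follows, together with the usual conversion of r to a t statistic with n - 2 degrees of freedom; the p-value would then be obtained from the t distribution, for example with a statistics package. The function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for testing r against 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))
```

      For example, r = 0.16 with n = 22 corresponds to t(20) ≈ 0.72.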

      Reviewer #2 (Recommendations For The Authors):

      (1) Choice of task

      Comment 27:      

      In general, I find your task well-designed and novel. In light of your research question, however, I wonder why you chose this task. When you outlined the research question in the introduction, I expected a task similar to Schreiner et al. (2015). For example, participants have to associate high PP words with each other and low PP words. The advantage here would be that you could test the benefits of TMR in a within-subjects design (for example, cueing half of the remembered high and half of the remembered low PP words).

      Please see our previous response to Comment 14.

      Comment 28:

      Why did you decide to introduce a reward manipulation?

      Please see our previous response to Comment 11.

      Comment 29:

      Why did you do the cueing on a category level (cueing all high PP or all low PP words instead of single word cueing or instead of cueing 20 reward high-PP, 20 unrewarded high-PP plus 20 reward low-PP and 20 unrewarded low-PP)? Both alternatives would have provided you the option to run your statistics within participants.

      Please see our previous response to Comment 14.

      Comment 30:

      (2) Between-subjects design and small sample size.

      Why did you decide on a between-subjects design that severely reduces your power?

      Why did you just collect 22 participants with such a design? Were there any reasons for this small sample size? Honestly, I think publishing a TMR study with healthy participants and such a small sample size (11 participants for some comparisons) is not advisable.

      Please see our previous response to Comment 14.

      Comment 31:

      (3) Encoding performance.

      Is d' significantly above 0 in the first repetition round? I would assume that the distinction between rewarded and non-rewarded words is just possible after the first round of feedback.

      Yes: t-tests against 0 revealed d'-values significantly above 0 already in the first repetition round (2nd presentation) in both PP conditions (high-PP: 0.85 ± 0.09, t(32) = 9.17, p < 0.001; low-PP: 0.62 ± 0.09, t(32) = 6.83, p < 0.001).

      Comment 32:

      (4) Encoding response options

      If you want to you could make it more explicit what exactly the response options are. I assume that one button means a word has a high reward and the other button means a word has a low reward. Making it explicit increases the understanding of the results section.

      Please see our previous response to Comment 3.

      Comment 33:           

      (5) Alpha desynchronisation.

      Relative change

      Why did you subtract alpha power during the 1st presentation from alpha power during 2nd and 3rd presentation? You baseline-corrected already and individually included the 1st, 2nd, and 3rd repetition in your behavioural analysis.

      With this analysis, we aimed to examine the relative change in alpha power between PP conditions across memory-relevant word repetitions. To extract memory-relevant changes in EEG activity, the first word presentation, which reflects naive stimulus processing, can serve as a more representative baseline than a pre-stimulus baseline (-1 to -0.1 s), because it covers the same time window of interest (0.7 to 1.9 s after stimulus onset).

      To explain the rationale for the analyses with this baseline condition more clearly, we added this information to the Results section on p.7, ll. 222-226:

      We obtained the changes in power values by subtracting the power of the first presentation from that of the second and third presentations, separately for the high- and low-PP conditions. Here, the first word presentation, reflecting naive stimulus processing, served as a more representative baseline covering the time window of interest (0.7 to 1.9 s after stimulus onset) for examining encoding-related changes.
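      The baseline logic above can be sketched as follows: alpha power is first averaged over the 0.7-1.9 s window for each presentation, and the first presentation is then subtracted from the second and third. The input format (time points and power values per presentation), the illustrative numbers, and the function names are assumptions.

```python
def window_mean(times, power, t_start=0.7, t_end=1.9):
    """Mean power within [t_start, t_end] seconds after stimulus onset."""
    values = [p for t, p in zip(times, power) if t_start <= t <= t_end]
    return sum(values) / len(values)


def change_from_first(per_presentation):
    """Subtract presentation 1 (naive baseline) from presentations 2 and 3."""
    baseline = per_presentation[0]
    return [value - baseline for value in per_presentation[1:]]


times = [0.5, 0.7, 1.0, 1.9, 2.0]
power_by_presentation = [
    [9.0, -1.0, -1.0, -1.0, 9.0],  # 1st presentation (window mean -1.0)
    [9.0, -1.5, -1.5, -1.5, 9.0],  # 2nd presentation (window mean -1.5)
    [9.0, -2.0, -2.0, -2.0, 9.0],  # 3rd presentation (window mean -2.0)
]
means = [window_mean(times, p) for p in power_by_presentation]
changes = change_from_first(means)
```

      Because the same window is used for every presentation, the subtraction isolates repetition-related changes within the time range of interest rather than relative to pre-stimulus activity.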

      Comment 34:

      (6) Alpha desynchronisation as a neural correlate of encoding depth & difficulty?

      "In addition to the behavior results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth."

      Given that the low-PP words are more difficult to learn, I was expecting to see higher alpha desynchronisation in the low-PP relative to the high-PP words. Could you outline in a bit more detail how your findings fit into the literature (e.g., Simon Hanslmayr did a lot of work on this)?

      I would also advise you to add citations e.g., after your sentence in the quote above ("as an assumed neural correlate of encoding depth").

      We thank the reviewer for this recommendation, which gives us the opportunity to discuss in more detail how our results relate to previous findings.

      We added the following sentences to the discussion on p.13, ll. 441-455:

      Additional studies linked alpha desynchronization to cognitive effort and cognitive load (Proskovec et al., 2019; Zhu et al., 2021). One might therefore expect greater alpha desynchronization in the more difficult-to-learn low-PP condition than in the high-PP condition. On the other hand, numerous studies investigating oscillatory correlates of learning and memory have shown that alpha desynchronization is associated with memory across different tasks, modalities, and experimental phases of encoding and retrieval (Griffiths et al., 2016, 2021, 2019a, 2019b; Hanslmayr et al., 2009; Michelmann et al., 2016). Strikingly, using simultaneous EEG-fMRI recordings, Griffiths and colleagues (Griffiths et al., 2019a) showed that cortical alpha/beta power decreases track the fidelity of stimulus-specific information patterns detected by fMRI. The authors suggested that decreases in alpha/beta power might represent a neural mechanism that unmasks the task-critical signal by simultaneously suppressing task-irrelevant neural activity, thereby promoting information processing. Following this interpretation, we assume that, over the course of learning, the elevated memory processing of the easier-to-learn stimuli is associated with enhanced information processing and is thus accompanied by stronger cortical alpha desynchronization than for the more difficult-to-learn stimuli.

      In addition, we added supporting citations to the quoted sentence on p.7, ll. 239-240:

      In addition to the behavioral results, these EEG results indicate differences between PP conditions in desynchronization of alpha oscillations, as an assumed neural correlate of encoding depth (Griffiths et al., 2021; Hanslmayr et al., 2009).

      Comment 35:

      (7) Exclusion criterion.

      Why did you use a d' > 0.9 as a criterion for data inclusion?

      This criterion ensured that each included subject reached a pre-sleep memory performance of d' > 1.05 in at least one PP condition, which corresponds to an overall accuracy of 70%.

      Accordingly, we adjusted these sentences of the method section on p.19, ll. 677-680: 

      Data were excluded from subjects who did not reach the minimal learning performance of d' > 1.05 during the pre-sleep memory test in at least one of the two PP conditions; this threshold corresponds to an accuracy rate of 70% (n = 5). In addition, we excluded one subject who showed a negative d' in one PP condition of the pre-sleep memory test (n = 1).
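The correspondence between d' = 1.05 and 70% accuracy follows from the standard signal detection definition d' = z(H) - z(F), where z is the inverse standard normal CDF. Below is a minimal check. It assumes that "70% accuracy" means a hit rate of 0.70 together with a false-alarm rate of 0.30, i.e., equal accuracy on both word classes; the source does not spell this out. The function name is illustrative.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """Sensitivity d' = z(H) - z(F), with z the inverse standard normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Assumption: 70% accuracy = 70% hits and 30% false alarms
threshold = d_prime(0.70, 0.30)
print(round(threshold, 2))  # 1.05
```

Rates of exactly 0 or 1 make z infinite, and `inv_cdf` raises an error for them. In practice, a correction such as the log-linear adjustment is usually applied before computing d'.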

      Comment 36:

      (8) Coherence of wording.

      When you talk about your dependent variable (d') you sometimes use sensitivity. I would stick to one term.

      We replaced the word sensitivity with d'.    

      (9) Criterion

      Comment 37:

      Why do you refer to a change in criterion (Figure 3b, axis labels) as a change in memory? Do you think the criterion says something about memory?

      We corrected the axis label of Figure 3b by deleting the word "memory".

      Comment 38:

      Additionally, why did you analyse the effect of TMR on the criterion? Do you expect the criterion to change due to sleep-dependent memory consolidation? This section would benefit from more explanation. Personally, I am very interested in your thoughts and your hypothesis (if you had one, if not that is also fine but then, make it explicit that it was an exploratory analysis).

      With exploratory analyses of overnight changes in the c-criterion, we aimed to examine decision-making bias and thereby provide comprehensive data within the framework of signal detection theory. Given the previous literature showing mainly beneficial effects of sleep on learning and memory, our hypothesis focused on d', and we additionally explored the c-criterion.

      Despite our task design with gains/hits of +10 money points and losses/FAs of -8 (instead of -10), the subjects already showed significant biases toward loss avoidance in both PP conditions during the pre-sleep memory task (t-tests against 0: high-PP: 0.44 ± 0.07, t(21) = 5.63, p < 0.001; low-PP: 0.47 ± 0.09, t(21) = 5.51, p < 0.001). As already reported in the preprint, we additionally found a significant TMR-induced increase in the c-criterion only for the high-PP words (see Fig. 3b). Even when including subjects with poor pre-sleep memory performance (high-PP-cueing group: n = 15; low-PP-cueing group: n = 13), t-tests against 0 revealed a significant increase in the high-PP cueing condition (t(14) = 3.36, p = 0.005) and no significant overnight changes in the other conditions (high-PP uncued: t(12) = 1.39, p = 0.19; low-PP cued: t(12) = 1.47, p = 0.17; low-PP uncued: t(14) = -0.20, p = 0.84). These exploratory findings on the c-criterion suggest that TMR could potentially be used to influence decision-making biases in combination with reward learning.
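Within the same framework, the c-criterion is conventionally computed as c = -(z(H) + z(F)) / 2. Under this convention, c > 0 indicates a conservative bias (fewer "signal" responses) and c = 0 indicates no bias. The positive values reported above are consistent with the authors' reading of a loss-avoiding bias. The sketch uses hypothetical rates purely for illustration. Which button counts as the "signal" response in this task is an assumption, and the function name is illustrative.

```python
from statistics import NormalDist


def criterion_c(hit_rate, fa_rate):
    """Response bias c = -(z(H) + z(F)) / 2; c > 0 means a conservative bias."""
    z = NormalDist().inv_cdf
    return -(z(hit_rate) + z(fa_rate)) / 2


# Hypothetical rates (not study data): moderate hits, few false alarms
print(round(criterion_c(0.60, 0.20), 2))  # 0.29
```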

      We revised the manuscript to state the exploratory character of the c-criterion analyses in the results on p.9, ll. 282-283 and in the discussion on p.12, ll. 400-402:

      Next, we examined in an exploratory analysis whether TMR conditions influence biases in decision-making.

      In an additional exploratory analysis, we observed a significant change in decision bias in the cueing condition of the easy-to-learn words and no overnight changes in the other conditions.

      Comment 39:

      (10) You detected SWs in the time range of 0-6 sec post sound stimulation. How was the distribution of all detected SW down-states in this time range? (You could plot a histogram for this.)

      We now illustrate the distribution of detected SWs in the time range of 0 to 6 s after stimulus onset.

      We added a histogram to the supplementary section on p.30, ll. 982-986.
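A histogram of the kind requested can be produced by binning down-state latencies relative to sound onset. The 0.5-s bin width, the input format, and the function name are illustrative assumptions, not details taken from the study.

```python
import numpy as np


def downstate_histogram(downstate_times, t_max=6.0, bin_width=0.5):
    """Count SW down-states per post-stimulus bin between 0 and t_max seconds.

    downstate_times: down-state latencies in seconds relative to sound onset.
    Latencies outside [0, t_max] fall outside the bins and are not counted.
    """
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    counts, edges = np.histogram(downstate_times, bins=edges)
    return counts, edges


# Hypothetical latencies pooled across cues (illustration only)
counts, edges = downstate_histogram([0.1, 0.2, 0.7, 5.9, 6.0, 7.0])
print(counts.tolist())  # [2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
```

Plotting `counts` against `edges[:-1]` (for example with matplotlib's `bar`) gives the figure. `np.histogram` includes the right edge in the last bin, so a down-state at exactly 6 s is counted.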

      Reviewer #3 (Recommendations For The Authors):

      Comment 40:

      (1) In line with the weakness outlined above, I would recommend including a discussion of how the between-subject comparison and small sample size could affect the results and provide alternative interpretations.

      Please see our previous response at comment 14.

      Comment 41:

      (2) Regarding my point about statistical comparisons, I would recommend that the authors follow best practice guidelines for post-hoc tests and multiple comparisons. In Figures 3a and b, I would also recommend removing the stars indicating significance from the post-hoc tests (if this is what they reflect). Perhaps this link will be useful: https://www.statology.org/anova-post-hoc-tests/

      Please see our previous response at comment 15.    

      Comment 42:

      (3) Furthermore, to address any doubts about the possible phonotactic probability differences between languages, I would recommend that the authors show whether the languages overlap, the level of English fluency in the German-speaking participants, and/or another way of reassuring that this is unlikely to have affected the results.

      Please see our previous response at comment 7.    

      Comment 43:

      (4) In the introduction, I would recommend that the authors outline a clear rationale for the reward/no reward manipulation.

      Please see our previous response at comment 11.    

      Comment 44:

      (5) Figure 1c: Please include what response options participants had, e.g., 'rewarded/not rewarded'. This would make the type of categorization clearer to the reader.

      Please see our previous response at comment 3.

      Comment 45:

      (6) It is unclear whether the additional ANOVA conducted on the time and frequency of the identified clusters included all channels or only the channels contributing to the cluster. Consider clarifying this in the relevant methods and results. Furthermore, I would recommend labelling this as a posthoc test as this analysis was guided by an initial peak at the data and the timings, frequencies, and channels of interest were not selected a-priori.

      We thank the reviewer for this recommendation and now label the additional repeated-measures ANOVA as a post-hoc test. Further, we specify the channels used for this analysis (Pz and Cz).

      We adjusted the results section on p.7, ll. 230-233 and the methods section on p.23, ll. 858-860:            

      A post-hoc repeated-measures ANOVA on alpha power changes (merged over Pz and Cz electrodes) with PP (high vs. low) and presentation (2 vs. 3) as within-subjects factors revealed a main effect of PP (F(1,32) = 5.42, p = 0.03, η2 = 0.15) and a significant interaction (F(1,32) = 7.38, p = 0.01, η2 = 0.19; Fig. 2e).

      After confirming the existence of a significant cluster, we conducted an additional post-hoc repeated-measures ANOVA on values averaged over the identified time and frequency range of interest and merged over the Pz and Cz electrodes (see Fig. 2e).
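Both factors in this post-hoc ANOVA have two levels (PP: high vs. low; presentation: 2 vs. 3). Each effect therefore has one numerator degree of freedom. Its F value equals the square of a one-sample t statistic on the corresponding per-subject contrast, and partial η² follows as F / (F + df_error). The sketch below illustrates this equivalence. It assumes one ROI value per subject and cell, averaged over the time-frequency cluster and over Pz and Cz. It is not the authors' analysis code; the function name and the example data are hypothetical.

```python
import numpy as np


def rm_anova_2x2(data):
    """2 x 2 within-subjects ANOVA computed from per-subject contrasts.

    data: array of shape (n_subjects, 2, 2) indexed [subject, PP, presentation].
    Returns {effect: (F, df_error, partial_eta_squared)} with df_effect = 1.
    """
    n = data.shape[0]
    contrasts = {
        "PP": data[:, 0, :].mean(axis=1) - data[:, 1, :].mean(axis=1),
        "presentation": data[:, :, 0].mean(axis=1) - data[:, :, 1].mean(axis=1),
        "interaction": (data[:, 0, 0] - data[:, 0, 1]) - (data[:, 1, 0] - data[:, 1, 1]),
    }
    results = {}
    for effect, c in contrasts.items():
        t = c.mean() / (c.std(ddof=1) / np.sqrt(n))  # one-sample t against 0
        F = t ** 2                                   # F(1, n - 1) = t^2
        df_error = n - 1
        results[effect] = (F, df_error, F / (F + df_error))
    return results


# Hypothetical alpha-power changes for 3 subjects (illustration only)
c = np.array([1.0, 2.0, 3.0])
d = np.array([0.1, -0.2, 0.3])
data = np.zeros((3, 2, 2))
data[:, 0, 0], data[:, 0, 1] = c + d, c - d
F, df_error, eta_p2 = rm_anova_2x2(data)["PP"]
print(f"F(1,{df_error}) = {F:.2f}, partial eta^2 = {eta_p2:.2f}")  # F(1,2) = 12.00, partial eta^2 = 0.86
```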

      Comment 46:

      (7) Figure 3: To better illustrate within- vs. between-subjects comparisons and promote transparency, please add individual points and lines between the within-subjects conditions.

      Following this recommendation, we revised Figure 3 to show individual data points connected by lines between the within-subjects conditions.

      We modified Figure 3 on p.9, ll. 299-303.

      Comment 47:

      (8) For the SW density time-bin analyses, please include statistics for all comparisons (i.e., through 0 s to 3 s) and say whether these were corrected for multiple comparisons.

      Following this recommendation, we now include statistics for all comparisons.

      We added Table S6 to the supplementary data on p.29, l. 962.
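The reviewer also asked whether the SW density time-bin comparisons were corrected for multiple comparisons, and the response above does not state which correction, if any, was applied. As an illustration of one standard option, and not a description of the authors' procedure, the Holm-Bonferroni step-down procedure adjusts a family of p-values as follows. The function name is hypothetical.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, adj)  # keep adjusted p-values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values for three time bins (illustration only)
print([round(p, 4) for p in holm_adjust([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```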

      Comment 48:

      (9) Consider reporting effect sizes.

      We thank the reviewer for this recommendation and now report effect sizes for significant results.

      Comment 49:

      (10) For transparency and replicability, consider including a list of the four stimulus sets including their phoneme and biphone probabilities.

      We included a list of the four stimulus sets with their phoneme and biphone probabilities.

      We added Tables S3 and S4 to the supplementary data on pp. 26-27.

      References

      Asfestani MA, Brechtmann V, Santiago J, Peter A, Born J, Feld GB. 2020. Consolidation of Reward Memory during Sleep Does Not Require Dopaminergic Activation. J Cogn Neurosci 32:1688–1703. doi:10.1162/JOCN_A_01585

      Batterink LJ, Oudiette D, Reber PJ, Paller KA. 2014. Sleep facilitates learning a new linguistic rule. Neuropsychologia 65:169–79. doi:10.1016/j.neuropsychologia.2014.10.024

      Batterink LJ, Paller KA. 2017. Sleep-based memory processing facilitates grammatical generalization: Evidence from targeted memory reactivation. Brain Lang 167:83–93. doi:10.1016/J.BANDL.2015.09.003

      Bohn OS, Best CT. 2012. Native-language phonetic and phonological influences on perception of American English approximants by Danish and German listeners. J Phon 40:109–128. doi:10.1016/J.WOCN.2011.08.002

      Cairney SA, Guttesen AÁV, El Marj N, Staresina BP. 2018. Memory Consolidation Is Linked to Spindle-Mediated Information Processing during Sleep. Curr Biol 28:948-954.e4. doi:10.1016/j.cub.2018.01.087

      Eberhard DM, Simons GF, Fennig CD. 2019. Ethnologue: Languages of the world. SIL International. Online version: http://www.ethnologue.com.

      Fischer S, Born J. 2009. Anticipated reward enhances offline learning during sleep. J Exp Psychol Learn Mem Cogn 35:1586–1593. doi:10.1037/A0017256

      Green DM, Swets JA. 1966. Signal detection theory and psychophysics. Oxford, England: John Wiley.

      Griffiths B, Mazaheri A, Debener S, Hanslmayr S. 2016. Brain oscillations track the formation of episodic memories in the real world. Neuroimage 143:256–266. doi:10.1016/j.neuroimage.2016.09.021

      Griffiths BJ, Martín-Buro MC, Staresina BP, Hanslmayr S, Staudigl T. 2021. Alpha/beta power decreases during episodic memory formation predict the magnitude of alpha/beta power decreases during subsequent retrieval. Neuropsychologia 153. doi:10.1016/j.neuropsychologia.2021.107755

      Griffiths BJ, Mayhew SD, Mullinger KJ, Jorge J, Charest I, Wimber M, Hanslmayr S. 2019a. Alpha/beta power decreases track the fidelity of stimulus specific information. Elife 8. doi:10.7554/eLife.49562

      Griffiths BJ, Parish G, Roux F, Michelmann S, van der Plas M, Kolibius LD, Chelvarajah R, Rollings DT, Sawlani V, Hamer H, Gollwitzer S, Kreiselmeyer G, Staresina B, Wimber M, Hanslmayr S. 2019b. Directional coupling of slow and fast hippocampal gamma with neocortical alpha/beta oscillations in human episodic memory. Proc Natl Acad Sci U S A 116:21834–21842. doi:10.1073/pnas.1914180116

      Hanslmayr S, Spitzer B, Bäuml K-H. 2009. Brain oscillations dissociate between semantic and nonsemantic encoding of episodic memories. Cereb Cortex 19:1631–40. doi:10.1093/cercor/bhn197

      Iber C, Ancoli‐Israel S, Chesson AL, Quan SF. 2007. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. Westchester, IL: American Academy of Sleep Medicine.

      Klaassen AL, Heiniger A, Sánchez PV, Harvey MA, Rainer G. 2021. Ventral pallidum regulates the default mode network, controlling transitions between internally and externally guided behavior. Proc Natl Acad Sci U S A 118:1–10. doi:10.1073/pnas.2103642118

      Lansink CS, Goltstein PM, Lankelma J V., McNaughton BL, Pennartz CMA. 2009. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol 7. doi:10.1371/JOURNAL.PBIO.1000173

      Luef EM, Resnik P. 2023. Phonotactic Probabilities and Sub-syllabic Segmentation in Language Learning. Theory Pract Second Lang Acquis 9:1–31. doi:10.31261/TAPSLA.12468

      Michelmann S, Bowman H, Hanslmayr S. 2016. The Temporal Signature of Memories: Identification of a General Mechanism for Dynamic Memory Replay in Humans. PLoS Biol 14:e1002528. doi:10.1371/journal.pbio.1002528

      Proskovec AL, Heinrichs-Graham E, Wilson TW. 2019. Load Modulates the Alpha and Beta Oscillatory Dynamics Serving Verbal Working Memory. Neuroimage 184:256. doi:10.1016/J.NEUROIMAGE.2018.09.022

      Reber AS. 1967. Implicit learning of artificial grammars. J Verbal Learning Verbal Behav 6:855–863. doi:10.1016/S0022-5371(67)80149-X

      Schreiner T, Rasch B. 2015. Boosting vocabulary learning by verbal cueing during sleep. Cereb Cortex 25:4169–4179. doi:10.1093/cercor/bhu139

      Sterpenich V, van Schie MKM, Catsiyannis M, Ramyead A, Perrig S, Yang H-D, Van De Ville D, Schwartz S. 2021. Reward biases spontaneous neural reactivation during sleep. Nat Commun 12:1–11. doi:10.1038/s41467-021-24357-5

      Tamminen J, Lambon Ralph MA, Lewis PA. 2013. The role of sleep spindles and slow-wave activity in integrating new information in semantic memory. J Neurosci 33:15376–15381. doi:10.1523/JNEUROSCI.5093-12.2013

      Tamminen J, Payne JD, Stickgold R, Wamsley EJ, Gaskell MG. 2010. Sleep spindle activity is associated with the integration of new memories and existing knowledge. J Neurosci 30:14356–60. doi:10.1523/JNEUROSCI.3028-10.2010

      Zhu Y, Wang Q, Zhang L. 2021. Study of EEG characteristics while solving scientific problems with different mental effort. Sci Rep 11. doi:10.1038/S41598-021-03321-9

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This important study explores the potential influence of physiologically relevant mechanical forces on the extrusion of vesicles from C. elegans neurons. The authors provide compelling evidence to support the idea that uterine distension can induce vesicular extrusion from adjacent neurons. The work would be strengthened by using an additional construct (preferably single-copy) to demonstrate that the observed phenotypes are not unique to a single transgenic reporter. Overall, this work will be of interest to neuroscientists and investigators in the extracellular vesicle and proteostasis fields. 

      We now include supporting data using a single copy alternate fluorescent reporter expressed in touch neurons (Fig. 3H).

      In brief, we examined the induction of exophergenesis in an alternative single-copy transgenic strain that expresses mKate fluorescent protein specifically in touch receptor neurons. Compared with the multi-copy transgene used throughout this study, which expresses mCherry fluorescent protein specifically in touch receptor neurons, the mKate single-copy transgene is associated with a much lower frequency of exophergenesis. However, increasing uterine distension by blocking egg-laying can increase exophergenesis in the mKate single-copy transgenic line from 0% to approximately 60% on adult day 1, indicating that the observed response is not tied to a single reporter.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors sought to understand the stage-dependent regulation of exophergenesis, a process thought to contribute to promoting neuronal proteostasis in C. elegans. Focusing on the ALMR neuron, they show that the frequency of exopher production correlates with the timing of reproduction. Using many genetic tools, they dissect the requirements of this pathway to eventually find that occupancy of the uterus acts as a signal to induce exophergenesis. Interestingly, the physical proximity of neurons to the egg zone correlates with exophergenesis frequency. The authors conclude that communication between the uterus and proximal neurons occurs through the sensing of mechanical forces of expansion normally provided by egg occupancy to coordinate exophergenesis with reproduction.

      Strengths: 

      The genetic data presented is thorough and solid, and the observation is novel. 

      Weaknesses: 

      The main weakness of the study is that the detection of exophers is based on the overexpression of a fluorescent protein in touch neurons, and it is not clear whether this process is actually stimulated in wild-type animals, or if neurons have accumulated damaged proteins in relatively young day 2 animals. 

      We now include data using a single-copy alternative fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (new Fig. 3H), supporting that uterine distention, rather than reporter identity, is associated with early-life exopher elevation. These data also add to our observations indicating that high protein-expressing strains generally produce higher baseline levels of exophers in early adulthood (for example, Melentijevic et al. (PMID 28178240) documented that mCherry RNAi knockdown in the strain primarily studied here can lower exopher levels).

      The second point raised here, regarding the occurrence and physiological role of early-adult exophers in “native” non-stressed neurons, is a fascinating question that we are beginning to address in ongoing experiments. Readers will appreciate that accurately quantifying relatively rare, “invisible” touch receptor neuron exophergenesis without expressing a fluorescent reporter is technically challenging. Our speculation, now outlined more clearly in the Discussion, is that certain molecular and organelle debris that cannot readily be degraded in cells during larval development may be stored until release to more capable degradative neighbors or to the coelomocytes for later management, as one component of the early adult transition in proteostasis (see J. Labbadia and R. I. Morimoto, PMID 24592319). Receiving cells may be primed for this at a particular time point, possibly analogous to the “bulky garbage” collection of oversized, difficult-to-dispose-of household items that a town addresses with specialized action only at specific times. The prediction is that we should be able to detect some mass protein aggregation during early development, and at least partial elimination by adult day 3; this elimination should be impaired when eggs are eliminated. Initial testing is underway.

      Reviewer #2 (Public Review): 

      Summary: 

      This paper reports that mechanical stress from egg accumulation is a biological stimulus that drives the formation of extruded vesicles from C. elegans ALMR touch neurons. Using powerful genetic experiments only readily available in the C. elegans system, the authors manipulate oocyte production, fertilization, embryo accumulation, and egg-laying behavior, providing convincing evidence that exopher production is driven by stretch-dependent feedback of fertilized, intact eggs in the adult uterus. Shifting the timing of egg production and egg laying alters the onset of observed exophers. Pharmacological manipulation of egg laying has the predicted effects, with animals retaining fewer eggs having fewer exophers and animals with increased egg accumulation having more. The authors show that egg production and accumulation have dramatic consequences for the viscera, and moving the ALMR process away from eggs prevents the formation of exophers. This effect is not unique to ALMR but is also observed in other touch neurons, with a clear bias toward neurons whose cell bodies are adjacent to the filled uterus. Embryos lacking an intact eggshell with reduced rigidity have impaired exopher production. Acute injection into the uterus to mimic the stretch that accompanies egg production causes a similar induction of exopher release. Together these results are consistent with a model where stretch caused by fertilized embryo accumulation, and not chemical signals from the eggs themselves or egg release, underlies ALMR exopher production seen in adult animals.

      Strengths: 

      Overall, the experiments are very convincing, using a battery of RNAi and mutant approaches to distinguish direct from indirect effects. Indeed, these experiments provide a model generally for how one would methodically test different models for exopher production. The paper is well-written and easy to understand. I had been skeptical of the origin and purpose of exophers, concerned they were an artefact of imaging conditions, caused by deranged calcium activity under stressful conditions, or as evidence for impaired animal health overall. As this study addresses how and when they form in the animal using otherwise physiologically meaningful manipulations, the stage is now set to address at a cellular level how exophers like these are made and what their functions are. 

      Weaknesses: 

      Not many. The experiments are about as good as could be done. Some of the n's on the more difficult-to-work-with strains or experiments are comparatively low, but this is not a significant concern because of the number of different, complementary approaches used. The microinjection experiment in Figure 7 is very interesting, but some details are missing that would confirm whether this is a sound experiment.

      We expanded the description of the microinjection experiment in both the figure legend and the Methods section to enhance clarity and substantiate the approach.

      Reviewer #3 (Public Review): 

      Summary: 

      In this paper, the authors use the C. elegans system to explore how already-stressed neurons respond to additional mechanical stress. Exophers are large extracellular vesicles secreted by cells, which can contain protein aggregates and organelles. These can be a way of getting rid of cellular debris, but as they are endocytosed by other cells can also pass protein, lipid, and RNA to recipient cells. The authors find that when the uterus fills with eggs or otherwise expands, a nearby neuron (ALMR) is far more likely to secrete exophers. This paper highlights the importance of the mechanical environment in the behavior of neurons and may be relevant to the response of neurons exposed to traumatic injury. 

      Strengths: 

      The paper has a logical flow and a compelling narrative supported by crisp and clear figures. 

      The evidence that egg accumulation leads to exopher production is strong. The authors use a variety of genetic and pharmacological methods to show that increasing pressure leads to more exopher production, and reducing pressure leads to lower exopher production. For example, egg-laying defective animals, which retain eggs in the uterus, produce many more exophers, and hyperactive egg-laying is accompanied by low exopher production. The authors even inject fluid into the uterus and observe the production of exophers. 

      Weaknesses: 

      The main weakness of the paper is that it does not explore the molecular mechanism by which the mechanical signals are received or responded to by the neuron, but this could easily be the subject of a follow-up study. 

      We agree that the molecular mechanisms operative are of considerable interest, and our initial pursuit suggests that a comprehensive study will be required for satisfactory elaboration of how mechanical signals are received or responded to by the neuron.

      I was intrigued by this paper, and have many questions. I list a few below, which could be addressed in this paper or which could be the subject of follow-up studies. 

      - Why do such a low percentage of ALMR neurons produce exophers (5-20%)? Does it have to do with the variability of the proteostress? 

      We do not yet understand why some ALMR neurons within the same genotype produce exophers and others do not. We know that in addition to the uterine occupation we report here, proteostasis compromise, feeding status, oxidative stress, and osmotic stress can elevate exopher numbers (PMID 34475208); cell-autonomous influences on exopher levels include aggresome-associated biology (PMID 37488107) and expression levels of the mCherry protein (PMID 28178240). Turek et al. report that social interaction on plates can influence muscle exopher levels (PMID 34288362). Thus, although variable proteostress experienced by neurons is likely a factor, we have not yet experimentally defined specific trigger rules. We suspect that the summation of internal proteostasis crises and environmental conditions, including particular force vectors/frequencies, underlies the variable exopher production phenomenon.

      - Why does the production of exophers lag the peak in progeny production by 24-48 hours? Especially when the injection method produces exophers right away?

      Progeny production can track well with exopher production (Fig. 1B), although egg counts (permanent, one-time events) and exophers (which are slowly degraded) differ in nature, which can shift their peak scores apart. We synchronized animals at the L4 stage; 24 hours later was adult day 1, and we scored then and every 24 hours thereafter. The daily progeny count reflects the total number of progeny produced in each 24-hour interval; exopher events were scored once a day, but exophers can persist, so the daily exopher count can partly reflect slow degradation, with some exophers counted on two days. We now explain our scoring details better in the Methods section.
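The effect of persistence on daily scores can be illustrated with a toy model. In this model, each daily check counts the new exopher events since the last check plus the fraction of the previous day's events that are still visible. All numbers, the one-day carryover, and the function name are hypothetical. The model only illustrates how slow degradation can shift the apparent exopher peak later than the underlying events.

```python
def observed_daily_scores(new_events, carryover):
    """Toy model of once-daily scoring with one-day persistence.

    new_events: hypothetical new exopher events per 24-h interval.
    carryover: hypothetical fraction of the previous interval's events
               still detectable at the next daily check (persistence
               beyond one extra day is ignored).
    """
    observed = []
    previous = 0.0
    for new in new_events:
        observed.append(new + carryover * previous)
        previous = new
    return observed


# Hypothetical: equal new events on days 1 and 2, half persist for 24 h
print(observed_daily_scores([10, 10, 2], 0.5))  # [10.0, 15.0, 7.0]
```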

      The rapid appearance of exophers, as early as ~10 minutes after sustained injection, is fascinating and probably holds mechanistic implications for exopher biology. For one thing, we can infer that in the mCherry Ag2 background, touch neurons can be poised to extrude exophers, and that the pressure/push acts to trigger or license final expulsion. Interestingly, we found that a sustained two-minute injection was needed to increase exopher production (now better emphasized in the expanded Methods section). We speculate that multiple pressure events or a sustained force vector might be critical (as when an egg slowly passes through). At a minimum, this assay may help us assign molecular roles to pathway components as we identify them.

      - As mentioned in the discussion, it would be interesting to know if PEZO-1/PIEZO is required for uterine stretching to activate exophergenesis. pezo-1 animals accumulate crushed oocytes in the uterus. 

      We have begun to test the hypothesis that PEZO-1 is a signaling component for ALMR exophergenesis, initially using the N- and C-terminal pezo-1 deletion mutants as in Bai et al. (PMID 32490809). These pezo-1 mutants show a mild decrease in ALMR exophergenesis under normal conditions. However, vulva-less conditions in pezo-1N and pezo-1C mutants increased ALMR exophergenesis from approximately 10% to 60%, similar to the response of wild-type worms to high mechanical stress; these data suggest that PEZO-1 is not required for mediating mechanical force-induced ALMR exophergenesis. We are currently testing genetic requirements for other known mechanosensors and intend a comprehensive investigation of the molecular mechanisms of mechanical signaling in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      -The study would be significantly strengthened by the addition of data detecting regulation of exophergenesis by uterine forces in a more physiological context, in the absence of overexpression of a toxic protein. In other words, is this a process that occurs naturally during reproduction, or is it specific to proteotoxic stress induced by overexpression? Perhaps the authors could repeat key experiments using a single copy transgene, and challenge the animals with exogenous proteotoxic stress if necessary.

      We now include data using a single-copy alternative fluorescent reporter expressed in touch neurons. Although baseline exopher levels are low in this strain, we demonstrate that inducing egg retention in this background markedly increases exopher generation from a baseline of near zero to ~60% (Fig. 3H), supporting that uterine distention, rather than reporter identity or overexpression alone, drives early-life exopher elevation.

      Also noteworthy is that we find exophergenesis in the single-copy transgenic line is only approximately 0.3% on adult day 2 (average in three trials, data not shown), which is much lower than the 5-20% exophergenesis rate typically observed in the multi-copy high expression mCherry transgenic line. Therefore, consequences of overexpression of mCherry likely potentiate exophergenesis.

      -The authors mention that exophergenesis has been described in muscle cells. Is this also dependent on the proximity to the uterus? It would have been interesting to include data on other cell types in the vicinity of the reproductive system.

      Yes, in interesting work on exophers produced by muscle, Turek et al. reported that muscle exopher events are mostly located in a region proximal to the uterus. Moreover, this work also documented that sterile hermaphrodites are associated with approximately 0% muscle exophergenesis, and egg retention in the uterus strongly increases muscle exophergenesis (PMID: 34288362).  

      -Is exophergenesis also induced by other forms of mechanical stress? For example, swimming.

      We have looked at crude treatments such as centrifugation or vortexing without observing changes in exopher levels. Our preliminary work indicates that swimming can increase exophergenesis, and this effect depends on the presence of eggs in the uterus. We appreciate the question, and expect to include documentation of alternative pressure screening in our planned future paper on molecular mechanisms.

      -In Figure 1E, the profile of exopher production for the control condition at 25 °C is very similar to the profile observed at 20 °C in Figure 1B. However, progeny production at 25 °C is known to peak earlier. Perhaps egg retention is differently correlated with progeny production at this temperature? The authors could easily test this.

      Overall, exophers (which degrade with time) and progeny counts (a fixed number) have slightly different temporal features, anchored in part by how long exophers or their “starry night” debris persist. Most exophers start to degrade within 1-6 hours (PMID: 36861960), but exopher debris can persist for more than 24 hours. An exopher event observed on day 1 may thus also be recorded at the day 2 time point, which leads to a higher frequency of exopher events on day 2 as compared to day 1.

      We have previously published on the impact of temperature on exopher number (Supplemental Figure 2 in PMID 34475208). In brief, raising the culture temperature for animals kept at a constant temperature throughout life modestly increases exopher number; a greater increase in exophers is observed when animals are switched to a higher temperature in adult life, suggesting that temperature changes (a mandatory part of the ts mutant studies) engage complex biology that modulates exopher production. Our previous data show that after a temperature shift to 25 °C, exophers peaked on adult day 1. Here, Fig. 1B reflects a constant temperature of 20 °C, whereas Fig. 1E involves a temperature shift from 15 °C to 25 °C. That egg retention might be influenced by temperature is a plausible hypothesis, but given the complexities of temperature shifts for some mutants, we elected to defer a detailed analysis of the temperature-exopher-egg relationship.

      -It is not clear how to compare panels A and B in Figure 3. In panel A the males are present throughout the adult life of the hermaphrodites whereas in panel B the males are added in later life. Therefore, the effect of later-life mating on progeny production is not shown and the title of panel A in the legend is misleading. The authors need to perform a progeny count in the same conditions of mating presented in Figure 3B to allow direct comparison.

      As Reviewer 1 suggested, we performed a new progeny count now presented in new Fig. 3A, which more appropriately matches the study presented in Fig. 3B; legends adjusted.

      -On page 12, the authors state that the baseline of exophergenesis in rollers is 71%, but then attribute the 71% in Figure 4F to exophergenesis specifically in ALMR that is posterior to AVM. The authors need to clarify this point.

      Good catch on our error. The baseline of exophergenesis in rollers is ~40%, and we corrected the main text.

      -Considering the conclusion of Figure 2 that blocking embryonic events past the 4-cell stage does not impact exopher production, it would have been interesting to compare the uterine length for emb-8 and for mex-3, since it is quite intriguing that the former suppresses exopher production while the latter has no effect.

      We repeated the emb-8 and mex-3 RNAi for these studies and encountered variability in outcome for 2 cell stage disruption via emb-8 RNAi, which is consistent with the range of published endpoints for emb-8 RNAi. We elected to include these emb-8 findings in the figure legend 2G, but removed the RNAi data from the main text figure. mex-3 uterine measures are added to revised panels 5H, 6I.

      Reviewer #2 (Recommendations For The Authors): 

      -Leaving the worms in halocarbon oil for too long (e.g. 10 min) can desiccate and kill them. Did the authors take them out of the oil before analyzing exopher production? The authors refer to these as 'sustained injections' without much description beyond that. As the worms are very small, the flow rate needed for a sustained injection over 2 minutes must be very low - so low that the needle is in danger of being clogged. Do the authors have an estimate of how much fluid was injected or the overall flow rate? I realize the flow rate measured outside of the worm may not compare directly to that of a pressurized worm, but such estimates would be instructive, particularly if they can be related to the relative volume of the eggs the injection is trying to mimic.

      After injection or mock injection, we removed the animal from the oil and flipped it if necessary to observe the ALMR neuron on the NGM-agar plate. We now expanded description of the experimental details of injection, including the estimated flow rate, in the revised Methods section.

      - The authors describe the ALMR neurons as "proteostressed", but I am not clear on whether these neurons were treated in a unique procedure to induce such a state or if the authors are merely building on other observations that egg-laying adults are dedicating significant resources to egg production, so they must be proteostressed. If they are not inducing a proteostressed state in their experiments, the authors should refrain from describing their neurons and effects as depending on such a state.

      We revised the text to feature more explicitly the published evidence that the ALMR neurons we track with mCherryAg2 bz166 are likely proteostressed. Overexpression of mCherry in bz166 is associated with enlargement of lysosomes and formation of large mCherry foci that often correspond to LAMP::GFP-positive structures in ALMR neurons (PMID: 28178240; PMID: 37488107). Marked changes in ultrastructure reflect touch neuron (TN) stress in this background. These cellular features are not seen in wild-type animals. We previously published that overexpression of mCherry, polyQ74, polyQ128, or Aβ1-42 (which enhance proteostress) increases exophers (PMID: 28178240). Likewise, most genetic compromise of different proteostasis branches (heat shock chaperones, proteasome, and autophagy) promotes exophergenesis, supporting exophergenesis as a response to proteostress. In sum, mCherryAg2 bz166 neurons appear markedly stressed compared with a non-overexpressing line and produce more exophers. RNAi knockdown of mCherry lowers exopher levels (PMID: 28178240).

      In response to the reviewer's comment, we added a study with a single-copy mKate reporter (new data Fig. 3H). We find a very low baseline of exophers in this background, which supports the idea that the high level of autonomous compromise associated with overexpression influences exopher levels. Interestingly, however, ALMR neurons expressing mKate from a single-copy transgene still exhibit elevated exopher production (>60%) under high mechanical stress (Fig. 3H). These data are consistent with the idea that mechanical stress can enhance exopher production and may markedly lower the threshold for exophergenesis in neurons with close-to-native stress levels.

      - The authors should include more details on the source and use of the RNAi, for example, if the clones were from the Ahringer RNAi library, made anew for this study, or both.

      We now add this information in the methods section.

      - I would be curious if the authors would similarly see an induction in exopher production after acute vulval muscle silencing with histamine. I'm not suggesting this experiment, but it may offer a way to induce exophers in a more controlled manner.

      This is a great suggestion that we will try in future studies.

      - I am not sure if Figure 5 needs to be a main figure in the paper or if it would be more appropriate as a supplement.

      We considered this suggestion, but we think that the strikingly strong correlation between uterus length and exopher levels is a major point of the story. These data also establish a metric that we will use moving forward to distinguish whether a perturbation that modulates exophers is more likely to act through reproduction or through touch neuron biology. For this reason, we elected to keep Figure 5 in the main text.

      Reviewer #3 (Recommendations For The Authors): 

      -The Statistics section in the methods should be expanded to describe the statistics used in the experiments that aren't nominal, of which there are many.

      We have updated and expanded the statistics section.

      -P.2 Line 49 spelling 'que' should be queue (I remember this by the useless queue of letters lined up after the 'q').

      Corrected 

      -The introduction has a bit too much information about oocyte maturation, not relevant to the study.

      We agree that the information about oocyte maturation is not critical for laying out the related experiments, and we cut this section to improve focus.

      -p.3 line 22: Some exophers are seen on Day 3, so this should be restated for accuracy.

      Corrected

      -p.3 line 26. Explain here why sperm is necessary (oocytes don't mature or ovulate effectively without sperm).

      We added this clarifying explanation.

      -p.3 line 44 Clarify in the spe-44 the oocytes are in the oviduct (not the uterus). Might be helpful to include a DIC image to accompany the helpful diagram in Figure 1D. 

      We added a sentence describing the impact of sperm absence on oocyte maturation, progression into the uterus, and retention in the gonad, with reference to PMID: 17472754.  We were able to add a DIC in the tightly packed Figure 1.

      In Supplemental Figure 6, we now include a field picture of oocyte retention in the sem-2 mutant and upon treatment of lin-39(RNAi).

      -p.5 line 3 in the Figure 1D legend; recommend delete 'light with' which is confusing and just refer to the sperm as dark dots. 

      Corrected

      -p.6 line 22-24 Check for alignment of the statements with Figure 2 (2F is cited, but it should be 2G).

      Corrected

      -p12 line 13-15; Many ALMRs not in the egg zone (70%) did not produce exophers - this is still quite a lot. It would be good to state this section in a more straightforward way (less leading the reader) and if possible to give a possible explanation.

      We modified the text to be less leading: “Thus, although ALMR soma positioning in the egg zone does not guarantee exophergenesis in the mCherryAg2 strain, the neurons that did make exophers were nearly always in the egg zone.”

      -p.15 paragraph 3 - clarify how uterine length was controlled for the overall body length of the worm.

      We did not systematically measure body length but instead focused on uterine distention. It would be of interest to determine whether body length correlates with uterine size, and then how that relationship translates to exopher production; here, however, we focused on the striking correlation between uterine length and the number of exophers.

      -p.17 line 23-25; Could be stated more simply. 

      We adjusted the text: “Moreover, the oocyte retention was similarly efficacious in elevating exopher production to egg retention, increasing ALMR exophergenesis to approximately 80% in the sem-2(rf) mutant (Fig. 6C)”.

      -p.23 Line 4. I think by the time the reader reaches this sentence, the egg-coincident exophergenesis will not be 'puzzling'.

      Agreed, corrected.

      -p.26, Line 22, Male 'mating', not 'matting'.

      Corrected.

      -Throughout, leave space between number and unit (this is not required for degree or percent, but be consistent). 

      Corrected.

    1. Author Response:

      We thank the reviewers for their careful reading of the manuscript and for their comments. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. It is true that this work is a first step towards understanding the molecular mechanisms underlying TNT formation, and that further biochemical and biophysical analyses will be necessary to elucidate CD9 and CD81 roles. It also provides a toolbox for the future identification of important TNT factors, and perhaps biological markers.

      However, we would like to better explain our choice of focusing on CD9 and CD81 in TNTs, given the fact that they are also expressed in EVPs. First, both were among the most abundant integral membrane proteins in TNTs, and overexpression of CD9 was previously shown to increase TNT number. However, a recent work directed by our coauthor E. Rubinstein clearly showed that the absence of CD9, CD81 or even both has minimal impact on the production or composition of EVs in MCF7 (Fan et al, Differential proteomics argues against a general role for CD9, CD81 or CD63 in the sorting of proteins into extracellular vesicles, J. Extracell Vesicles, 2023;12:12352. https://doi.org/10.1002/jev2.12352). This is in line with another recent publication (Tognoli, Commun biol 2023) and with our results showing that the concentration of EVPs was the same when CD9 was overexpressed, i.e. in conditions where TNT number and vesicle transfer were increased. Therefore, it is highly probable that the role of CD9 and CD81 in TNT vs. EVP formation is different, even if we cannot completely exclude a crosstalk between the two pathways.

      Regarding the importance of CD9 and CD81 in TNT formation, our results are consistent with a non-exclusive regulation of TNTs by these tetraspanins, and/or with partial compensatory mechanisms by yet unknown factors in their absence. Interestingly, to our knowledge, none of the TNT regulators described in the literature has a complete inhibitory effect when knocked out. These results confirm that several pathways can converge to regulate TNTs and are consistent with cellular plasticity. It is therefore hard to say whether factors like CD9 and CD81, which regulate TNTs and have other functions in cells, are “key” or simply “important”.

      Finally, the model we present in Figure 7 is a schematic working model of possible CD9/CD81 roles, which is obviously simplified for ease of understanding. It is important to note that when we write “no TNT” above an empty space between 2 cells, this describes what is drawn, and corresponds to real conditions where fewer TNTs are detected. It was never our intention to over-interpret our data, but rather to make it clearer with this diagram, and we hope that reading the article will make this clear.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript describes a series of experiments using human intracranial neural recordings designed to evaluate the processing of self-generated speech in the setting of feedback delays. Specifically, the authors aim to address the question about the relationship between speech-induced suppression and feedback sensitivity in the auditory cortex, whose relationship has been conflicting in the literature. They found a correlation between speech suppression and feedback delay sensitivity, suggesting a common process. Additional controls were done for possible forward suppression/adaptation, as well as controlling for other confounds due to amplification, etc.

      Strengths:

      The primary strength of the manuscript is the use of human intracranial recording, which is a valuable resource and gives better spatial and temporal resolution than many other approaches. The use of delayed auditory feedback is also novel and has seen less attention than other forms of shifted feedback during vocalization. Analyses are robust, and include demonstrating a scaling of neural activity with the degree of feedback delay, and more robust evidence for error encoding than simply using a single feedback perturbation.

      Weaknesses:

      Some of the analyses performed differ from those used in past work, which limits the ability to directly compare the results. Notably, past work has compared feedback effects between production and listening, which was not done here. There were also some unusual effects in the data, such as increased activity with no feedback delay when wearing headphones, that the authors attempted to control for with additional experiments, but remain unclear. Confounds by behavioral results of delayed feedback are also unclear.

      Overall the work is well done and clearly explained. The manuscript addresses an area of some controversy and does so in a rigorous fashion, namely the correlation between speech-induced suppression and feedback sensitivity (or lack thereof). While the data presented overlaps that collected and used for a previous paper, this is expected given the rare commodity these neural recordings represent. Contrasting these results to previous ones using pitch-shifted feedback should spawn additional discussion and research, including verification of the previous finding, looking at how the brain encodes feedback during speech over multiple acoustic dimensions, and how this information can be used in speech motor control.

      We thank the reviewer for their comments and have addressed the concerns point by point in the section “Recommendation for Authors”.

      Reviewer #2 (Public Review):

      Summary:

      In "Speech-induced suppression and vocal feedback sensitivity in human cortex", Ozker and colleagues use intracranial EEG to understand audiomotor feedback during speech production using a speech production and delayed auditory feedback task. The purpose of the paper is to understand where and how speaker-induced suppression occurs, and whether this suppression might be related to feedback monitoring. First, they identified sites that showed auditory suppression during speech production using a single-word auditory repetition task and a visual reading task, then observed whether and how these electrodes show sensitivity to auditory feedback using a DAF paradigm. The stimuli were single words played auditorily or shown visually and repeated or read aloud by the participant. Neural data were recorded from regular- and high-density grids from the left and right hemispheres. The main findings were:

      • Speaker-induced suppression is strongest in the STG and MTG, and enhancement is generally seen in frontal/motor areas, except for small regions of interest in the dorsal sensorimotor cortex and IFG, which can also show suppression.

      • Delayed auditory feedback, even when simultaneous, induces larger response amplitudes compared to the typical auditory word repetition and visual reading tasks. The authors presume this may be due to the effort and attention required to perform the DAF task.

      • The degree of speaker-induced suppression is correlated with sensitivity to delayed auditory feedback.

      • pSTG (behind TTS) is more strongly modulated by DAF than mid-anterior STG.

      Strengths:

      Overall, I found the manuscript to be clear, the methodology and statistics to be solid, and the findings mostly quite robust. The large number of participants with high-density coverage over both the left and right lateral hemispheres allows for a greater dissection of the topography of speaker-induced suppression and changes due to audiomotor feedback. The tasks were well-designed and controlled for repetition suppression and other potential caveats.

      Weaknesses:

      (1) In Figure 1D, it would make more sense to align the results to the onset of articulation rather than the onset of the auditory or visual cue, since the point is to show that the responses during articulation are relatively similar. In this form, the more obvious difference is that there is an auditory response to the auditory stimulus, and none to the visual, which is expected, but not what I think the authors want to convey.

      We agree with the reviewer. We have updated Figure 1 accordingly.

      (2) The DAF paradigm includes playing auditory feedback at 0, 50, 100, and 200 ms lag, and it is expected that some of these lags are more likely to induce dysfluencies than others. It would be helpful to include some analysis of whether the degree of suppression or enhancement varies by performance on the task, since some participants may find some lags more interfering than others.

      We thank the reviewer for this suggestion. In the original analysis, we calculated a Sensitivity Index for each electrode by correlating the high gamma response with the delay condition across trials. To address the reviewer’s question, we now compared delay conditions in pairs (DAF0 vs DAF50, DAF0 vs DAF100, DAF0 vs DAF200, DAF50 vs DAF100, DAF50 vs DAF200 and DAF100 vs DAF200).

      Similar to our Suppression Index calculation, where we compared neural response to listening and speaking conditions (Listen-Speak/Listen+Speak), we now calculated the Sensitivity Index by comparing neural response to two delay conditions as follows:

      e.g., Sensitivity Index = (DAF50 − DAF0) / (DAF50 + DAF0). We used the raw high gamma broadband signal power instead of percent signal change to ensure that the Sensitivity Index values ranged from −1 to 1.
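      Both indices are the same normalized difference, (a − b) / (a + b), applied to different condition pairs. A minimal sketch, assuming one mean raw high gamma power value per condition per electrode; the function names are hypothetical and this is not the authors' code. It also shows why raw power is required for the [−1, 1] bound.

```python
def normalized_index(a: float, b: float) -> float:
    """Normalized difference (a - b) / (a + b).

    Bounded in [-1, 1] only when a and b are non-negative, e.g. raw power.
    """
    if a < 0 or b < 0:
        raise ValueError("inputs must be non-negative, e.g. raw high gamma power")
    if a + b == 0:
        raise ValueError("a + b must be positive")
    return (a - b) / (a + b)


def suppression_index(listen_power: float, speak_power: float) -> float:
    # (Listen - Speak) / (Listen + Speak); positive = weaker response while speaking.
    return normalized_index(listen_power, speak_power)


def pairwise_sensitivity_index(longer_delay_power: float, shorter_delay_power: float) -> float:
    # e.g. (DAF50 - DAF0) / (DAF50 + DAF0); positive = larger response at the longer delay.
    return normalized_index(longer_delay_power, shorter_delay_power)


if __name__ == "__main__":
    print(suppression_index(4.0, 1.0))           # 0.6
    print(pairwise_sensitivity_index(3.0, 1.0))  # 0.5
```

      With percent signal change instead, one condition can be negative: values of +50 and −40 would give (50 + 40) / (50 − 40) = 9, far outside the intended range, which is why the sketch rejects negative inputs.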

      As shown in the figure below, even when we break down the analysis by feedback delay, we still find a significant association between suppression and sensitivity (except when sensitivity indices are calculated by comparing DAF50 and DAF100). The strongest correlation (Pearson’s correlation) was found when sensitivity indices were calculated by comparing DAF0 and DAF200.

      As the reviewer suggested, participants found DAF200 more interfering than the other delays and slowed down their speech the most (articulation duration; DAF0: 0.698 s, DAF50: 0.726 s, DAF100: 0.737 s, and DAF200: 0.749 s; Ozker, Doyle et al. 2022).

      Author response image 1.

      (3) Figure 3 shows data from only two electrodes from one patient. An analysis of how amplitude changes as a function of the lag across all of the participants who performed this task would be helpful to see how replicable these patterns of activity are across patients. Is sensitivity to DAF always seen as a change in amplitude, or are there ever changes in latency as well? The analysis in Figure 4 gets at which electrodes are sensitive to DAF but does not give a sense of whether the temporal profile is similar to those shown in Figure 3.

      In Figure 4A, electrodes from all participants are color-coded to reflect the correlation between neural response amplitude and auditory feedback delay. A majority of auditory electrodes in the STG exhibit a positive correlation, indicating that response amplitude increases with increasing feedback delays. To demonstrate the replicability of the response patterns in Figure 3, here we show auditory responses averaged across 23 STG electrodes from 6 participants.

      Author response image 2.

      Response latency in auditory regions also increases with increasing auditory feedback delays, but this delayed auditory response to delayed auditory feedback is expected. In Figure 3, signals were aligned to the perceived auditory feedback onset; therefore, we don’t see the latency differences. Below we replotted the same responses aligned to the onset of articulation. It is now clearer that responses are delayed as the auditory feedback delay increases. This is because participants start speaking at time=0 but hear their voice with a lag, so the response onset in these auditory regions is delayed.

      According to models of speech production, when there is a mismatch between expected and perceived auditory feedback, the auditory cortex encodes this mismatch with an enhanced response, reflecting an error signal. Therefore, we referred to changes in response amplitude as a measure of sensitivity to DAF.

      (4) While the sensitivity index helps to show whether increasing amounts of feedback delay are correlated with increased response enhancement, it is not sensitive to nonlinear changes as a function of feedback delay, and it is not clear from Figure 3 or 4 whether such relationships exist. A deeper investigation into the response types observed during DAF would help to clarify whether this is truly a linear relationship, dependent on behavioral errors, or something else.

      We compared responses to delay conditions in pairs in the analysis presented above (response #2). We hope these new results also clarify this issue and address the reviewer’s concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) While the correlation between SuppI and SensI is clear here (as opposed to Chang et al), it is unclear if this difference is a byproduct of how SensI was calculated (and not just different tasks). In that paper, the feedback sensitivity was calculated as a metric comparing feedback responses during production and listening, whereas here the SensI is a correlation coefficient during production only. If the data exists, it would be very helpful to also show an analysis similar to that used previously (i.e. comparing DAF effects in both production and playback, either in correlations or just the 200ms delay response). One could imagine that some differences are due to sensory properties, though it is certainly less clear what delay effects would be on listening compared to say pitch shift.

      We thank the reviewer for pointing this out. Indeed, the calculation of SensI is different in the two studies. In Chang et al. study, SensI was calculated by comparing perturbed feedback responses during production and passive listening. This is a very meticulous approach as it controls for the acoustic properties of the auditory stimuli under both conditions.

      In our study, we didn’t have a passive listening condition. This would require recording the participants’ voice as they were speaking with DAF and playing it back to them in a subsequent passive listening condition. Therefore, we can’t completely eliminate the possibility that some differences are due to sensory properties. However, to address the reviewer’s concern, we examined the voice recordings of 8 participants for acoustic differences. Specifically, we compared voice intensities for different auditory feedback delays (0, 50, 100 and 200 ms) and found no significant differences (F=0, p=0.091).

      We think that the difference with the Chang et al. study is an important point to emphasize, therefore we now added in the Discussion:

      “In contrast, to replicate this finding in humans, a previous iEEG study by Chang et al. (Chang, Niziolek et al. 2013) used frequency-shifted feedback during vowel production and found that most suppressed auditory sites did not overlap with those sensitive to feedback alterations. Using DAF instead of frequency-shifted feedback, we demonstrated a significant overlap of two neural populations in the STG, along with a strong correlation between the degree of speech-induced suppression and sensitivity to auditory feedback. This discrepancy may be due to different methods of calculating sensitivity to altered feedback. In our study, sensitivity was determined by comparing responses to delayed and non-delayed feedback during production, whereas Chang et al. compared perturbed feedback responses during production and listening. One possibility is that our approach identifies a larger auditory neural population in the STG sensitive to altered feedback. Alternatively, it could indicate a larger population highly sensitive to temporal rather than spectral perturbations in auditory feedback. Thus, we observe a wide overlap of the two neural populations in the STG showing both speech-induced suppression and sensitivity to auditory feedback. Replaying a recording of the participants' own delayed voice back to them, which we were unable to complete in this study, would have made the results of the two studies more comparable while also completely eliminating the possibility of a sensory explanation for the observed response enhancement.”

      (2) I am still a bit unclear on how Experiment 4 is different than the no-delay condition in Experiment 3. Please clarify. Also, to be clear, in Experiments 1+2 the subjects were not wearing any headphones and had no additional sidetone?

      It is correct that participants were not wearing earphones in Experiments 1&2 (with no additional sidetone), and that they were wearing earphones in Experiments 3&4.

      For the “no delay” condition in the DAF experiment (Experiment 3), participants were wearing earphones and reading words with simultaneous auditory feedback. So, this condition was equivalent to visual word reading (Experiment 2), except participants were wearing earphones. Yet, neural responses were much larger for the “no delay” condition in the DAF experiment compared to visual word reading.

      We suspected that larger neural responses in the DAF experiment were caused by hearing auditory feedback through earphones. To test and control for this possibility, in a subset of participants, we ran an additional visual word reading experiment (Experiment 4) with earphones and used the same volume settings as in the DAF experiment. We found that response magnitudes were now similar in the two experiments (Experiment 3 and 4) and earphones (with the associated increased sound amplitude) were indeed the reason for larger neural responses. Thus, Experiment 4 differs from the no-delay condition in Experiment 3 only in the stimuli read aloud.

      (3) In Figure 3, why is the DAF200 condition activity so much bigger than the other conditions, even prior to the DAF onset? I worry this might bias the rest of the response differences.

      In Figure 3B and 3D, time=0 indicates the onset of the perceived auditory feedback. Below we replotted the responses in the same two electrodes, but now time=0 indicates the onset of articulation. The response peaks are delayed as the auditory feedback delay increases. This is because participants start speaking at time=0 but hear their voice with a lag, so the response onset in these auditory regions is delayed. However, as the reviewer pointed out, the response for the DAF200 condition in Electrode G54 is slightly larger even at the very beginning. We think that this small, early response might reflect a response to the bone-conducted auditory feedback, which might be more prominent for the DAF200 condition. Nevertheless, we still see that response amplitude increases with increasing feedback delays in Electrode 63.

      (4) Figure 4C, are the labeled recording sites limited to those with significant DAF and/or suppression?

      In Figure 4C, we show electrodes that had significant high-gamma broadband responses during all tasks. We write in the Methods: “Electrodes that showed significant response increase (p < 10⁻⁴) either before (−0.5 to 0 s) or after speech onset (0 to 0.5 s) with respect to a baseline period (−1 to −0.6 s) and at the same time had a large signal-to-noise ratio (μ/σ > 0.7) during either of these time windows were selected. Electrode selection was first performed for each task separately, then electrodes that were commonly selected were further analyzed.”
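      The quoted selection rule can be sketched as a per-task filter followed by an intersection across tasks. This is a hypothetical illustration rather than the authors' pipeline: it assumes the p-value for each window has already been computed by some test against baseline (the quote does not name the test), that μ/σ is the mean over the standard deviation of per-trial responses in that window, and that the significance and SNR criteria must be met in the same window. The data layout and names are invented for the example.

```python
from statistics import mean, stdev


def passes_window(trial_values, p_value, alpha=1e-4, snr_min=0.7):
    """True if one time window is significant (p < alpha) and has mu/sigma > snr_min."""
    if len(trial_values) < 2:
        return False  # stdev needs at least two trials
    sigma = stdev(trial_values)
    if sigma == 0:
        return False  # degenerate window; skip rather than divide by zero
    return p_value < alpha and mean(trial_values) / sigma > snr_min


def select_electrodes(task_data):
    """Electrodes that pass in at least one window for every task.

    task_data: {task: {electrode: [(trial_values, p_value), ...]}}, one
    (trial_values, p_value) pair per window (pre- and post-speech-onset).
    """
    selected_per_task = []
    for electrodes in task_data.values():
        chosen = {
            name
            for name, windows in electrodes.items()
            if any(passes_window(values, p) for values, p in windows)
        }
        selected_per_task.append(chosen)
    # Keep only electrodes selected in every task.
    return set.intersection(*selected_per_task) if selected_per_task else set()
```

      Selecting per task first and then intersecting means an electrode must show a reliable response in every task before it enters the cross-task comparison.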

      (5) Were there any analyses done to control for the effects of vocal changes on the DAF neural responses? The authors' previous paper did note a behavioral effect. This is probably not trivial, as we may not know the 'onset time' of the response, in contrast to pitch shift where it is more regular. If the timing is unknown, one thing that could be tried is to only look early in DAF responses (first 50ms say) to make sure the DAF effects hold.

      DAF involves two different perturbations: the absence of feedback at speech onset and the introduction of delayed feedback during playback. The timing of the behavioral effect in response to these two perturbations remains unclear. Aligning the neural responses to the production onset and examining the first 50ms would only capture the response to the acoustic feedback for the no-delay condition within that time window. Conversely, aligning the responses to the playback onset might miss the onset of the behavioral effect, which likely starts earlier as a response to the lack of feedback. We acknowledge the reviewer's point that this is a limitation of the DAF paradigm, and the behavioral effect is not as straightforward as that of pitch perturbation. However, we believe there is no clear solution to this issue.

      Minor points:

      (1) Figure 3, it might be nice to show the SuppI and SensI on the plots to give the reader a better sense of what those values look like.

      We included SuppI and SensI values in the new version of Figure 3.

      Reviewer #2 (Recommendations For The Authors):

      Minor Comments:

      (1) In Figure 1, it is unclear whether the responses shown in B-D correspond to the ROIs shown in Figure 1A. I am guessing so, but the alignment of the labels makes this slightly unclear, so I suggest these be relabeled for clarity.

      This is fixed in the updated version of Figure 1.

      (2) In Figure 1D the difference in colors between AWR and VWR is difficult to appreciate - I suggest using two contrasting colors.

      This is fixed in the updated version of Figure 1.

      (3) Please add y-axis labels for Fig 3B-D. (I believe these are % signal change, but it would be clearer if the label were included).

      This is fixed in the updated version of Figure 3.

      (4) Can the authors comment on whether the use of speakers for AWR and VWR versus earphones for DAF and VWR-AF may have had an influence on the increased response in this condition? If the AWR were rerun using the headphone setup, or if DAF with 0 ms feedback were run with no other trials including lags, would the large differences in response amplitude be observed?

      Participants did not wear earphones in Experiments 1 and 2, whereas they did wear earphones in Experiments 3 and 4.

      For the “no delay” condition in the DAF experiment (Experiment 3), participants were wearing earphones and reading words with simultaneous auditory feedback. So, this condition was equivalent to VWR (Experiment 2), except participants were wearing earphones. Yet, neural responses were much larger for the “no delay” condition in the DAF experiment compared to VWR.

      In line with the reviewer’s concern, we suspected that the larger neural responses in the DAF experiment were caused by hearing auditory feedback through earphones. To test and control for this possibility, in a subset of participants, we ran the VWR-AF experiment (Experiment 4) with earphones, using the same volume settings as in the DAF experiment. Response magnitudes were then similar in the two experiments (Experiments 3 and 4), indicating that the earphones were indeed the reason for the larger neural responses.

      (5) No data or code were available; I did not see any statement about this, nor any GitHub or OSF link to shared data and/or code.

      Data are available in the GitHub repository: flinkerlab/Sensitivity-Suppression

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      We thank the reviewer for the time and effort in reviewing our revised manuscript and are grateful for their constructive comments and for acknowledging the significance of our work.

      Summary: 

      Their findings elucidate the mechanisms underlying the 2-AA-mediated reduction of pyruvate transport into mitochondria: 2-AA impairs the interaction between ERRα and PGC1α, consequently suppressing MPC1 expression and reducing ATP production in tolerized macrophages. While the data presented are intriguing and the paper is well written, there are several points that warrant consideration. The authors should enhance the clarity, relevance, and impact of their study. 

      Strengths: 

      This paper presents a novel discovery regarding the mechanisms through which PA regulates the bioenergetics of tolerized macrophages. 

      Weaknesses: 

      The relevance of the in vivo model to support the conclusions is questionable. Further clarification is needed on this point. 

      We appreciate the reviewer’s comment. Our conclusion that 2-AA decreases bioenergetics while sustaining the bacterial burden is further supported by additional in vivo experiments, now presented in Fig. S5. In this set of in vivo studies, mice received the first exposure to 2-AA through injection of 2-AA alone and the second exposure through infection with PA14 or ΔmvfR four days after the 2-AA injection. As shown in supplementary Figure S5, the levels of ATP and acetyl-CoA in the spleens of infected animals and the bacterial counts were similar between PA14- and ΔmvfR-infected mice that had received the first 2-AA exposure, and they agree with the “one-shot infection” findings presented in Figure 5 for the PA14- or ΔmvfR+2-AA-infected mice or those receiving 2-AA only. These results are consistent with our previous findings showing that 2-AA impedes the clearance of PA14 (Bandyopadhaya et al. 2012; Bandyopadhaya et al. 2016; Tzika et al. 2013) and provide compelling evidence that the metabolic alterations identified may favor PA persistence in infected tissues.

      Reviewer #2 (Public Review): 

      We thank the reviewer for the time and effort in reviewing our revised manuscript and are grateful for their constructive comments and for acknowledging the significance of our work.

      Summary: 

      The study tries to connect energy metabolism with immune tolerance during bacterial infection. The mechanism details the role of pyruvate transporter expression via ERRalpha-PGC1 axis, resulting in pro-inflammatory TNF alpha signalling responsible for acquired infection tolerance. 

      Strengths: 

      Overall, the study is an excellent addition to the role of energy metabolism during bacterial infection. The mechanism-based approach in dissecting the roles of metabolic coactivator, transcription factor, mitochondrial transporter, and pro-inflammatory cytokine during acquired tolerance towards infections indicates a detailed and well-written study. The in vivo studies in mice nicely corroborate the cell line-based data, indicating the need for further studies in human infections with another bacterial model system. 

      Weaknesses:

      The authors have invoked various mechanisms to justify their findings. However, they have missed certain aspects that connect the mechanism throughout the paper. For example, they measured ATP and acetyl-CoA production linked with bacterial re-exposures and examined various targets such as MPC1, ERRα, PGC1α, and TNF-α. However, PGC1α levels, ATP, and acetyl-CoA were not reported in various parts of the paper. Including these details would make the work more comprehensive. 

      We appreciate the reviewer’s comments and apologize for omitting the PGC-1α levels. Per the reviewer’s suggestion, we have added the PGC-1α transcript levels (Figure 4C) in the section describing 2-AA-mediated dysregulation of ERRα and MPC1 transcription (lines 243-252). Moreover, we have added Figure S5, which shows additional ATP and acetyl-CoA levels in vivo. In our view, ATP and acetyl-CoA levels are shown in all settings appropriate for interrogating bioenergetics, including in the presence of bacteria and in their absence, where only 2-AA is added. Please see Figures 1 and 5 and the newly added Figure S5.

      The use of public data sets to support their claim on immune tolerance is missing. Including various data sets of similar studies will strengthen the findings independently. 

      If we understand the reviewer’s comment on public data sets on immune tolerance correctly, we note that we rely on our own data because no other groups have published data on 2-AA tolerization, and because the effect of 2-AA on the bacterial burden differs from that of LPS. Therefore, in this study we did not compare our results with published LPS data.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Animal model: The authors appropriately initiated the study with an in vitro tolerization model involving 2-AA re-exposure, providing foundational insights for further investigation. However, the rationale for the one-shot injection in the in vivo model lacks clarity. To strengthen the relevance of the in vivo data, the authors should consider establishing a model involving bacterial re-exposure, such as a two-challenge paradigm with antibiotic treatment in between. This approach would allow for the examination of peritoneal macrophages harvested from mice, assessing ATP levels, acetyl CoA, TNF production, and bacterial counts. Such an approach would better align the in vivo findings with the in vitro experiments, confirming the role of tolerized macrophages in controlling PA infection in the presence of 2-AA. 

      We thank the reviewer for this comment. Indeed, we performed a similar two-challenge study in which the first exposure to 2-AA was achieved by injecting 2-AA, and the second exposure through infection with PA14 or ΔmvfR four days after the 2-AA injection. As shown in supplementary Figure S5, the levels of ATP and acetyl-CoA in the spleens of infected animals and the bacterial counts agree with the “one-shot infection” results presented in Fig. 5 (PA14 or ΔmvfR+2-AA). Although the Figure S5 study was not included initially, to simplify data presentation, it was performed in parallel with the Fig. 5 study, so the two can be directly compared. 

      (2) Exogenous ATP treatment: It is crucial to explore whether 2-AA re-exposure suppresses inflammasome activation and whether this suppression can be reversed by exogenous ATP treatment. Specifically, the authors should investigate whether NLRP3 inflammasome activation is inhibited in tolerized macrophages and whether such activation is necessary for host defense. Clarifying these points would provide valuable insights into the mechanisms underlying macrophage tolerization induced by 2-AA. 

      Excellent point. We agree; indeed, these experiments are planned for the near future.

      (3) Figures 4C and D: The authors should exercise care in describing these figures. For instance, line 263 states that "UK5099 had no effect on the PA14 burden in macrophages," which requires correction for accuracy. 

      We apologize; we have rephrased this sentence and other sentences referring to Fig. 4D and 4E in this section. Please see the highlighted sentences in the Results section referring to Fig. 4. For example: “The addition of the UK5099 inhibitor strongly enhanced the bacterial intracellular burden in ΔmvfR infected macrophages compared to the non-inhibited ΔmvfR infected cells, reaching a similar burden to those infected with PA14 (Fig. 4D)”.

      (4) ERRα expression: While the study intriguingly demonstrates a decrease in ERRα levels in tolerized macrophages following exposure to 2-AA, the discussion of this finding is lacking. It is worth exploring the possibility of increasing ERRα expression to counteract the tolerization induced by 2-AA and enhance clearance of PA infection. This avenue should be thoroughly discussed in the manuscript's Discussion section, offering insights into potential therapeutic strategies to mitigate the effects of 2-AA on macrophage function. 

      Thank you so much for this additional comment.  We have now included this point in the discussion section (lines 373-376).

      Reviewer #2 (Recommendations For The Authors): 

      Overall, the study is an excellent addition to the role of energy metabolism during bacterial infection. The mechanism-based approach in dissecting the roles of metabolic coactivator, transcription factor, mitochondrial transporter, and pro-inflammatory cytokine during acquired tolerance indicates a detailed and well-written study. However, connecting the mechanisms often was not reflected in some of the experiments, and answering a few concerns/suggestions will undoubtedly improve the study's readability, appeal, and overall impact on a broader audience. 

      (1) The authors should rephrase the title if possible. The title indicates 2AA as a bacterial quorum sensing signal; however, throughout the manuscript, there are no studies associated with actual quorum sensing in bacteria. 

      Thank you for this comment. However, the title refers to 2-AA as a quorum-sensing signal because the synthesis of this signaling molecule is uniquely regulated by quorum sensing. Given its importance in the virulence of Pseudomonas aeruginosa and its regulation by quorum sensing, we feel that it is appropriate to refer to it as such.

      (2) The authors generalised the immunotolerance and memory of 2-AA-exposed cells to broad-spectrum microbial exposure after testing only LPS exposure. I would suggest they test at least two more heterologous microbial products known to elicit a response to confirm their claim from Figure 1. 

      We appreciate the reviewer’s comment. We do not intend to generalize the immunotolerance and memory of 2-AA-exposed cells to broad-spectrum microbial exposure. Moreover, since the manuscript does not focus on comparing other bacterial molecules with 2-AA, and multiple studies have already examined LPS tolerance, we tested only LPS in the manuscript.

      (3) LPS triggers ATP production through glycolysis in nitric oxide (NO) dependent mechanisms in various immune and non-immune cells. The authors should study the concentrations of NO, Glucose, and Pyruvate levels to clarify the mechanism of energy dynamics and the source of ATP and Acetyl CoA generated/scavenged during primary and secondary exposures to both 2AA and LPS. 

      We agree that a cross-tolerization experiment using 2-AA and LPS would reveal interesting insights into the immune response during PA infections. However, this is beyond the scope of this article. Please note that 2-AA and LPS tolerization are mechanistically distinct; for example, they rely on different HDAC enzymes, and LPS tolerization predominantly involves changes in H3K27 acetylation (Lauterbach et al. 2019), whereas 2-AA tolerization involves H3K18 modifications (Bandyopadhaya, Tsurumi, and Rahme 2017). The complexity of such interactions would therefore require a comprehensive set of experiments outside the focus of this study.

      (4) Immunogenic triggers often rapidly alter mitochondrial membrane potential, which alters oxygen consumption rates. However, the authors tend to generalize energy homeostasis and claim the deregulation of OXPHOS-inducing quiescent phenotype depending upon OCR measurements from Figure 1D. The authors must evaluate mitochondrial health and membrane potential during first and second exposure in a time-dependent manner to strengthen their theory of mitochondrial dysfunction. The authors should also check the phenomena in vivo (mice exposed to infection) if possible. 

      Thank you for this suggestion. We now include electron microscopy images of mitochondria isolated from macrophages exposed to 2-AA. Results revealed that 2-AA alters mitochondrial morphology and cristae, supporting the mitochondrial dysfunctionality caused by 2-AA. These results are shown in Figure S4 and lines 185-188.

      (5) Since both MPC1 and MPC2 are known to transport pyruvate into mitochondria, it will be essential to check both MPC1 and MPC2 at the transcript and protein levels in exposed cells. I suggest that the authors use MPC inhibitors or RNA interference against the MPCs to check the effect on tolerance of cells exposed a second time. 

      As we understand it, the mitochondrial pyruvate carrier proteins MPC1 and MPC2 form a hetero-oligomeric complex in the inner mitochondrial membrane that facilitates pyruvate import into mitochondria (McCommis and Finck 2015). We also used UK5099, an MPC inhibitor, to enumerate the bacterial load in macrophages (Figure 4) and observed an effect similar to that of 2-AA, suggesting a similar mechanism of action.

      (6) The differences in mitochondrial pyruvate levels in Figure 2A are small, and the authors claim statistical significance for a change within 1.5-fold. The authors should verify the number of mitochondria they isolate when estimating pyruvate from mitochondrial fractions only. In addition, when correlating mitochondrial pyruvate with the burst of ATP during the first exposure compared with the second, one could argue that the number of mitochondria varies between exposures, changing the amount of pyruvate (mitochondrial numbers may increase to compensate for the first exposure, then decrease quickly to maintain homeostasis and remain quiescent during a second exposure owing to activation of compensatory immune mechanisms by the primary exposure). How do the authors address this issue? 

      Our electron microscopy studies indicate that 2-AA exposure does not reduce the number of mitochondria in macrophages but does alter mitochondrial morphology and cristae. Please also see our response to point #4.

      (7) The authors claim that ERRα regulates MPC1 transcription via activation of the ERRα-PGC1α axis and that tolerization of cells to a second exposure is due to impairment of this axis (Figure 3). PGC1α is known to be induced during various metabolic, physiological, and immune-challenge-related stresses in a tissue-dependent manner. In this context, one should expect changes in PGC1α transcript and protein levels. The authors must study PGC1α levels with time-dependent exposures. LPS has been shown to induce oscillations in PGC1α levels in a tissue-specific manner. The authors should verify whether such oscillations persist during time-dependent exposure, with emphasis on mitochondrial uncoupling, which might be dampened during re-exposure to microbial challenges. 

      We appreciate the suggestion. We have now included PGC-1α (Figure 4C) transcript levels, which show the same profile as the transcript levels of ERRα and MPC1. Please note that PGC-1α is only one of several ERRα co-activators; therefore, the amount of ERRα protein is the most relevant assessment regarding the activation of the MPC1 transcription.

      (8) The authors claim that ERRα induces MPC1 on the basis of the ChIP data in Figure 3. However, verification at the mRNA level and mutational or inhibitor-based experiments are missing. The authors should study changes in MPC1 mRNA in relation to exposures and to inhibitors of ERRα and PGC1α to strengthen their work. 

      This is an interesting approach; however, this experiment exceeds the scope of our manuscript. We will certainly consider this suggestion in our future experiments. Thank you.

      (9) Publicly available data sets with LPS exposures should be analyzed for gene sets pertaining to mitochondrial OXPHOS, metabolism, immune response, etc. This will support the authors' work and provide a global overview of transcriptome associated with immune tolerance. 

      We appreciate the reviewer’s comment. For the reasons explained in our response to point #3, and because the effect of 2-AA on the bacterial burden differs from that of LPS, we did not compare our results with published LPS data in this study. We agree that, in the future, a comprehensive comparison of whole-genome transcriptome data between LPS and 2-AA may reveal important insights and help to better understand, and potentially classify, the immune tolerance triggered by 2-AA.

      (10) In Figure 4, the authors study the role of MPC1 and associated pyruvate-dependent bacterial clearance during tolerization and associate them with a decrease in TNF-α. I would suggest adding an ERRα inhibitor to these experiments. It is not clear why (by what mechanism) TNF-α transcription is affected by pyruvate transport during bacterial exposure. I would suggest that the authors clarify the mechanism of TNF-α activation/inactivation and its association with energy metabolism during acquired tolerance. 

      This is an excellent suggestion, given that a similar effect of ERRα on TNF-α was observed by other researchers (Chaltel-Lima et al. 2023).  Here, to clarify the mechanism of TNF alpha activation/inactivation and its association with energy metabolism, we elaborate on this aspect in the discussion section.

      Lines 388-393. The text reads:

      Previously, we reported that 2-AA tolerization induces histone deacetylation via HDAC1, reducing H3K18ac at the TNF-α promoter (Bandyopadhaya et al. 2016). The findings on the reduction of acetyl-CoA, the primary substrate of histone acetylation, and on TNF-α transcription using UK5099 and ATP in 2-AA-treated macrophages support the bioenergetic disturbances observed in macrophages and their link to the epigenetic modifications we have shown to be promoted by 2-AA (Bandyopadhaya et al. 2016).

      (11) It is surprising that the authors specifically target TNF-α as the pro-inflammatory cytokine during tolerance. Various cytokines and immunomodulatory factors have been reported to play vital roles in immune tolerance upon bacterial exposure. I would suggest that the authors perform cytokine profiling or check public data sets to justify their choice of TNF-α. 

      The choice of TNF-α is based on the results obtained in our previous study  (Bandyopadhaya et al. 2016).

      Bandyopadhaya, A., M. Kesarwani, Y. A. Que, J. He, K. Padfield, R. Tompkins, and L. G. Rahme. 2012. 'The quorum sensing volatile molecule 2-amino acetophenon modulates host immune responses in a manner that promotes life with unwanted guests', PLoS Pathog, 8: e1003024.

      Bandyopadhaya, A., A. Tsurumi, D. Maura, K. L. Jeffrey, and L. G. Rahme. 2016. 'A quorum-sensing signal promotes host tolerance training through HDAC1-mediated epigenetic reprogramming', Nat Microbiol, 1: 16174.

      Bandyopadhaya, A., A. Tsurumi, and L. G. Rahme. 2017. 'NF-kappaBp50 and HDAC1 Interaction Is Implicated in the Host Tolerance to Infection Mediated by the Bacterial Quorum Sensing Signal 2-Aminoacetophenone', Front Microbiol, 8: 1211.

      Chaltel-Lima, L., F. Domínguez, L. Domínguez-Ramírez, and P. Cortes-Hernandez. 2023. 'The Role of the Estrogen-Related Receptor Alpha (ERRa) in Hypoxia and Its Implications for Cancer Metabolism', Int J Mol Sci, 24.

      Lauterbach, M. A., J. E. Hanke, M. Serefidou, M. S. J. Mangan, C. C. Kolbe, T. Hess, M. Rothe, R. Kaiser, F. Hoss, J. Gehlen, G. Engels, M. Kreutzenbeck, S. V. Schmidt, A. Christ, A. Imhof, K. Hiller, and E. Latz. 2019. 'Toll-like Receptor Signaling Rewires Macrophage Metabolism and Promotes Histone Acetylation via ATP-Citrate Lyase', Immunity, 51: 997-1011 e7.

      McCommis, K. S., and B. N. Finck. 2015. 'Mitochondrial pyruvate transport: a historical perspective and future research directions', Biochem J, 466: 443-54.

      Tzika, A. A., C. Constantinou, A. Bandyopadhaya, N. Psychogios, S. Lee, M. Mindrinos, J. A. Martyn, R. G. Tompkins, and L. G. Rahme. 2013. 'A small volatile bacterial molecule triggers mitochondrial dysfunction in murine skeletal muscle', PLoS One, 8: e74528.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This manuscript is a valuable study of the responses of GPi neurons to DBS stimulation in human PD and dystonia patients, and it finds evidence for altered short-term and long-term plasticity in response to DBS between the two patient populations. This data set is of interest to both basic and clinical researchers working in the field of DBS and movement disorders. While there was enthusiasm for the potential significance of these findings, support for their conclusions was incomplete. Their data may be indicative of more interesting and complex interpretations than those currently considered in the article. 

      The authors would like to express their gratitude to the Editorial Team and Reviewers for their invaluable feedback which helped to improve the manuscript.

      Reviewer #1:

      Summary:

      Sumarac et al. investigate differences in globus pallidus internus (GPi) spike activity and short- and long-term plasticity of direct pathway projections in patients with Parkinson's disease (PD) and dystonia. Their main claims are that GPi neurons exhibit distinct characteristics in these two disorders, with PD associated with specific power-frequency oscillations and dystonia showing lower firing rates, increased burstiness, and less regular activity. Additionally, long-term plasticity and synaptic depression appear to differ between the two conditions. The authors suggest that these findings support the concept of hyperfunctional GPi output in PD and hypofunctional output in dystonia, possibly driven by variations in the plasticity of striato-pallidal synapses. Overall enthusiasm is relatively high, but I think the discussion does not address findings that don't align well with standard models. 

      Strengths: 

      These types of studies are valuable as the data arise from patients who have dystonia or PD. This could provide unique insights into disease pathophysiology that might not be recapitulated in animal systems work. 

      Thank you for the positive feedback.

      Weaknesses: 

      - The rate model and indirect/direct pathway ideas lack explanatory power; too much of the hypothesis generation and discussion in this manuscript is set in the context of these old ideas. In my view, their data emphasize this point strongly. Most patients with the 'hypokinetic' movement disorder PD have dystonia as a part of their motor features. Dystonia is a form of excessive muscle activation that on the one hand is 'hyperkinetic' but on the other usually decreases the speed of motor tasks, even in patients with primary dystonia. Similarly, PD patients display a bewildering variety of hyperkinetic manifestations as well (rest tremor, dystonia, dyskinesia). If these are truly independent classifications, i.e. hyper- versus hypo-kinetic, the authors must acknowledge that there is considerable overlap in the spike activity across groups: numerous dystonia patients display higher discharge rates than the majority of the PD sample. Based on the firing rate alone, it would not be possible to distinguish these groups. 

      Thank you for your insightful comments regarding the discussion of the rate model and the distinction between hyperkinetic and hypokinetic movement disorders. We acknowledge that the rate model, primarily derived from a limited number of animal subjects [1], may not fully encapsulate the complexities of Parkinson's disease (PD) and dystonia. Our study aimed to validate animal model findings in humans by correlating single-neuron features with disease symptom severity. However, we concur with the Reviewer’s comment regarding the overlapping motor features in hypokinetic and hyperkinetic disorders. We speculate that the overlap in neuronal properties may reflect the overlap in clinical features, for example, hyperkinetic features that are also present in PD, as suggested by the Reviewer. Per the Reviewer’s request, we have now acknowledged this notion in the manuscript. Interestingly, hypokinetic symptoms have been reported to occur in dystonia in response to GPi stimulation and have been associated with beta activity in the LFP [2], which reinforces the notion that neural activity may be more related to specific symptoms than to diseases as a whole. To supplement our analyses, in addition to total UPDRSIII scores, we now provide correlations with only the hypokinetic (i.e. bradykinesia) subscores of the UPDRSIII, allowing a more direct assessment of hypokinetic features in PD versus hyperkinetic features in dystonia. We have updated our Methods and Results accordingly.

      [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.

      [2] R. Lofredi et al., “Pallidal Beta Activity Is Linked to Stimulation-Induced Slowness in Dystonia,” Movement Disorders, vol. 38, no. 5, pp. 894–899, 2023, doi: 10.1002/mds.29347.

      Amendments to the manuscript:

      “Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients.”

      “Given that UPDRSIII includes both hypokinetic and hyperkinetic symptoms of PD, we further sought to disaggregate the score by only considering items 23-26 in UPDRSIII, which assess hypokinetic symptoms of PD.”

      “… with a marginally stronger correlation for PD hypokinetic symptoms only (items 23-26 of UPDRSIII, Spearman's rho=0.32, p=.0330; Supplementary Fig. 3)”

      Supplementary Fig. 3: We provided correlations with hypokinetic (i.e., bradykinesia) subscore of the UPDRSIII. There is very little difference between correlation results of UPDRSIII total (Fig. 1) and the hypokinetic-only subscore (Supplementary Fig. 3).

      “though our results do not change substantially when only hypokinetic PD features are considered (Supplementary Fig. 3).”

      - Given that beta power is considered pathognomonic of parkinsonism, it is notable that the authors found no differences in beta-related spike discharges across the groups. One would have predicted greater beta power in PD than in primary dystonia. This should be discussed explicitly and an interpretation should be provided. 

      We agree with the reviewer that, considering the previous LFP literature, one might have expected a difference in single-neuron oscillation power between PD and dystonia. However, the prior studies [3], [4] that reported significant differences in oscillatory power between the two diseases examined local field potential (LFP) activity only. Other work [5] in non-human primates investigated single-neuron oscillations and reported no differences between PD and dystonia at the single-neuron level, in line with our findings. Moreover, despite the lack of difference in overall power presented here, we provide evidence that the strength of beta-frequency single-neuron oscillations correlates with symptom severity in PD but not in dystonia, whereas the strength of theta-frequency single-neuron oscillations correlates with symptom severity in dystonia but not in PD.

      [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.

      [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.

      [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.

      Amendments to the manuscript:

      “Although previous research has reported differences in the LFP power between PD and dystonia [27,28], a study in non-human primates found no such differences in single-neuron oscillatory strength [8], as reflected in our findings. However, despite a lack of difference in overall power across disorders, we were able to derive disease/frequency-specific relationships with respect to clinical scores (Fig. 1C; oscillatory features).”

      - The study lacks a healthy control group, making it challenging to differentiate disease-specific findings from normal variations in GPi activity and plasticity. Although this is acknowledged in the discussion, this complicates the interpretation of the results. The sample sizes for PD and dystonia patients are relatively small, and the study combines various forms of dystonia, potentially masking subtype-specific differences. A larger and more homogenous sample could enhance the study's reliability.

      Indeed, intraoperative microelectrode recordings cannot be obtained in healthy individuals. We agree with the Reviewer that this limits the interpretation of the data. However, directly comparing clinical correlations with single-neuron readouts between two distinct clinical entities may, to some degree, compensate for the lack of healthy control data: while this contrast does not provide a healthy control, it can still point to disease-specific differences. This approach has previously been used for comparisons at the LFP level [6]. While the sample size is indeed small, it is comparable to, or even larger than, those of similar studies that have investigated the relation between symptom severity and single-neuron readouts [7]. The Reviewer is right that we do not differentiate between generalized and cervical dystonia. We chose not to do so because our subgroup analysis, provided in the Supplementary Material, did not suggest specific differences, though there are insufficient data from specific dystonia subtypes to make formal statistical comparisons. Future studies should investigate specific subtypes further.

      [6] R. Lofredi et al., “Pallidal beta bursts in Parkinson’s disease and dystonia,” Movement Disorders, vol. 34, no. 3, pp. 420–424, 2019, doi: 10.1002/mds.27524.

      [7] A. Gulberti et al., “Subthalamic and nigral neurons are differentially modulated during parkinsonian gait,” Brain, p. awad006, Feb. 2023, doi: 10.1093/brain/awad006.

      Amendments to the manuscript:

      “While we did not observe differences across dystonia subtypes (Supplementary Fig. 1), future studies in larger patient cohorts are warranted. Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpreting the results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially those related to oscillatory correlates of PD and dystonia.”

      - While they mention that data are available on request, sharing data openly would increase transparency and allow for independent validation of the results. It is unclear how sharing deidentified data would compromise patient privacy or present ethical issues of any kind, as claimed by the authors. 

      Much of the data in question were collected under an old Research Ethics Board (REB) protocol which did not address data sharing. However, we have consulted with our REB and gained retroactive permission to post de-identified data which are now available in the Supplementary Material.

      Amendments to the manuscript:

      “The data that support the findings of this study are available in a public repository (see: https://osf.io/nqzd2/)”

      - They appropriately acknowledge several limitations, such as the inability to use pharmacological interventions and the need for further research in the chronic setting. 

      Thank you for the comment.

      - The manuscript highlights differences in GPi activity and plasticity between PD and dystonia but could provide more context on the clinical implications of these findings, particularly regarding their implications for novel deep brain stimulation paradigms. 

      Thank you for the comment. Our finding that striato-pallidal plasticity decays more slowly in dystonia compared to PD may relate to the slower time course of symptom relief associated with GPi-DBS in dystonia, as outlined in the Discussion. Symptoms also remain suppressed for longer after the cessation of stimulation in dystonia compared to PD, which may reflect long-term plastic changes [8], [9]. In the context of clinical DBS, plasticity modulation may be facilitated by intermittent stimulation algorithms that achieve the necessary plastic network change by applying stimulation for a defined time and could then be switched off to reduce energy consumption and perhaps to mitigate side effects. DBS devices with chronic sensing may enable monitoring of evoked potential amplitudes for future adaptive stimulation applications; currently available devices are limited by low sampling rates, but future devices may overcome this technical limitation.

      [8] D. Ruge et al., “Deep brain stimulation effects in dystonia: time course of electrophysiological changes in early treatment.,” Mov Disord, vol. 26, no. 10, pp. 1913–1921, Aug. 2011, doi: 10.1002/mds.23731.

      [9] D. Ruge et al., “Shaping reversibility? Long-term deep brain stimulation in dystonia: the relationship between effects on electrophysiology and clinical symptoms.,” Brain, vol. 134, no. Pt 7, pp. 2106–2115, Jul. 2011, doi: 10.1093/brain/awr122.

      Amendments to the manuscript:

      “While further work is certainly required to better understand disease-related differences in plasticity, our findings may nevertheless motivate the development of intermittent (ON/OFF) DBS strategies that periodically modulate synaptic plasticity to produce therapeutic benefits that outlast stimulation delivery, as recently employed in preclinical work [52,53].”

      - While statistical tests are mentioned, the manuscript could benefit from a more detailed presentation of statistical methods, including correction for multiple comparisons and effect sizes. Did the authors consider different recording sites within each patient as independent observations? I think this is not appropriate if that was the case. 

      Thank you for your constructive feedback. In response to the concerns regarding the statistical methods, we have expanded our analysis to provide a more comprehensive statistical overview. Specifically, we implemented the Bonferroni correction for multiple comparisons across each of the seven tests conducted for the differences in single-neuron features between PD and dystonia. The adjustment revealed that only the burst index and coefficient of variation retain statistical significance after post hoc correction, while the firing rate does not. Results of the Bonferroni corrections are now presented in Supplementary Table 3. Reflecting on the initial comment about firing rates between the two disorders, our updated findings underscore the limitation of using firing rates alone to differentiate between PD and dystonia, and instead, our analysis now points to burstiness and firing irregularity as more reliable discriminators. Regarding the clinical correlations, we refined our statistical analysis by employing nonparametric Monte Carlo permutation tests with 5000 permutations, as used in recent work [10], [11]. This method is chosen for its independence from assumptions regarding data distribution. Specifically, we computed and tested the Spearman rho for significance using the permutation test. Then, to address multiple comparisons, we controlled the false discovery rate (FDR) using the Benjamini-Hochberg procedure. Results of these comparisons are now presented in Supplementary Table 4. Lastly, to address the concern regarding recording site independence within patients, we updated our plasticity analysis methodology. In our study, 6 out of 18 patients had multiple recording sites. Thus, to account for this, we employed linear mixed models (LMM) with patient ID as a random factor to appropriately account for the non-independence of these observations.
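      To make the permutation step concrete, here is a minimal pure-Python sketch of a two-sided Monte Carlo permutation test for Spearman's rho. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the fixed random seed, the two-sided |rho| criterion, and the (k + 1)/(N + 1) p-value convention are our own choices.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_permutation_test(x, y, n_perm=5000, seed=0):
    """Spearman's rho and a two-sided Monte Carlo permutation p-value.

    Spearman's rho is the Pearson correlation of the ranks, so ranks are
    computed once and only the y-ranks are shuffled on each permutation.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = list(ry)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= abs(observed):
            n_extreme += 1
    # Count the observed labelling as one of the permutations (avoids p = 0).
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, p_value
```

      For example, `spearman_permutation_test(severity_scores, beta_power)` would return the rank correlation and its permutation p-value; `severity_scores` and `beta_power` are placeholder names for per-patient clinical scores and oscillatory features.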

      [10] R. Lofredi et al., “Dopamine-dependent scaling of subthalamic gamma bursts with movement velocity in patients with Parkinson’s disease,” Elife, vol. 7, p. e31895, Feb. 2018, doi: 10.7554/eLife.31895.

      [11] R. Lofredi et al., “Subthalamic beta bursts correlate with dopamine-dependent motor symptoms in 106 Parkinson’s patients,” npj Parkinsons Dis., vol. 9, no. 1, Art. no. 1, Jan. 2023, doi: 10.1038/s41531-022-00443-3.

      Amendments to the manuscript:

      “For comparing differences in single-neuron features between PD and dystonia, significant results were followed up with post hoc multiple comparisons with a Bonferroni correction. For clinical correlations, non-parametric Monte Carlo permutation tests were used, avoiding assumptions about data distribution. The tested values were randomly shuffled 5,000 times to form a probability distribution, with the p-value reflecting the original sample rank. All tests underwent adjustment for multiple comparisons, controlling the false discovery rate (FDR) at an α-level of 0.05.”
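      The two multiple-comparison adjustments named above can be sketched in a few lines of pure Python. This is an illustrative implementation of the standard Bonferroni and Benjamini-Hochberg (step-up) procedures that returns adjusted p-values; it is not the code used in the study, and the function names are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values) controlling the FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def reject(adjusted, alpha=0.05):
    """Flag tests whose adjusted p-value is at or below alpha."""
    return [q <= alpha for q in adjusted]
```

      With seven between-disorder comparisons, as described above, `bonferroni` multiplies each raw p-value by 7; `benjamini_hochberg` is less conservative because the i-th smallest p-value is scaled by m/i rather than by m.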

      “analyzed using a linear mixed model (LMM) with patient ID as a random factor, normalized fEP amplitudes as the response variable, and epoch as a fixed effect”

      “using a LMM with patient ID as a random factor”

      “However, none of the clinical correlations survived Benjamini-Hochberg FDR-correction for multiple comparisons (Supplementary Table 4).”

      “In PD, fEP amplitudes were significantly greater after compared to before HFS (LMM; p = .0075, effect size = 5.42 ± 1.79; Fig. 2C), while in dystonia, the increase approached but did not reach statistical significance (LMM; p = .0708, effect size = 2.82 ± 1.45; Fig. 2C).”

      All statistics were updated in the results section and the figures.

      “Finally, as many findings in Fig. 1 do not survive corrections for multiple comparisons, we suggest interpretation of results with caution. Despite this, many of our findings related to neuronal correlates are generally in line with previous literature, especially related to oscillatory correlates of PD and dystonia.”

      - The manuscript could elaborate on the potential mechanisms underlying the observed differences in GPi activity and plasticity and their relevance to the pathophysiology of PD and dystonia. 

      Thank you for your feedback. We have enhanced the manuscript by integrating additional discussions on previous studies related to plasticity in dystonia and PD (e.g., [12], [13]), which highlight excessive plasticity in dystonia. Although these may appear contradictory to our findings of increased plasticity in PD compared to dystonia, we propose (also justified by previous literature) that chronic dopaminergic medication use may lead to synaptic over-sensitization, which has been hypothesized as a biological mechanism underlying levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].

      [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.

      [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.

      [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.

      Amendments to the manuscript:

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, motor evoked potentials (MEPs) induced by transcranial magnetic stimulation have been shown to be hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long-term potentiation effects are greater in PD compared to dystonia (Fig. 2D) is difficult to reconcile with this literature, one potential explanation is that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the magnitude of direct pathway plasticity [15]. Although patients are withdrawn from antiparkinsonian medications for 12 h before surgery, striato-pallidal synapses may nevertheless be chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may relate to the hyperkinetic features of dystonia and of levodopa-induced dyskinesias in PD.”

      Reviewer #2: 

      Summary: 

      The authors investigated how neuronal activity and metrics of plasticity using local electrical stimulation in the GPi were different between Parkinson's disease and dystonia patients. 

      Strengths: 

      The introduction highlights the importance of the work and the fundamental background needed to understand the rest of the paper. It also clearly lays out the novelty (i.e., that the dynamics of plastic effects in GPi between dystonia and PD have not been directly compared). 

      The methods are clearly described and the results are well organized in the figures. 

      The results are strong with measurements from a large population of patients for each disease group and with distinct findings for each group. 

      Thank you for the kind appraisal.

      Weaknesses: 

      The discussion was hard to follow in several places, making it difficult to fully appreciate how well the authors' claims and conclusions are justified by their data, mostly in relation to the plasticity results. It may help to summarize the relevant findings for each section first and then further expand on the interpretation, comparison with prior work, and broader significance. Currently, it is hard to follow each section without knowing which results are being discussed until the very end of the section. With the current wording in the "Neuronal correlates.." section, it is not always clear which results are from the current manuscript, and where the authors are referring to past work.

      Thank you for this feedback. The main findings are now summarized in a paragraph at the beginning of the Discussion section, before being discussed in comparison to other studies in the literature in subsequent sub-sections. Moreover, throughout the Discussion, findings from our study are now always reflected by a reference to the relevant figure to more easily differentiate current findings from previous literature. Additionally, Discussion sub-sections have been expanded to consider additional literature in response to various comments throughout the Review process (including the subsequent Review comment).

      Amendments to the manuscript:

      Paper findings are referenced to figures which depict the results at hand; discussion sub-sections expanded; and the following text has been added at the start of the Discussion:

      “In particular, we found that GPi neurons exhibited lower firing rates, but greater burstiness and variability in dystonia compared to PD (Fig. 1A). While no differences were found in the power of spiketrain oscillations across disorders (Fig. 1B), we found that PD symptom severity positively correlated with the power of low-beta frequency spiketrain oscillations, whereas dystonia symptom severity positively correlated with the power of theta frequency spiketrain oscillations (Fig. 1C). Dystonia symptom severity moreover correlated negatively with firing rate and positively with neuronal variability. These results are discussed in greater detail with respect to previous literature in the subsequent Discussion section entitled ‘Neuronal correlates of PD and dystonia.’ In response to electrical stimulation (protocol depicted in Fig. 2A), we found significant increases in the amplitudes of positive-going stimulation-evoked field potentials (considered to reflect striato-pallidal synaptic strength; as exemplified in Fig. 2B) before versus after HFS in both PD and dystonia (Fig. 2C), with recording sites in PD exhibiting significantly greater increases (Fig. 2D). While changes in evoked potential amplitude before versus after stimulation can be considered reflective of long-term plasticity [15,18], the dynamics of evoked potentials during HFS (as depicted in Fig. 2E) can be considered reflective of short-term synaptic plasticity [18,21]. To this end, our findings suggest faster synaptic depression in PD compared to dystonia (Fig. 2F/G). Plasticity findings are discussed in greater detail in the Discussion section entitled ‘Direct pathway plasticity.’”
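      One simple way to quantify how quickly fEP amplitudes depress during HFS is sketched below. This is an assumption for illustration only, not the metric reported in Fig. 2F/G: amplitudes are normalized to the first pulse, and the hypothetical index `pulses_to_half_amplitude` counts how many pulses it takes to fall to 50% of that value, so a smaller index means faster short-term depression.

```python
def normalize_to_first(amplitudes):
    """Express each fEP amplitude during HFS relative to the first-pulse amplitude."""
    first = amplitudes[0]
    if first == 0:
        raise ValueError("first-pulse amplitude must be non-zero")
    return [a / first for a in amplitudes]


def pulses_to_half_amplitude(amplitudes, threshold=0.5):
    """Index of the first pulse whose normalized amplitude is <= threshold.

    Returns None if the amplitude never depresses that far within the train.
    """
    for i, value in enumerate(normalize_to_first(amplitudes)):
        if value <= threshold:
            return i
    return None
```

      For a rapidly depressing train such as `[100, 60, 40, 30, 25]` the index is 2, whereas a slowly depressing train such as `[100, 90, 80, 70, 60]` never reaches half amplitude and returns `None`. A fitted decay constant would be a natural alternative; the threshold crossing is used here only because it makes no assumption about the shape of the decay.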

      Also, I felt that more discussion could be used to highlight the significance of the current results by comparing and/or contrasting them to prior relevant work and mechanisms. The novelty or impact is not very clear as written. Could this be further substantiated in the Discussion? 

      Thank you for the feedback. The discussion has been expanded to include additional literature relevant to the findings reported in the manuscript. For example, in the neuronal correlates sub-section, we now highlight important findings [15] showing changes in the discharge rates and oscillatory tendencies of GPi neurons in non-human primates in response to staged MPTP applications that progressively titrate motor severity; these results substantiate our lack of correlation with firing rates in PD and the presence of a clinical correlation with beta oscillations. We now also emphasize human studies that found LFP power differences between PD and dystonia [3], [4], while highlighting studies that did not find such differences in spike-train oscillations (in non-human primates) [5], which reflects our own findings. In our plasticity sub-section, we have added new content related to previous literature on plasticity in dystonia and PD (also addressed in response to a query from Reviewer #1). For example, we bring to light a variety of previous studies [12], [13] emphasizing excessive plasticity in dystonia. While such studies may seem to contradict our findings of greater plasticity in PD compared to dystonia, we additionally provide hypotheses (justified by previous literature) that prolonged use of dopaminergic medication may result in synaptic over-sensitization, thus giving rise to levodopa-induced dyskinesias (a hyperkinetic feature) in PD [14].

      [3] P. Silberstein et al., “Patterning of globus pallidus local field potentials differs between Parkinson’s disease and dystonia.,” Brain, vol. 126, no. Pt 12, pp. 2597–2608, Dec. 2003, doi: 10.1093/brain/awg267.

      [4] D. D. Wang et al., “Pallidal Deep-Brain Stimulation Disrupts Pallidal Beta Oscillations and Coherence with Primary Motor Cortex in Parkinson’s Disease,” J Neurosci, vol. 38, no. 19, pp. 4556–4568, May 2018, doi: 10.1523/JNEUROSCI.0431-18.2018.

      [5] P. A. Starr et al., “Spontaneous pallidal neuronal activity in human dystonia: comparison with Parkinson’s disease and normal macaque.,” J Neurophysiol, vol. 93, no. 6, pp. 3165–3176, Jun. 2005, doi: 10.1152/jn.00971.2004.

      [12] Y. Tamura et al., “Disordered plasticity in the primary somatosensory cortex in focal hand dystonia.,” Brain, vol. 132, no. Pt 3, pp. 749–755, Mar. 2009, doi: 10.1093/brain/awn348.

      [13] D. A. Peterson, T. J. Sejnowski, and H. Poizner, “Convergent evidence for abnormal striatal synaptic plasticity in dystonia.,” Neurobiol Dis, vol. 37, no. 3, pp. 558–573, Mar. 2010, doi: 10.1016/j.nbd.2009.12.003.

      [14] P. Calabresi, B. Picconi, A. Tozzi, V. Ghiglieri, and M. Di Filippo, “Direct and indirect pathways of basal ganglia: a critical reappraisal.,” Nat Neurosci, vol. 17, no. 8, pp. 1022–1030, Aug. 2014, doi: 10.1038/nn.3743.

      [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.

      Amendments to the manuscript:

      “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression. Indeed, variability in spike firing rates in PD may be reflected in the considerable overlap in spiking activity between PD and dystonia (Fig. 1A), with many dystonia patients exhibiting higher discharge rates compared to PD patients. While differences in discharge rates were nevertheless observed between PD and dystonia, it may be that the combination of rate and pattern (reflected in the BI and CV) changes best differentiates the two disorders.”

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, motor evoked potentials (MEPs) induced by transcranial magnetic stimulation have been shown to be hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation (LTP) at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that LTP effects are greater in PD compared to dystonia (Fig. 2D) is difficult to reconcile with this literature, one potential explanation is that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are withdrawn from antiparkinsonian medications for 12 h before surgery, striato-pallidal synapses may nevertheless be chronically over-sensitized from prolonged use of dopaminergic medication, which is a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may relate to the hyperkinetic features of dystonia and of levodopa-induced dyskinesias in PD.”

      Some specific comments and questions about the Discussion: 

      Lines 209-211 - This sentence was hard to understand, could it be clarified? 

      Lines 211-213 - What do phasic and tonic components mean exactly? Could this be specifically defined? Are there specific timescales (as referred to in Intro)?

      Lines 215-217 - It's not clear what was delayed in dystonia, and how the authors are trying to contrast this with the faster time course in PD. I think some of this is explained in the introduction, but could also be re-summarized here as relevant to the results discussed. 

      Lines 223-224 - I'm not sure I follow the implication that network reorganization leads to delayed functional benefits. Could this be further elaborated? 

      Reply & Amendments to the manuscript: Thank you for your feedback. We've made the following concise revisions to address the comments:

      We've clarified lines 209-211 to explain that variations in electrical stimulation effects on pathways in PD and dystonia may reveal the operational mechanisms of DBS, despite a common target:

      “The variation in the modulation of these projections / pathways to electrical stimulation may also indicate the mechanism by which DBS operates across PD and dystonia, despite a common stimulation target.”

      In response to the second comment on lines 211-213 about phasic and tonic components, we now specify that phasic refers to dynamic muscle contractions, and tonic to continuous muscle contractions, providing clear definitions relevant to our context:

      “Clinical studies in dystonia have shown that DBS leads to a more rapid improvement in the transient, dynamic muscle contractions (phasic components) of the disorder when compared to the sustained, continuous muscle contractions (tonic or fixed components) [33]”

      For lines 215-217, we've refined our discussion to clearly contrast the delayed response in dystonia with the faster onset in PD:

      “This contrasts with PD, where the maximal clinical response to DBS occurs over a much faster time course [13,36].”

      On lines 223-224, we've expanded the explanation of how network reorganization may lead to delayed functional benefits, highlighting adjustments in neural connectivity and synaptic efficacy in response to stimulation:

      “which involves adjustments in neural connectivity or synaptic efficacy in response to the stimulation [14,35].”

      Could the absence of a relationship between FR and disease in PD be discussed? 

      Thank you for raising this point. Although we observed higher firing rates in PD than in dystonia, it is unexpected under the rate model of PD [1] that these rates do not correlate with symptom severity. However, our findings align with similar animal work by Muralidharan et al. [15], which reported that neuronal firing rates within the GPi of rhesus monkeys did not increase linearly with parkinsonian motor severity. We did, however, show that low-beta oscillatory strength within the GPi may play a significant role in the manifestation of motor symptoms in PD, which is also in line with the findings of Muralidharan and colleagues. As per the Reviewer’s request, we have included this content in our discussion.

      [1] M. R. DeLong, “Primate models of movement disorders of basal ganglia origin.,” Trends Neurosci, vol. 13, no. 7, pp. 281–285, Jul. 1990, doi: 10.1016/0166-2236(90)90110-v.

      [15] A. Muralidharan et al., “Physiological changes in the pallidum in a progressive model of Parkinson’s disease: Are oscillations enough?,” Exp Neurol, vol. 279, pp. 187–196, May 2016, doi: 10.1016/j.expneurol.2016.03.002.

      Amendments to the manuscript:

      “Despite the lack of correlations with firing rate in PD, our findings seem to align with those of Muralidharan and colleagues [25], who showed that GPi neuronal firing rates may not directly correlate with motor severity but exhibit variability across the disease severity continuum in parkinsonian non-human primates (initially increasing, then decreasing, then increasing again at mild, moderate, and severe disease manifestations, respectively). Thus, while GPi discharge rates may change in PD, such changes may not be reflected by linear relationships with motor sign development and progression.”

      “Indeed, Muralidharan and colleagues [25] also showed linear group-level relationships between low-beta frequency spiketrain oscillations and disease severity in parkinsonian non-human primates, despite the lack of linear relationships with spike discharge rates (as discussed above).”

      It wasn't very clear how the direct pathway can be attributed to plasticity changes if the GPi makes up both the direct and indirect pathways. Could this be further clarified? 

      The reviewer raises an important, nuanced point. Recent work from our lab [16] shows that inhibitory evoked fields in the STN (which receives inhibitory input only from the GPe) persist with very minimal depression during HFS. In contrast, inhibitory fields in the SNr (which receives the majority of its inhibitory inputs from the striatum, though, per the anatomical literature, some also arrive via the GPe) depress quickly. We have previously also shown these rapidly depressing fields in the GPi [17], [18], which likewise receives the majority of its inhibitory inputs from the striatum and some from the GPe. As such, we disaggregate striatum-mediated from GPe-mediated inhibitory fields based on: the absence of rapidly depressing inhibitory evoked field potentials in the STN (which receives inhibitory inputs via the GPe but not the striatum), together with the common presence of rapidly depressing evoked field potentials in the SNr and GPi (which both receive most of their inhibitory inputs from the striatum); differences in the morphology of purportedly GPe-mediated (fast latency) versus striatum-mediated (slow latency) evoked field potentials [16]; and the presence in slices of slow-latency caudato-nigral evoked field potentials [19] that are reversed by application of a GABA antagonist [20]. These points are outlined in the first paragraph of the Discussion sub-section “Direct pathway plasticity.” However, we have now also added to the Limitations that the GPi additionally receives inhibitory inputs via the GPe, though in lesser abundance.

      [16] L. A. Steiner et al., “Persistent synaptic inhibition of the subthalamic nucleus by high frequency stimulation,” Brain Stimul, vol. 15, no. 5, pp. 1223–1232, 2022, doi: 10.1016/j.brs.2022.08.020.

      [17] L. D. Liu, I. A. Prescott, J. O. Dostrovsky, M. Hodaie, A. M. Lozano, and W. D. Hutchison, “Frequency-dependent effects of electrical stimulation in the globus pallidus of dystonia patients.,” J Neurophysiol, vol. 108, no. 1, pp. 5–17, Jul. 2012, doi: 10.1152/jn.00527.2011.

      [18] L. Milosevic et al., “Modulation of inhibitory plasticity in basal ganglia output nuclei of patients with Parkinson’s disease,” Neurobiology of Disease, vol. 124, pp. 46–56, Apr. 2019, doi: 10.1016/j.nbd.2018.10.020.

      [19] M. Yoshida and W. Precht, “Monosynaptic inhibition of neurons of the substantia nigra by caudato-nigral fibers,” Brain Res, vol. 32, no. 1, pp. 225–228, Sep. 1971, doi: 10.1016/0006-8993(71)90170-3.

      [20] W. Precht and M. Yoshida, “Blockage of caudate-evoked inhibition of neurons in the substantia nigra by picrotoxin,” Brain Res, vol. 32, no. 1, pp. 229–233, Sep. 1971, doi: 10.1016/0006-8993(71)90171-5.

      Amendments to the manuscript:

      “Indeed, GPi receives the greatest abundance of inhibitory inputs from striatum (direct pathway), but it also receives inhibitory inputs by way of GPe (indirect pathway). Although we can functionally disaggregate these pathway-specific responses based on differences in the morphology and dynamics of GPe-mediated versus striatum-mediated inhibitory fEPs [21], the possibility of compounded effects cannot be completely ruled out.”

      The mechanism of short- and long-term plasticity as applied in the protocols used in this work are outlined in reference to previous citations [15, 16, 18]. Because this is a central aspect of the current work and interpreting the results, it was difficult to appreciate how these protocols provide distinct metrics of short and long-term plasticity in GPi without some explanation of how it applies to the current work and the specific mechanisms. It would also help to be able to better link how the results fit with the broader conclusions. 

      Short-term plasticity is measured as the dynamic change to the fEP during ongoing HFS. For long-term plasticity analyses, the fEP amplitudes during LFS were compared pre- versus post-HFS. To make this analysis more intuitive we have added a protocol illustration to Fig 2. We have moreover greatly expanded the discussion to include more literature related to disease-specific differences in plasticity, and implications of modulating plasticity using DBS.
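      The long-term comparison described above reduces to comparing LFS-evoked fEP amplitudes before and after HFS. A minimal sketch follows, assuming amplitudes have already been measured per LFS pulse; the function names are hypothetical, and the study itself analysed normalized amplitudes with an LMM (patient ID as a random factor, epoch as a fixed effect) rather than with this single summary value.

```python
from statistics import mean


def normalize_to_baseline(pre_hfs, post_hfs):
    """Divide every LFS-evoked fEP amplitude by the mean pre-HFS amplitude."""
    baseline = mean(pre_hfs)
    return [a / baseline for a in pre_hfs], [a / baseline for a in post_hfs]


def percent_change(pre_hfs, post_hfs):
    """Percent change of the mean fEP amplitude after vs before HFS.

    Positive values indicate potentiation, negative values depression.
    """
    pre_norm, post_norm = normalize_to_baseline(pre_hfs, post_hfs)
    return 100.0 * (mean(post_norm) - mean(pre_norm))
```

      Normalizing to the pre-HFS mean puts recording sites with different absolute fEP amplitudes on a common scale, which is what allows potentiation to be compared across sites and disorders.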

      Amendments to the manuscript:

      Added new panel to Fig 2

      Author response image 1.

      “Converging evidence from past animal and human studies suggests that dystonia is associated with impaired synaptic function and abnormal synaptic plasticity [35–37]. Compared to healthy controls, transcranial magnetic stimulation-induced motor evoked potentials (MEPs) have been shown to be hyperexcitable in dystonia [38,39], and somatosensory and motor cortical plasticity is greater [40]. Likewise, enhanced long-term potentiation at cortico-striatal synapses has been shown in rodent models of dystonia [41,42]. While our finding that long-term potentiation effects are greater in PD than in dystonia (Fig. 2D) is difficult to reconcile with this literature, one potential explanation is that all of our PD patients are long-term users of levodopa. We have previously shown that the intake of this antiparkinsonian dopaminergic medication leads to potent increases in the amount of plasticity elicited in GPi [15]. Although patients are withdrawn from antiparkinsonian medications for 12 h before surgery, striato-pallidal synapses may nevertheless be chronically over-sensitized from prolonged use of dopaminergic medication, a well-known hypothesis related to the manifestation of levodopa-induced dyskinesias (a hyperkinetic feature) in PD [43]. Indeed, a lack of depotentiation of striato-pallidal projections has previously been observed in patients with levodopa-induced dyskinesias [44]. As such, excessive plasticity of these projections may be linked to the hyperkinetic features of both dystonia and levodopa-induced dyskinesias in PD.”

      In the Conclusion, it was difficult to understand the sentence about microcircuit interaction (line 232) and how it selectively modulates the efficacy of target synapses. Some further explanation here would be helpful. Also, it was not clear how these investigations (line 237) provide cellular-level support for closed-loop targeting. Could the reference to closed-loop targeting also be further explained? 

      We agree with the reviewer that the current wording may be confusing. We have changed the wording to be clearer. We have also added content related to closed-loop DBS based on chronic monitoring of evoked potential responses.

      Amendments to the manuscript:

      “Furthermore, chronic monitoring of evoked fields may allow for tracking of subcortical neuronal projections, as indexed by the inhibitory fields reported in this study.”

      “Future applications of DBS may also benefit from closed-loop tuning of basal-ganglia-thalamo-cortical circuit dynamics and plasticity through chronic monitoring of evoked potential responses [56].”

      How is the burst index calculated (Methods)? 

      Thank you for pointing out that the burst index definition was missing from the paper. It has now been added to the manuscript.

      Amendments to the manuscript:

      “The burst index was computed by taking the ratio of the means from a two-component Gaussian mixture model applied to the log interspike interval distribution, a modification of the previous mode-over-mean ISI method [20].”
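      The burst index definition above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mixture is fit by expectation-maximization with a simple lower-half/upper-half initialization, and "ratio of the means" is interpreted here as the larger component mean divided by the smaller one, after exponentiating the log-ISI means back to ISI units. Neither the fitting details nor the direction of the ratio are specified in the text, and all function names are hypothetical.

```python
import math
import random


def _log_normal_pdf(x, mu, var):
    """Log density of a normal distribution N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_two_component_gmm(values, n_iter=500, tol=1e-9):
    """Fit a 1-D, two-component Gaussian mixture by expectation-maximization.

    Returns (weights, means, variances), each a list of length 2.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # Initialise one component on the lower half of the data, one on the upper half.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand_mean = sum(xs) / n
    var0 = sum((x - grand_mean) ** 2 for x in xs) / n
    var = [var0, var0]
    w = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        ll = 0.0
        for x in xs:
            lp = [math.log(w[k]) + _log_normal_pdf(x, mu[k], var[k]) for k in range(2)]
            m = max(lp)
            log_total = m + math.log(sum(math.exp(v - m) for v in lp))
            ll += log_total
            resp.append([math.exp(v - log_total) for v in lp])
        # M-step: update weights, means and variances (variance floored for stability).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
        # Stop once the log-likelihood has converged.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, mu, var


def burst_index(isis):
    """Hypothetical burst index: long-ISI over short-ISI component mean.

    The mixture is fit to log(ISI); component means are exponentiated back to
    ISI units before taking the ratio (an assumed interpretation).
    """
    _, mu, _ = fit_two_component_gmm([math.log(t) for t in isis])
    short_isi, long_isi = sorted(math.exp(m) for m in mu)
    return long_isi / short_isi


# Synthetic spike train: intra-burst ISIs near 5 ms and inter-burst ISIs near 100 ms.
random.seed(0)
isis = [random.lognormvariate(math.log(0.005), 0.2) for _ in range(200)]
isis += [random.lognormvariate(math.log(0.1), 0.2) for _ in range(100)]
print(burst_index(isis))
```

      For this synthetic train the two components should recover ISIs near 5 ms and 100 ms, so the index should come out close to 20; a train without distinct intra-burst and inter-burst ISI populations would generally give a smaller value. In practice, a tested mixture-model library would be preferable to this hand-written EM loop.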

      Figures and figure captions are missing some details:

      Fig. 1 - What does shading represent? 

      The shading in Fig. 1 illustrates results that were significant before adjustment for multiple comparisons.

      Amendments to the manuscript:

      “Depicted scatterplots are results that were significant before correction for multiple comparisons”

      Fig. 2 - Can the stimulation artifact be labeled so as not to be confused with the physiological signal? Does A represent the average of all patients or just one example? Are there confidence intervals for these data, as it is not clear whether the curves are significantly different (this may not be important to show if it is just one example)? Same for D. What is being plotted in E? Is this the exponential fitted to the data? Can this be stated in the figure caption directly, so readers don't have to find it in the text, where it may not be obvious which figure the analyses apply to?

      Thank you for your comments regarding Fig. 2. We have made the following revisions to address the concerns:

      To clarify the presence of stimulation artifacts and differentiate them from the physiological signal, we have updated Panels B and E of Fig. 2 to highlight the stimulation artifacts.

      Regarding the comment about Panel A (now Panel B in the updated figure), it shows a single example per disease rather than an average across all patients.

      In response to the comment about what is plotted in Panel E, we have revised the figure caption to explicitly state that it includes the exponential fit on the data.

      Amendments to the manuscript:

      Figure 2 panels B and E now highlight stimulation artifacts.

      Author response image 2.

      Author response image 3.

      The figure captions could use more detail, which can be taken from the text, so that readers can understand the figures without searching for relevant details across the paper.

      Thank you for your feedback. We have revised the figure captions accordingly to provide more details.

      Amendments to the manuscript:

      “Fig 1 – GPi spiketrain feature analyses and clinical correlates of PD and dystonia. With respect to (A) rate-based spiketrain features, firing rate was greater in PD while burst index (BI) and coefficient of variation (CV) were greater in dystonia, whereas no differences were found for (B) oscillatory spiketrain features in the theta, alpha, low beta, and high beta frequency bands. Mann-Whitney U (MWU) test results depicted are not corrected for multiple comparisons; after Bonferroni correction, only the CV and BI results remain significant (please see Supplementary Table 3). (C) In PD, the power of low beta spiketrain oscillations positively correlated (Spearman correlation) with symptom severity; in dystonia, neuronal firing rate negatively correlated with symptom severity, whereas CV and the power of theta spiketrain oscillations positively correlated with symptom severity. Depicted scatterplots are results that were significant before correction for multiple comparisons; however, none of the results persist after Benjamini-Hochberg correction for false discovery rate (please see Supplementary Table 4).”
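      The two multiple-comparison corrections named in the Fig 1 caption differ in what they control. Bonferroni compares each p-value with α/m (family-wise error), whereas the Benjamini-Hochberg step-up procedure rejects the hypotheses with the k smallest p-values, where k is the largest rank with p(k) ≤ (k/m)α (false discovery rate). The sketch below is a minimal pure-Python illustration; the p-values are hypothetical and α = 0.05 is an assumed default.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at alpha.

    Returns a list of booleans (True = rejected) in the original order of pvals.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    # Largest rank k (1-based) such that p_(k) <= (k / m) * alpha; 0 if none.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


# Hypothetical p-values from four tests
p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))          # [True, False, False, True]
print(benjamini_hochberg(p))  # [True, True, True, True]
```

      Any hypothesis rejected by Bonferroni is also rejected by Benjamini-Hochberg at the same α, because α/m ≤ kα/m for every rank k. In the caption, the two corrections were applied to different families of tests (group comparisons in Supplementary Table 3, correlations in Supplementary Table 4).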

      “Fig 2 – Long-term and short-term effects of HFS on striato-pallidal plasticity in PD and dystonia. (A) Schematic of the plasticity protocol to assess long-term plasticity via fEP amplitude comparisons pre- versus post-HFS and short-term plasticity via fEP dynamics during HFS. (B) Highlights example fEP traces for measuring long-term plasticity pre- versus post-HFS, with (C) displaying group-level fEP amplitudes pre- versus post-HFS across diseases. (D) Illustrates the amount of plasticity (i.e., percentage change in fEP amplitudes pre- versus post-HFS) in both PD and dystonia, with PD showing higher levels of plasticity. (E) Provides an example of fEP traces during HFS for assessing short-term plasticity, with (F) depicting group-level decay rates of fEP amplitudes using an exponential fit on the fEP amplitudes over the first 5 stimulus pulses across diseases. (G) Shows the half-life of the fitted exponential (i.e., rate of attenuation of fEP amplitudes) between PD and dystonia, with PD demonstrating faster fEP attenuation.”
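      The short-term plasticity metric in panels F and G (an exponential fit to the fEP amplitudes over the first 5 stimulus pulses, summarized by its half-life) can be approximated with a log-linear least-squares fit, as in the sketch below. This assumes the amplitudes are positive and decay toward zero (no offset term); the authors may instead have used nonlinear least squares, and the function name is hypothetical. The half-life is returned in units of pulses; multiplying by the inter-pulse interval of the HFS train would convert it to milliseconds.

```python
import math


def exponential_half_life(amplitudes, n_pulses=5):
    """Half-life (in pulses) of A_n = A_0 * exp(-k * n) fitted to the first n_pulses amplitudes.

    The fit is ordinary least squares on log(A_n), i.e. a log-linear fit
    assuming decay toward zero; all amplitudes must be positive.
    """
    ys = [math.log(a) for a in amplitudes[:n_pulses]]
    if len(ys) < 2:
        raise ValueError("need at least two amplitudes to fit a decay")
    xs = list(range(len(ys)))  # pulse index 0, 1, 2, ...
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    k = -slope  # decay constant per pulse
    if k <= 0:
        raise ValueError("amplitudes do not decay; half-life is undefined")
    return math.log(2) / k


# Hypothetical amplitudes that halve with every pulse (only the first 5 are used)
amps = [100.0 * 0.5 ** n for n in range(8)]
print(exponential_half_life(amps))
```

      A shorter half-life corresponds to faster attenuation of the fEP, which is how the caption describes PD relative to dystonia.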

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study by Paoli et al. used a resonant scanning multiphoton microscope to examine olfactory representation in the projection neurons (PNs) of the honeybee with improved temporal resolution. PNs were classified into 9 groups based on their response patterns. The authors found that excitatory responses in the PNs precede inhibitory responses by ~40 ms, and that ~50% of PN responses contain inhibitory components. They built a neural circuit model of the mushroom body (MB) with evolutionarily conserved features such as sparse representation, global inhibition, and a plasticity rule. This MB model, fed with the experimental data, could reproduce a number of phenomena observed in experiments using bees and other insects, including dynamical representations of odor onset and offset by different populations of Kenyon cells, prolonged representations of after-smell, different levels of odor specificity for early/delay conditioning, and a shift of behavioral timing in delay conditioning. Trace conditioning was neither modeled nor tested experimentally. Also, the experimental result itself is largely confirmatory of preceding studies using other organisms. Nonetheless, the experimental data and the model provide a solid basis for future studies.

      We thank the reviewer for summarizing the value of our study and recognizing its generality and significance. As suggested, in a revised version of the manuscript, we will discuss the implications of our approach for trace conditioning. The model we presented hinges on the learning-induced plasticity of KC-to-MBON synapses recruited during the learning window (i.e., the simulated US arrival). In the case of trace conditioning, the model predicts that the behavioral response time should match the expected US arrival. Contrary to this prediction, preliminary analyses of empirical measurements of PER latency upon trace conditioning indicate that this is not the case. In a revised version of the manuscript, we will discuss the differences between the predictions of the model and the experimental observations in a trace conditioning paradigm.

      Reviewer #2 (Public Review):

      The study presented by Paoli et al. explores temporal aspects of neuronal encoding of odors and their perception, using bees as a general model for insects. The neuronal encoding of the presence of an odor is not a static representation; rather, its neuronal representation is partly encoded by the temporal order in which parallel olfactory pathways participate and are combined. This aspect is not novel, and its relevance in odor encoding and recognition has been discussed for more than the past 20 years. 

      The temporal richness of the olfactory code and its significance have traditionally been established through electrophysiological methods with high temporal resolution, allowing the identification and timing of action potentials in the different populations of neurons whose combination encodes the identity of an odor. On the other hand, optophysiological methods that enable spatial resolution and cell identification in odor coding lack the temporal resolution to appreciate the intricacies of olfactory code dynamics.

      (1) In this context, the main merit of Paoli et al.'s work is achieving an optical recording that allows for spatial registration of olfactory codes with greater temporal detail than the classical method and, at the same time, with greater sensitivity to measure inhibitions as part of the olfactory code. 

      The work clearly demonstrates how the onset and offset of odor stimulation trigger a dynamic code at the level of the first interneurons of the olfactory system that changes at every moment as a natural consequence of the local inhibitory interactions within the first olfactory neuropil, the antennal lobe. This gives rise to the interesting theory that each combination of activated neurons along this temporal sequence corresponds to the perception of a different odor. The extent to which the corresponding postsynaptic layers integrate this temporal information to drive the perception of an odor, or whether this sequence is, in a sense, a journey through different perceptions, is challenging to address experimentally.

      In their work, the authors propose a computational approach and olfactory learning experiments in bees to address these questions and evaluate whether the sequence of combinations drives a sequence of different perceptions. In my view, it is a highly inspiring piece of work that still leaves several questions unanswered. 

      We thank the reviewer for finding our work inspiring. Below, we have tried to answer the questions raised in the following comments, and we will include part of these answers in the revised version of our manuscript.

      (2) In my opinion, the detailed temporal profile of the response of projection neurons and their respective probabilities of occurrence provide valuable information for understanding odor coding at the level of neurons transferring information from the antennal lobes to the mushroom bodies. An analysis of these probabilities in each animal, rather than in the population of animals that were measured, would aid in better comprehending the encoding function of such temporal profiles. Being able to identify the involved glomeruli and understanding the extent to which the sequence of patterns and inhibitions is conserved for each odor across different animals, as it is well known for the initial excitatory burst of activity observed in previous studies without the fine temporal detail, would also be highly significant. 

      We thank the reviewer for recognizing the relevance of the findings in understanding the logic of olfactory coding. We agree about the importance of establishing if the different glomerular response profiles are evenly distributed across individuals or have individual biases. In the revised version of the manuscript, we will provide data on the distribution of response profiles for each animal and for different olfactory stimuli. Also, we fully agree on the importance of assessing to what extent such response profiles - largely determined by the local network of AL interneurons - are glomerulus-specific and conserved across individuals.

      In my view, the computational approach serves as a useful tool to inspire future experiments; however, it appears somewhat simplistic in tackling the complexity of the subject. One question that I believe the researchers do not address is to what extent the inhibitions recorded in the projection neurons are integrated by the Kenyon cells and are functional for generating odor-specific patterns at that level. 

      The model we proposed represents, indeed, a simplification of olfactory signal processing throughout the honey bee olfactory circuit. Still, it shows that simple but realistic rules can be sufficient to grasp some fundamental aspects of olfactory coding. However, we agree with the reviewer and believe that such a minimalistic model can provide a basis for designing future experiments in which complexity can be increased by adding relevant features, such as the learning-induced plasticity of PN-to-KC synapses or the divergence of multiple PNs from the same glomerulus to different KCs.

      Concerning the reviewer's question on the involvement of inhibitory inputs in generating odor-specific patterns at the level of the KCs, the short answer is yes: they contribute to the summed input of a target KC, and thus to the odor representation. In designing the model, we considered that a given glomerulus provides maximal input at maximal excitation and minimal input (= 0 input) at maximal inhibition. For this reason, an inhibited glomerulus contributes less (to KC action potential probability) than a glomerulus showing baseline activity, which in turn contributes less than an excited glomerulus. From the modeling point of view, normalizing the signal between 0 and 1 (i.e., setting maximal inhibition to 0 and maximal excitation to 1) would yield a similar result to the current approach, where values range from -25% to +30% ΔF/F. We will amend the model's description to clarify this point.
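      The point about normalization can be illustrated with a minimal sketch. Min-max normalization is an affine map, so it preserves the ordering inhibited < baseline < excited for individual glomeruli. For a summed input Σ w·v with fixed weights, normalization gives (Σ w·v − lo·Σ w)/(hi − lo), so the ranking of odors for a given KC is always preserved, and the ranking across KCs is preserved when their weights sum to the same total. The -25% and +30% ΔF/F bounds come from the text; the linear summation, weights, odor patterns, and function names are hypothetical and not a description of the authors' model code.

```python
DFF_MIN = -25.0  # maximal inhibition (% ΔF/F), from the text
DFF_MAX = 30.0   # maximal excitation (% ΔF/F), from the text


def normalize_dff(dff):
    """Map % ΔF/F linearly onto [0, 1]: maximal inhibition -> 0, maximal excitation -> 1."""
    return (dff - DFF_MIN) / (DFF_MAX - DFF_MIN)


def kc_input(glomerular_dff, weights, normalize=False):
    """Hypothetical linear summed input from glomeruli to one Kenyon cell."""
    values = [normalize_dff(v) if normalize else v for v in glomerular_dff]
    return sum(w * v for w, v in zip(weights, values))


# One KC with three equally weighted glomerular inputs, and two hypothetical odors
weights = [1.0, 1.0, 1.0]
odor_a = [-20.0, 0.0, 25.0]   # inhibited, baseline, excited glomeruli
odor_b = [-20.0, -20.0, 25.0]  # one more glomerulus inhibited
for normalize in (False, True):
    # Odor A drives this KC more than odor B in both representations
    print(normalize, kc_input(odor_a, weights, normalize) > kc_input(odor_b, weights, normalize))
```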

      Lastly, the behavioral result indicating a difference in conditioned response latency after early or delayed learning protocol is interesting. However, it does not align with the expected time for the neuronal representation that was theoretically rewarded in the delayed protocol. This final result does not support the authors' interpretation regarding the existence of a smell and an after-smell as separate percepts that can serve as conditioned stimuli.

      Our odor stimulus lasted 5 s. Glomerular activity is highly variable at odor onset (i.e., within the first 1 s) because of short excitatory response profiles and the delayed, slower onset of inhibitory responses. After this initial phase, the neural representation of the stimulus becomes more stable. Consequently, a neural signature learned in delay conditioning, i.e., with the US appearing towards the end of the olfactory stimulation (t = 4–5 s), may present itself much earlier (t = 1.5 s), triggering a behavioral response that largely anticipates the expected US arrival time.

      In the model, we observe an early decrease in action potential probability even in the case of delay conditioning. This occurs because the synapses recruited during the last second of olfactory stimulation (within the learning window during which CS and US overlap) become inactive. Because odorant-induced activity recruits highly overlapping synaptic populations between 1.5 and 5 s from the onset, a learning-induced inactivation of part of these synapses will result in a reduced action-potential probability in the modeled MBON. Importantly, this event will not be governed by time but by the appearance of the learned synaptic configuration. 
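      The mechanism described above can be illustrated with a deliberately minimal toy model. KC activity is represented as a set of active cell indices per time window, learning silences the KC-to-MBON synapses of KCs active during the CS-US overlap, and the MBON drive is the summed weight of active KCs (used here as a stand-in for action-potential probability). All KC patterns, time windows, and function names are hypothetical and chosen only to show why a reduced MBON response can appear as soon as the learned configuration does, rather than at the expected US time.

```python
def train(weights, active_during_us):
    """Return new KC-to-MBON weights with synapses of KCs active during CS-US overlap silenced."""
    return [0.0 if i in active_during_us else w for i, w in enumerate(weights)]


def mbon_drive(weights, active_kcs):
    """Summed synaptic input to the MBON from the currently active KCs."""
    return sum(weights[i] for i in active_kcs)


n_kcs = 10
weights = [1.0] * n_kcs
# Hypothetical KC activity in three windows of a 5 s odor presentation
onset = {0, 1, 2, 3}          # t < 1.5 s: transient onset pattern
sustained = {4, 5, 6, 7, 8}   # 1.5 s to 4 s: stable pattern
learning = {4, 5, 6, 7, 9}    # 4 s to 5 s: overlaps with the US (delay conditioning)

trained = train(weights, learning)
for label, kcs in [("onset", onset), ("sustained", sustained), ("learning window", learning)]:
    print(label, mbon_drive(weights, kcs), "->", mbon_drive(trained, kcs))
```

      In this toy example, the drop in MBON drive already appears in the sustained window (from 1.5 s), well before the learning window, because the sustained pattern shares most of its active KCs with the pattern present during the CS-US overlap.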

      We will add a new section to the revised version of the manuscript to clarify this concept and perform further analyses to characterize the contribution of different response types to the modeled response latency.

    1. Author response:

      The following is the response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their manuscript titled "Drosophila model to clarify the pathological significance of OPA1 in autosomal dominant optic atrophy," Nitta et al. use human OPA1 (hOPA1) to try to rescue the phenotype of an OPA1+/- Drosophila DOA model (dOPA), and the novelty of this paper lies in this approach. The authors then use this model to investigate the differences between dominant-negative (DN) and haploinsufficient (HI) OPA1 variants. The value of this paper lies in the study of DN/HI variants rather than in the establishment of the Drosophila model per se, as this model has existed for some time and has some significant disadvantages compared to existing models, particularly in the extra-ocular phenotype, which is common with some OPA1 variants but not in humans. I judge the findings of this paper to be valuable with regard to significance and solid with regard to the strength of the evidence.

      Suggestions for improvements:

      (1) Stylistically, the results section contains substantial discussion, conclusions, and inferences with reference to existing literature. I feel that this information would be better placed in the separate discussion section, e.g., lines 149–154.

      We appreciate the reviewer’s suggestion to relocate the discussion, conclusions, and inferences, particularly those that reference existing literature, to a separate discussion section. We have moved lines 149–154 to the discussion section (lines 343–347) as follows: “Our established fly model is the first simple organism to allow observation of degeneration of the retinal axons. Mitochondria in the axons were fragmented. Previous studies have observed mitochondrial fragmentation in S2 cells (McQuibban et al., 2006), muscle tissue (Deng et al., 2008), segmental nerves (Trevisan et al., 2018), and ommatidia (Yarosh et al., 2008) due to the LOF of dOPA1.”

      Likewise, we have moved lines 178–181 to the discussion section (lines 347–351) as follows: “Our study presents compelling evidence that dOPA1 knockdown instigates neuronal degeneration, characterized by a sequential deterioration beginning at the axonal terminals and extending to the cell bodies. This degenerative pattern, commencing from the distal axons and progressing proximally towards the cell soma, aligns with the paradigm of 'dying-back' neuropathy, a phenomenon extensively documented in various neurodegenerative disorders (Wang et al., 2012).”

      We have also moved lines 213–217, 218–220, and 222–223 to the discussion section (lines 363–391) as follows: “To elucidate the pathophysiological implications of mutations in the OPA1 gene, we engineered and expressed several human OPA1 variants, including the 2708-2711del mutation, associated with DOA, and the I382M mutation, located in the GTPase domain and linked to DOA. We also investigated the D438V and R445H mutations, which are located in the GTPase domain and associated with the more severe DOA plus phenotype. The 2708-2711del mutation exhibited limited detectability via HA-tag probing but was undetectable with a myc tag, likely because a frameshift produces the truncated protein product characteristic of this mutation, as delineated in prior studies (Zanna et al., 2008). In contrast, the I382M, D438V, and R445H mutations demonstrated expression levels comparable to WT hOPA1. However, expression of these mutants in retinal axons did not rescue the dOPA1 deficiency to the same extent as WT hOPA1, as evidenced in Figure 5E. This finding indicates a functional impairment imparted by these mutations, in line with established understanding (Zanna et al., 2008). Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). A significant reduction in protein levels has been observed in fibroblasts from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and mitochondrial fusion activity is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics (2020), where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function than DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (2) I think further investigation is warranted into why a reduction of mitochondria was noticed in the knockdown. There are conflicting reports on this in the literature. In my own experience, mitochondrial number is fairly uniform in WT vs OPA1 variant lines, but with an increased level of mitophagy, presumably reflecting greater turnover. There are a number of ways to quantify mitochondrial load, e.g., mtDNA quantification or protein quantification of TOM20/HSP60 or equivalent. I feel the reliance on ICC here is not enough to draw conclusions. Furthermore, mitophagy markers could be checked at the same time at either the transcript or protein level. I feel this is important as it helps validate the Drosophila model, since we already have a lot of experimental data about the number and function of mitochondria in OPA1+/- human/mammalian cells.

      We thank the reviewer for the insightful comments and suggestions regarding the mitochondrial reduction in our knockdown model. We concur with the reviewer’s observation that our initial results did not definitively demonstrate a decrease in the number of mitochondria in retinal axons. In addition, we measured mitochondrial quantity by western blotting with anti-COXII and found no reduction in mitochondrial content upon dOPA1 knockdown (Figure S4A and B). Consequently, we have revised our manuscript to remove the statement “suggesting a decreased number of mitochondria in retinal axons. However, whether this decrease is due to degradation resulting from a decline in mitochondrial quality or axonal transport failure remains unclear.” Instead, we have refocused our conclusion to reflect our electron microscopy findings, which indicate reduced mitochondrial size and structural abnormalities. The reviewer’s observation of consistent mitochondrial numbers in WT versus mutant variant lines, together with elevated mitophagy, prompted us to evaluate mitochondrial turnover as a significant factor in our study. To assess mitophagy, we incorporated the mito-QC marker into our experimental design: mito-QC was expressed in the retinal axons of Drosophila to assess mitophagy activity upon dOPA1 knockdown. We observed a notable increase in mCherry-positive, GFP-negative puncta one week after eclosion, indicating the activation of mitophagy (Figure 2D–H). This outcome strongly suggests that dOPA1 knockdown enhances mitophagy in our Drosophila model. The application of mito-QC as a quantitative marker for mitophagy, validated in previous studies, offers a robust approach to analyzing this process. Our findings elucidate the role of dOPA1 in mitochondrial dynamics and its implications for neuronal health.

      These results have been incorporated into Figure 2, with the corresponding text updated as follows (lines 159–167): “Given that an increase in mitophagy activity has been reported in mouse RGCs and nematode ADOA models (Zaninello et al., 2022; Zaninello et al., 2020), the mito-QC marker, an established indicator of mitophagy activity, was expressed in the photoreceptors of Drosophila. The mito-QC reporter consists of a tandem mCherry-GFP tag that localizes to the outer mitochondrial membrane (Lee et al., 2018). This construct allows mitophagy to be measured by detecting an increase in the red-only mCherry signal when the GFP signal is lost after mitochondria are delivered to lysosomes. Following dOPA1 knockdown, we observed a significant elevation in mCherry-positive, GFP-negative puncta at one week, demonstrating activation of mitophagy as a consequence of dOPA1 knockdown (Figure 2D–H).”
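      The mito-QC readout described above amounts to counting puncta that are mCherry-positive but GFP-negative (mitochondria delivered to lysosomes, where the GFP signal is lost while mCherry persists). A minimal sketch of that classification step follows. It assumes puncta have already been segmented and their background-subtracted intensities measured; the thresholds, the `Punctum` data class, and the function names are hypothetical and not the authors' image-analysis pipeline.

```python
from dataclasses import dataclass


@dataclass
class Punctum:
    """One segmented punctum with background-subtracted intensities (arbitrary units)."""
    mcherry: float
    gfp: float


def is_mitolysosome(p, red_min, green_max):
    """mCherry-positive and GFP-negative: a mitochondrion delivered to a lysosome."""
    return p.mcherry >= red_min and p.gfp <= green_max


def count_mitolysosomes(puncta, red_min, green_max):
    """Number of red-only (mCherry+/GFP-) puncta."""
    return sum(1 for p in puncta if is_mitolysosome(p, red_min, green_max))


# Hypothetical puncta from one retinal-axon image and hypothetical thresholds
puncta = [Punctum(50, 40), Punctum(60, 2), Punctum(5, 1), Punctum(80, 3)]
print(count_mitolysosomes(puncta, red_min=20, green_max=5))  # 2
```

      Here the first punctum is excluded because it is still GFP-positive (an intact mitochondrion) and the third because its mCherry signal is below threshold.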

      (3) Could the authors comment on the failure of the dOPA1 rescue to return their biomarker, axonal number, to control levels? In Figure 4D, is there a significant difference between the control and rescue groups? Presumably so, as there is one between the mutant and rescue, and that difference looks smaller.

      As the reviewer correctly pointed out, there is a significant difference between the control and rescue groups, which we have now indicated in the figure. Additionally, we have added the following comments to the discussion section (lines 329–342) regarding this difference: “In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but it did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4 line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Considering that the optimal functionality of dOPA1 may be contingent upon specific gene expression levels, attaining a wild-type-like state necessitates the precise regulation of these expression levels. The second concerns non-autonomous effects. Although dOPA1 gene expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If dOPA1 has a non-autonomous effect in cells other than retinal axons, it might not be possible to fully restore the wild-type-like state. The third potential issue is that only one isoform of dOPA1 was expressed. In mouse OPA1, completely restoring mitochondrial network shape requires an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1 (Del Dotto et al., 2017). This requirement implies that multiple isoforms of dOPA1 are essential for the dynamic activities of mitochondria.”

      (4) The authors have chosen an interesting, if complicated, missense variant to study, namely I382M: several studies show that it is insufficient to cause disease in isolation, and it appears at high frequency in gnomAD, yet it appears to worsen the phenotype when it occurs in a compound heterozygote. I think this is worth discussing in the context of the results, particularly with regard to the ability of this variant to partially rescue the dOPA1 model, as shown in Figure 5.

      As the reviewer pointed out, the I382M mutation is known to act as a disease modifier. However, in our system, as suggested by Figure 5, I382M appears to retain more activity than DN mutations. Considering previous studies, we propose that I382M represents a mild hypomorph. Consequently, while I382M alone may not exhibit a phenotype, it could exacerbate severity in a compound heterozygous state. We have incorporated this perspective in our revised discussion (lines 375-391).

      “Notably, while the 2708-2711del and I382M mutations exhibited limited functional rescue, the D438V and R445H mutations did not show significant rescue activity. This differential rescue efficiency suggests that the former mutations, particularly the I382M, categorized as a hypomorph (Del Dotto et al., 2018), may retain partial functional capacity, indicative of a LOF effect but with residual activity. The I382M missense mutation within the GTPase domain of OPA1 has been described as a mild hypomorph or a disease modifier. Intriguingly, this mutation alone does not induce significant clinical outcomes, as evidenced by multiple studies (Schaaf et al., 2011; Bonneau et al., 2014; Bonifert et al., 2014; Carelli et al., 2015). A significant reduction in protein levels has been observed in fibroblasts originating from patients harboring the I382M mutation. However, mitochondrial volume remains unaffected, and the fusion activity of mitochondria is only minimally influenced (Kane et al., 2017; Del Dotto et al., 2018). This observation is consistent with findings reported by de la Barca et al. in Human Molecular Genetics (2020), where a targeted metabolomics approach classified I382M as a mild hypomorph. In our current study, the I382M mutation preserves more OPA1 function than DN mutations, as depicted in Figures 5E and F. Considering the results from our Drosophila model and previous research, we hypothesize that the I382M mutation may constitute a mild hypomorphic variant. This might explain its failure to manifest a phenotype on its own, yet its contribution to increased severity when it occurs in compound heterozygosity.”

      (5) I feel the main limitation of this paper is the reliance on axonal number as a biomarker for OPA1 function and ultimately rescue. I have concerns because (a) this is not a well-validated biomarker in the context of OPA1 variants, (b) we have little understanding of how it is affected by over- or under-expression, and (c) it may reflect a threshold effect, e.g., pathology develops once OPA1 levels fall below x% but development is normal when OPA1 expression is above x%. I think this is particularly relevant when the authors are using this model to draw conclusions on dominant negativity/HI, with the authors proposing that if expression of a hOPA1 transcript does not increase OPA1 expression in a dOPA1 KO, then the variant is DN. The authors have used other biomarkers in parts of this manuscript, e.g., ROS measurement and mitochondrial trafficking, but I feel the manuscript would benefit from something else, particularly in the later experiments shown in Figures 5 and 6.

      The reviewer raised concerns regarding the adequacy of axonal count as a validated biomarker in the context of OPA1 mutants. In response, we corroborated its validity using additional markers: MitoSOX, Atg8a, and COXII. Experiments employing MitoSOX revealed that the increased ROS signals resulting from dOPA1 knockdown were mitigated by expressing human OPA1, whereas the mutant variants 2708-2711del, D438V, and R445H did not ameliorate these effects, paralleling the axonal degeneration phenotype. These findings are documented in Figure 5F, and we have added the following text to lines 248–254 of the Results section:

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, using mito-QC, we observed elevated mitophagy in our Drosophila model; these results are now included in Figure 2D–H. Given the complexity of the genetics involved and the difficulty of establishing the required lines, autophagy activity was quantified by Western blot as the ratio of Atg8a-1 to Atg8a-2. However, no significant alterations were detected across any of the genotypes. Likewise, COXII protein levels showed no considerable change following knockdown, indicating that mitochondrial quantity was unaffected. These results indicate that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant change in autophagy activation. These findings call for caution, as this model may not fully replicate the molecular pathology of the corresponding human disease. The Western blot results are presented in Figure S4, and the following text has been added to lines 255–263 of the Results section:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to knock down dOPA1 systemically. Because whole-body dOPA1 knockdown is lethal, Tub-Gal80TS was used to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we raised the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, we measured ROS levels, the ratio of Atg8a-1 to Atg8a-2, and the quantity of COXII protein, but no significant differences were observed (Figure S6). This limitation of the fly model, namely that the axonal degeneration phenotype was observed without alterations in ROS signaling, autophagy activity, or mitochondrial quantity, is noted in the Results (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      The reviewer also asked how overexpression and underexpression of OPA1 affect axonal count and whether these effects are subject to a threshold. In response, we expressed wild-type and variant forms of human OPA1 in Drosophila in vivo and assessed their protein levels by Western blot. The results showed no significant differences in expression levels between the wild-type and variant forms in the OPA1 overexpression experiments, suggesting that the differences among variants are not explained by a threshold effect of expression level. These findings are now documented as quantitative data in Figure 5C. Furthermore, we have added a statement to the Results section for Figure 6A clarifying that overexpression of hOPA1 had no discernible effect (lines 274–276):

      “The results presented in Figure 5C indicate that there are no significant differences in the expression levels among the variants, suggesting that variations in expression levels do not influence the outcomes.”

      (6) Could the authors clarify which exons are included in the transcript used in Figure 5? My understanding is that transcript NM_015560.3 contains exons 4 and 4b but not 5b. According to Song et al. (2007), this transcript invariably produces s-OPA1 because it contains the exon 4b cleavage site. If this is true, it is a critical limitation of this study and, in my opinion, significantly undermines the likelihood of the proposed explanation of the findings presented in Figure 6. OPA1 functions primarily at the IMM, and l-OPA1 is the primary, probably the only, OPA1 isoform that localizes there, as the additional amino acids act as an IMM anchor. Given that this is likely where the GTPase oligomerizes, s-OPA1 alone is unlikely to interact with the native protein. I am not aware of any evidence that s-OPA1 is involved in oligomerization. Therefore, I do not consider this method, specifically expression of a hOPA1 transcript that produces only s-OPA1, to be a reliable indicator of dominant negativity/interference with WT protein function. This could be checked by blotting UAS-hOPA1 protein with an OPA1 antibody specific to human OPA1 and not to dOPA1. Several are commercially available, and if the authors see only s-OPA1, this confirms that their hOPA1 construct does not express l-OPA1.

      As suggested by the reviewer, we performed a Western blot using a human OPA1 antibody to determine whether the expressed hOPA1 produces the l-OPA1 isoform, as shown in band 2 of Figure 5D. The results confirmed the presence of both l-OPA1 (band 2) and what appears to be s-OPA1 (band 4). These findings are documented in the updated Figure 5D, with a detailed description provided in the manuscript at lines 224–226. Additionally, NM_015560.3 encodes isoform 1, which includes exons 4 and 5 but not exons 4b and 5b. This isoform can give rise to both l-OPA1 and s-OPA1 (see Figure 1 in Song et al., J Cell Biol. 2007). We have updated the schematic diagram in the figure to show these exons. s-OPA1 is generated by cleavage at the OMA1 target site in exon 5 and the Yme1L target site in exon 5b. Isoform 1 of OPA1 is susceptible to cleavage by OMA1, but Drosophila has no OMA1 homolog. Although Drosophila does have a Yme1L homolog, isoform 1 lacks exon 5b, so the origin of the smaller band resembling s-OPA1 remains unclear at this point.

      Reviewer #2 (Public Review):

      The data presented support and extend some previously published data using Drosophila as a model to unravel the cellular and genetic basis of human autosomal dominant optic atrophy (DOA). In humans, mutations in OPA1, a mitochondrial dynamin-like GTPase, are (amongst others) the most common cause of DOA. By using Drosophila loss-of-function mutations, RNAi-mediated knockdown and overexpression, the authors could recapitulate some aspects of the disease phenotype, which could be rescued by the wild-type version of the human gene. Their assays allowed them to distinguish between mutations causing human DOA, which affect the optic system and are supposed to be loss-of-function mutations, and mutations supposed to act as dominant negative, resulting in DOA plus, in which other tissues/organs are affected as well. Because of missing information in the Materials and Methods section and in several figure legends, it was not always possible to follow the conclusions of the authors.

      We appreciate the reviewer's constructive feedback and the emphasis on enhancing clarity in our manuscript. We recognize the concerns raised about the lack of detailed information in the Materials and Methods section and several figure legends, which may have obscured our conclusions. In response, we have appended the detailed genotypes of the Drosophila strains used in each experiment to a supplementary table. Additionally, we realized that the description of 'immunohistochemistry and imaging' was too brief, previously referenced simply as “immunohistochemistry was performed as described previously (Sugie et al., 2017).” We have now expanded this section to include comprehensive methodological details. Furthermore, we have revised the figure legends to provide clearer and more thorough descriptions.

      Similarly, how the knowledge gained could help to "inform early treatment decisions in patients with mutations in hOPA1" (line 38) cannot be followed.

      To address the reviewer's comments, we have refined our explanation of the clinical relevance of our findings as follows. We believe this revision succinctly articulates the practical application of our research, directly responding to the reviewer’s concerns about linking the study's outcomes to treatment decisions for patients with hOPA1 mutations. By underscoring the model’s value in differential diagnosis and its influence on initiating treatment strategies, we have clarified this connection explicitly, within the constraints of the abstract’s word limit. The revised sentence now reads: "This fly model aids in distinguishing DOA from DOA plus and guides initial hOPA1 mutation treatment strategies."

      Reviewer #3 (Public Review):

      Nitta et al. establish a fly model of autosomal dominant optic atrophy, a disease caused by hundreds of different OPA1 mutations with wide phenotypic variance. It has long been hypothesized that missense OPA1 mutations affecting the GTPase domain, which are associated with more severe optic atrophy and extra-ophthalmic neurologic conditions such as sensorineural hearing loss (DOA plus), impart their effects through a dominant negative mechanism, but no clear direct evidence for this exists, particularly in an animal model. The authors execute a well-designed study to establish their model, demonstrating a clear mitochondrial phenotype with multiple clinical analogs, including optic atrophy measured as axonal degeneration. They then show that hOPA1 mitigates optic atrophy with the same efficacy as dOPA1, setting up the utility of their model to test disease-causing hOPA1 variants. Finally, they leverage this model to provide the first direct evidence for a dominant negative mechanism for 2 mutations causing DOA plus by expressing these variants in the background of a full hOPA1 complement.

      Strengths of the paper include well-motivated objectives and hypotheses, overall solid design and execution, and a generally clear and thorough interpretation of their results. The results technically support their primary conclusions with caveats. The first is that both dOPA1 and hOPA1 fail to fully restore optic axonal integrity, yet the authors fail to acknowledge that this only constitutes a partial rescue, nor do they discuss how this fact might influence our interpretation of their subsequent results.

      As the reviewer rightly points out, neither dOPA1 nor hOPA1 achieve a complete recovery. Therefore, we acknowledge that this represents only a partial rescue and have added the following explanations regarding this partial rescue in the results and discussion sections.

      Result:

      Significantly —> partially (lines 207 and 228)

      Discussion (lines 329–342):

      In our study, expressing dOPA1 in the retinal axons of dOPA1 mutants resulted in significant rescue, but axonal integrity did not return to control levels. There are three possible explanations for this result. The first concerns gene expression levels. The Gal4 line used for the rescue experiments may not replicate the expression levels or timing of endogenous dOPA1. Because optimal dOPA1 function may depend on specific expression levels, attaining a wild-type-like state would require precise regulation of these levels. The second is a non-autonomous effect. Although dOPA1 expression was induced in the retinal axons for the rescue experiments, many retinal axons were homozygous mutants, while other cell types were heterozygous for the dOPA1 mutation. If dOPA1 acts non-autonomously in cells other than retinal axons, it might not be possible to restore a fully wild-type-like state. The third is that only one isoform of dOPA1 was expressed. For mouse OPA1, complete restoration of mitochondrial network shape requires an appropriate balance of at least two different isoforms, l-OPA1 and s-OPA1 (Del Dotto et al., 2017). This requirement suggests that multiple isoforms of dOPA1 may likewise be essential for the dynamic activities of mitochondria.

      The second caveat is that their effect sizes are small. Statistically, the results indeed support a dominant negative effect of DOA plus-associated variants, yet the data show a marginal impact on axonal degeneration for these variants. The authors might have considered exploring the impact of these variants on other mitochondrial outcome measures they established earlier on. They might also consider providing some functional context for this marginal difference in axonal optic nerve degeneration.

      In response to the reviewer’s comment regarding the modest effect sizes observed, we acknowledge that the magnitude of the reported changes is indeed small. To explore the impact of these variants on additional mitochondrial outcomes, as suggested, we used MitoSOX, Atg8a, and COXII as markers. However, we could not detect any significant effects of the DOA plus-associated variants with these methods. As in our response to Reviewer #1's fifth comment (we apologize for the redundancy), we present experimental results showing that the increased ROS signals observed upon dOPA1 knockdown were rescued by expressing human OPA1, whereas the mutant variants 2708-2711del, D438V, and R445H did not ameliorate this effect. This outcome mirrors the axonal degeneration phenotype and is documented in Figure 5F. The following text has been added to lines 248–254 of the Results section:

      “Furthermore, we assessed the potential for rescuing ROS signals. Similar to its effect on axonal degeneration, wild-type hOPA1 effectively mitigated the phenotype, whereas the 2708-2711del, D438V, and R445H mutants did not (Figure 5F). Importantly, the I382M variant also reduced ROS levels comparably to the wild type. These findings demonstrate that both axonal degeneration and the increase in ROS caused by dOPA1 downregulation can be effectively counteracted by hOPA1. Although I382M retains partial functionality, it acts as a relatively weak hypomorph in this experimental setup.”

      Moreover, using mito-QC, we observed elevated mitophagy in our Drosophila model; these results are now included in Figure 2D–H. Given the complexity of the genetics involved and the difficulty of establishing the required lines, autophagy activity was quantified by Western blot as the ratio of Atg8a-1 to Atg8a-2. However, no significant alterations were detected across any of the genotypes. Likewise, COXII protein levels showed no considerable change following knockdown, indicating that mitochondrial quantity was unaffected. These results indicate that retinal axon degeneration and mitophagy activation are present in the Drosophila DOA model, although the Western blot analysis revealed no significant change in autophagy activation. These findings call for caution, as this model may not fully replicate the molecular pathology of the corresponding human disease. The Western blot results are presented in Figure S4, and the following text has been added to lines 255–263 of the Results section:

      “We also conducted Western blot analyses using anti-COXII and anti-Atg8a antibodies to assess changes in mitochondrial quantity and autophagy activity following the knockdown of dOPA1. Mitochondrial protein levels, indicated by COXII quantification, were evaluated to verify mitochondrial content, and the ratio of Atg8a-1 to Atg8a-2 was used to measure autophagy activation. For these experiments, Tub-Gal4 was employed to knock down dOPA1 systemically. Because whole-body dOPA1 knockdown is lethal, Tub-Gal80TS was used to repress the knockdown until eclosion by maintaining the flies at 20°C. After eclosion, we raised the temperature to 29°C for two weeks to induce the knockdown or expression of hOPA1 variants. The results revealed no significant differences across the genotypes tested (Figure S4A–D).”

      In assessing the effects of dominant negative mutations, we measured ROS levels, the ratio of Atg8a-1 to Atg8a-2, and the quantity of COXII protein, but no significant differences were observed (Figure S6). This limitation of the fly model, namely that the axonal degeneration phenotype was observed without alterations in ROS signaling, autophagy activity, or mitochondrial quantity, is noted in the Results (lines 287–290):

      “We investigated the impacts of dominant negative mutations on mitochondrial oxidation levels, mitochondrial quantity, and autophagy activation levels; however, none of these parameters showed statistical significance (Figure S6).”

      Despite these caveats, the authors provide the first animal model of DOA that also allows for rapid assessment and mechanistic testing of suspected OPA1 variants. The impact of this work in providing the first direct evidence of a dominant negative mechanism is understated, considering how important this question is for the development of genetic treatments for DOA. The authors discuss important points regarding the potential utility of this model in clinical science. Comments on the potential use of this model to investigate variants of unknown significance in clinical diagnosis require further discussion of whether there is indeed precedent for this in other genetic conditions (since the model is nevertheless so evolutionarily removed from humans).

      As suggested by the reviewer, we have expanded the discussion in our study to emphasize in greater detail the significance of the fruit fly model and the MeDUsA software we have developed, elaborating on the model's potential applications in clinical science and its precedents in other genetic disorders. Our text is as follows (lines 299–318):

      “We have previously used MeDUsA to quantify axonal degeneration and have applied this methodology extensively to various neurological disorders. The adaptability of this experimental system is demonstrated by its application to a wide spectrum of genetic mutations associated with neurological conditions, highlighting its broad utility in neurogenetic research. We identified a novel de novo variant in Spliceosome Associated Factor 1, Recruiter of U4/U6.U5 Tri-SnRNP (SART1). The patient, born at 37 weeks with a birth weight of 2934 g, exhibited significant developmental delay, including lack of head control at 7 months, reliance on tube feeding, unresponsiveness to visual stimuli, and infantile spasms with hypsarrhythmia on EEG. Profound hearing loss was confirmed, and MRI revealed brain atrophy. To assess the functional impact of this novel human gene variant, we generated transgenic Drosophila lines expressing both wild-type and mutant SART1 under the control of a UAS promoter.

      Our MeDUsA analysis suggested that the variant may confer a gain-of-toxic-function (Nitta et al., 2023). Moreover, we identified heterozygous loss-of-function mutations in DHX9 as potentially causative for a newly characterized neurodevelopmental disorder. We further investigated the pathogenic potential of a novel heterozygous de novo missense mutation in DHX9 in a patient presenting with short stature, intellectual disability, and myocardial compaction. Our findings indicated a loss of function in the G414R and R1052Q variants of DHX9 (Yamada et al., 2023). This experimental framework has been instrumental in elucidating the impact of gene mutations, improving our ability to determine how novel variants influence gene function.”

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall, I enjoyed reading this paper. It is well presented and represents a significant amount of well-executed work. I feel it further characterizes a poorly understood model of OPA1 variants, and one which displays significant differences from the human phenotype. However, I feel the authors' experiments are not enough to validate this model as a screening tool for dominant negativity. I have therefore suggested the above experiments as a way both to further validate the mitochondrial dysfunction in this model and to ensure that the expressed transcript is able to affect oligomerization, as this is a prerequisite for the authors' conclusions.

      We assessed the extent to which our model reflects mitochondrial dysfunction using COXII, Atg8a, and MitoSOX markers. Unfortunately, neither COXII levels nor the ratio of Atg8a-1 to Atg8a-2 varied significantly across genotypes in a way that would clarify the impact of dominant negative mutations. Nonetheless, MitoSOX and mito-QC results revealed that mitochondrial ROS levels and mitophagy are increased in Drosophila following knockdown of endogenous dOPA1. These findings are documented in Figures 2, 5, and S6.

      Regarding oligomer formation, the specifics remain elusive in this study. However, the expression of dOPA1K273A, identified as a dominant negative variant in Drosophila, significantly disrupted retinal axon organization, as detailed in Figure S7. From these observations, we hypothesize that oligomerization of wild-type and dominant negative forms in Drosophila results in axonal degeneration. Conversely, co-expression of Drosophila wild-type with human dominant negative forms does not induce degeneration, suggesting that they likely do not interact.

      Reviewer #2 (Recommendations For The Authors):

      Materials and Methods:

      The authors used GMR-Gal4 to express OPA1-RNAi. I) GMR is expressed in most cells of the developing eye behind the morphogenetic furrow, so the defects observed could be due to knockdown in support cells rather than in photoreceptor cells.

      We have added the following sentences to the Results (lines 194–196): “The GMR-Gal4 driver does not exclusively target Gal4 expression to photoreceptor cells. Consequently, the observed retinal axonal degeneration could potentially be secondary to abnormalities in support cells external to the photoreceptors.”

      OPA1-RNAi: how complete is the knock-down? Have the authors tested more than one RNAi line?

      We conducted experiments with an additional RNAi line, and similarly observed degeneration in the retinal axons (Figure S2 A and B; lines 178–179).

      The loss-of-function allele, induced by a P-element insertion, gives several eye phenotypes when heterozygous (Yarosh et al., 2008). Does RNAi expression lead to the same phenotypes?

      A previous report indicated that the compound eyes of homozygous mutations of dOPA1 displayed a glossy eye phenotype (Yarosh et al., 2008). Upon knocking down dOPA1 using the GMR-Gal4 driver, we also observed a glossy eye-like rough eye phenotype in the compound eyes. These findings have been added to Figure S3 and lines 192–194.

      There is no description of how the somatic clones were generated. How were mutant cells in clones distinguished from wild-type cells (e.g. in Fig. 4)?

      In the Methods section, we described the procedure for generating clones and their genotypes as follows (lines 502–505): "The dOPA1 clone analysis was performed by inducing flippase expression in the eyes using either ey-Gal4 with UAS-flp or ey3.5-flp, followed by recombination at the chromosomal location FRT42D to generate a mosaic of cells homozygous for dOPA1s3475." Furthermore, we have created a table detailing these genotypes. In these experiments, it was not possible to differentiate between the clone and WT cells. Accordingly, we have noted in the Results section (lines 201–203): "Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.”

      Why were flies kept at 29°C? This is rather unusual.

      Elevated temperature has been shown to increase GAL4 expression (Kramer and Staveley, Genet. Mol. Res., 2003), which in turn enhances expression of the target genes. Therefore, only the knockdown assays and the Western blots detecting human OPA1 protein were conducted at 29°C; all other experiments were performed at 25°C, as described in the Methods section: “Flies were maintained at 25°C on standard fly food. For knockdown experiments (Figures 1C–E, 1F–H, 2A–H, 3B–K, 5F, S1, S2 A and B, and S6A), flies were kept at 29°C in darkness.” Furthermore, “We regulated protein expression temporally across the whole body using the Tub-Gal4 and Tub-GAL80TS system. Flies harboring each hOPA1 variant were maintained at a permissive temperature of 20°C, and upon emergence, females were transferred to a restrictive temperature of 29°C for subsequent experiments.”

      Legends:

      It would be helpful to have a description of the genotypes of the flies used in the different experiments. This could also be included as a table.

      We have created a table detailing the genotypes. Additionally, in the legend, we have included a note to consult the supplementary table for genotypes.

      Results:

      Line 141: It is not clear what they mean by "degradation", is it axonal degeneration? And if so, what is the argument for this here?

      In the manuscript, we addressed the potential for mitochondrial degradation; however, recognizing that the wording was ambiguous, we have omitted the following sentence: “Nevertheless, the degradation resulting from mitochondrial fragmentation may have decreased the mitochondrial signal.”

      Fig. 2: Axons of which photoreceptors are shown?

      We have added "a set of the R7/8 retinal axons" to the legend of Figure 2.

      Line 167: The authors write that axonal degeneration is more severe after seven days than after eclosion. Is this effect light-dependent? The same question concerns the disappearance of the rhabdomere (Fig. 3G–J).

      We conducted the experiments in darkness, indicating that the observed degeneration is not light-dependent. This condition has been added to the Methods section to clarify the experimental conditions.

      Line 178/179: Based on what results do they conclude that there is degeneration of the "terminals" of the axons?

      Quantification with MeDUsA allows us to count the number of axonal terminals, and the observed decrease led us to conclude that the axonal terminals had degenerated. We have published two papers using this approach. We have added the following description to the Results section to clarify how we defined degeneration (lines 174–176): “We have assessed the extent of their reduction from the total axonal terminal count, thereby determining the degree of axonal terminal degeneration (Richard JNS 2022; Nitta HMG 2023).”

      Line 189: They write: ".. we observed dOPA1 mutant axons...". How did they distinguish the mutants from the controls?

      Fig. 5 and Fig. 6: How did they distinguish genetically mutant cells from genetically control cells in the somatic clones?

      Mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them. Accordingly, this point has been added to lines 201–203, “Note that the mutant clone analysis was conducted in a context where mutant and heterozygous cells coexist as a mosaic, and it was not possible to distinguish between them.” and the text in the results section has been modified as follows:

      Before: “To determine if dOPA1 is responsible for axon neurodegeneration, we observed the dOPA1 mutant axons by expressing full-length versions of dOPA1 in the photoreceptors at one day after eclosion and found that dOPA1 expression significantly rescued the axonal degeneration” —>

      After: “To determine if dOPA1 is responsible for axon neurodegeneration, we quantified the number of axons in dOPA1 eye-clone flies expressing dOPA1 at one day after eclosion and found that dOPA1 expression partially rescued the axonal degeneration.”

      Line 225/226: It is not clear to me how their approach "can quantitatively measure the degree of LOF".

      To address the reviewer's question and clarify how our approach quantitatively measures the degree of loss of function (LOF), we revised the statement (lines 238–247):

      “Our methodology distinctively facilitates the quantitative evaluation of LOF severity by comparing the rescue capabilities of various mutations. Notably, the 2708-2711del and I382M mutations demonstrated only partial rescue, indicative of a hypomorphic effect with residual activity. In contrast, the D438V and R445H mutations failed to show significant rescue, suggesting a more profound LOF. The partial rescue by the 2708-2711del and I382M mutations is consistent with their classification as hypomorphic. Moreover, the observed differences in rescue efficacy correspond to the clinical severities associated with these mutations, namely DOA and DOA plus. Thus, our results support the model’s ability to quantitatively discriminate among mutations based on their impact on protein function, providing a measure of LOF magnitude.”

      Discussion:

      Line 251, 252 and line 358: What is "the optic nerve" in the adult Drosophila?

      In humans, the axons of retinal ganglion cells (RGCs) are referred to as the optic nerve, and we posit that the retinal axons in flies are similar to this structure. In the introduction section, where it is described that the visual systems of flies and humans bear resemblance, we have appended the following definition (lines 107–108): “In this study, we defined the retinal axons of Drosophila as analogous to the human optic nerve.”

      Line 344: These bands appear only upon overexpression of the hOPA1 constructs, so this part of the discussion is very speculative.

      We confirmed these bands using an anti-hOPA1 antibody, demonstrating that the anti-myc signal is specific. These results have been added to Figure 5D. Furthermore, the phrase “The upper band was expected as” has been revised to “From a size perspective, the upper band was inferred to represent the full-length hOPA1 including the mitochondrial import sequence (MIS).” (lines 464–465)

      I was missing a discussion about the increase of ROS upon loss/reduction of dOPA1 observed by others and described here. Is there an increase of ROS upon expression of any of the constructs used?

      We demonstrated that not only axonal degeneration but also ROS elevation can be suppressed by expressing human OPA1 in the dOPA1 knockdown background. No variant except I382M achieved this rescue. Furthermore, we assessed whether the dominant negative variants altered ROS levels, but no significant differences were observed in this experimental system. These findings have been added to the Discussion section as follows (lines 318–328): “Our research established that dOPA1 knockdown precipitates axonal degeneration and elevates ROS signals in retinal axons. Expression of human OPA1 within this context effectively mitigated both phenomena; it partially reversed axonal degeneration and nearly completely normalized ROS levels. These results imply that factors other than increased ROS may drive the axonal degeneration observed post-knockdown. Furthermore, while differences between the impacts of DN mutations and loss-of-function mutations were evident in axonal degeneration, they were less apparent when using ROS as a biomarker. The extensive use of transgenes in our experiments might have mitigated the knockdown effects. In a systemic dOPA1 knockdown, assessments of mitochondrial quantity and autophagy activity revealed no significant changes, suggesting that the cellular consequences of reduced OPA1 expression might vary across different cell types.”

      Reviewer #3 (Recommendations For The Authors):

      Consider being more explicit regarding literature that has or has failed to test a direct dominant negative effect by expressing a variant in question in the background of a full OPA1 complement. My understanding is that this is the first direct evidence of this widely held hypothesis. This lends support to the main claim promoting the utility of the fly as a model in general. The authors might also outline this in the introduction as a knowledge gap they fill through this study.

      In the introduction, we have incorporated a passage that highlights precedents capable of distinguishing between LOF and DN effects, and we note the absence of models capable of dissecting these distinctions within an in vivo organism. This study aims to address this gap, proposing a model that elucidates the differential impacts of LOF and DN within the context of a living model organism, thereby contributing to a deeper understanding of their roles in disease pathology. We added the following sentences in the introduction (lines 71–80).

      “In the quest to differentiate between LOF and DN effects within the context of genetic mutations, precedents exist in simpler systems such as yeast and human fibroblasts. These models have provided valuable insights into the conserved functions of OPA1 across species, as evidenced by studies in yeast models (Del Dotto et al., 2018) and fibroblasts derived from patients harboring OPA1 mutations (Kane et al., 2017). However, the ability to distinguish between LOF and DN effects in an in vivo model organism, particularly at the structural level of retinal axon degeneration, has remained elusive. This gap underscores the necessity for a more complex model that not only facilitates molecular analysis but also enables the examination of structural changes in axons and mitochondria, akin to those observed in the actual disease state.”

      The authors should clarify the language used in the abstract and introduction on the effect of hOPA1 DOA and DOA plus on the dOPA1- phenotype. Currently written as "none of the previously reports mutations known to cause DOA or DOA plus were rescued, their functions seems to be impaired." but presumably the authors mean that these variants failed to rescue to the dOPA1 deficient phenotype.

      We thank the reviewer for the constructive feedback. We acknowledge the need for clarity in our description of the effects of hOPA1 DOA and DOA plus mutations on the dOPA1- phenotype in both the abstract and the introduction. The current phrasing, "none of the previously reported mutations known to cause DOA or DOA plus were rescued, their functions seem to be impaired," may indeed be confusing. To address your concern, we have revised this statement to more accurately reflect our findings: "Previously reported mutations failed to rescue the dOPA1 deficiency phenotype." In the Abstract, we have made the following change: “we could not rescue any previously reported mutations known to cause either DOA or DOA plus.” → “mutations previously identified did not ameliorate the dOPA1 deficiency phenotype.”

      DOA plus is associated with a multiple sclerosis-like illness; as written, it suggests that the pathogenesis of sporadic multiple sclerosis and that associated with DOA plus share an underlying pathogenic mechanism. Please use the qualifier "-like illness."

      We have added the term “multiple sclerosis-like illness” wherever “multiple sclerosis” is mentioned.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors want to elucidate which are the mechanisms that regulate the immune response in physiological conditions in cortical development. To achieve this goal, authors used a wide range of mutant mice to analyse the consequences of immune activation in the formation of cortical ectopia in mice.

      Strengths:

      The authors demonstrated that Abeta monomers are anti-inflammatory and inhibit microglial activation. This is a novel result that demonstrates the physiological role of APP in cortical development.

      Weaknesses:

      -On the other hand, cortical ectopia has been already described in mouse models in which the amyloid signalling has been disrupted (Herms et al., 2004; Guenette et al., 2006), making the current study less novel.

      We agree these previous studies have implicated amyloid precursor protein in cortical ectopia. However, since these studies used whole-body knockouts, they did not establish the functional roles of specific cell types. Nor did they identify the specific mechanisms underlying the formation of this unique class of cortical ectopia. In contrast, our studies show that the disruption of a novel Abeta-regulated signaling pathway in microglia is the primary cause of ectopia formation in this class of ectopia mutants. This is the first time that microglia have been specifically implicated in the development of cortical ectopia. We further show that elevated MMP activity and the resulting cortical basement membrane degradation is the underlying mechanism leading to ectopia formation. This is also the first time that MMP activity and basement membrane degradation (instead of maintenance) have been implicated in cortical ectopia development. As such, our results have provided novel insights into the diverse mechanisms underlying cortical ectopia formation in developmental brain disorders.

      One of the molecules analysed is Ric8a, a GTPase activator involved in neuronal development. Authors used the conditional mutant mice Emx1-Ric8a to delete Ric8a from early progenitors and glutamatergic neurons in the pallium. Emx1-Ric8a mutant mice present cortical ectopias and authors attributed this malformation to the increase in inflammatory response due to Ric8a deletion in microglia. Several discordances do not fit this interpretation:

      -The role of Ric8a in cortical development and function has been already described in several papers, but none of them has been cited in the current manuscript (Kask et al., 2015, 2018; Ruisu et al., 2013; Tonissoo et al., 2006).

      We will include reference to these publications in revision.

      -Ectopia formation in the cortex has been already described in Nestin-Ric8a cKO mice (Kask et al., 2015). In the current manuscript, authors analyzed the same mutant mice (Nestin-Ric8a), but they did not detect any ectopia. Authors should discuss this discordance.

      The expression pattern of nestin-cre is known to vary depending on factors including transgene insertion site, genetic background, and sex. Early studies show, for example, that the nestin gene promoter drives cre expression in many non-neural tissues in another transgenic line in the FVB/N genetic background (Dubois et al., Genesis. 2006 Aug;44(8):355-60. doi: 10.1002/dvg.20226). The specific nestin-cre line used in Kask et al. 2015 has also been shown to be active in brain microglia and to lead to increased microglial pro-inflammatory activity upon breeding to a conditional allele of a cholesterol transporter gene (Karasinska et al., Neurobiol Dis. 2013 Jun;54:445-55; Karasinska et al., J Neurosci. 2009 Mar 18;29(11):3579–3589). The ectopia reported in Kask et al. 2015 are also significantly more subtle than what we have observed and were apparently not observed in all mutant animals (we observe severe ectopia in every single emx1-cre mutant). We presume the ectopia reported in Kask et al. 2015 may result from a combined deletion of the ric8a gene from microglia and neural cells due to unique combinations of factors affecting nestin-cre expression in a subset of mutants.

      -Authors claim that microglia express Emx1, and therefore, Ric8a is deleted in microglia cells. However, the arguments for this assumption are very weak and the evidence suggests that this is not the case. This is an important point considering that authors want to emphasise the role of Ric8a in microglia activation, and therefore, additional experiments should demonstrate that Ric8a is deleted in microglia in Emx1-Ric8a mutant mice.

      We have observed altered mRNA expression of several genes in purified microglia cultured from the emx1-cre mutants (Supplemental Fig. 8), which indicates that ric8a is deleted from microglia and suggests a role of microglial ric8a deficiency in ectopia formation.  This interpretation is further strengthened by the observation that deletion of ric8a from microglia using a microglia-specific cx3cr1-cre results in similar ectopia (Fig. 2). We also have other data supporting this interpretation, including data showing induction of the expression of a cre reporter in brain microglia by emx1-cre and loss of ric8a gene expression in microglia cells isolated from emx1-cre mutants. We will include these data in revision.

      Reviewer #2 (Public Review):

      Kwon et al. used several conditional KO mice for the deletion of ric8a or app in different cell types. Some of them exhibited pial basement membrane breaches leading to neuronal ectopia in the neocortex.

      They first investigated ric8a, a Guanine Nucleotide Exchange Factor for Heterotrimeric G Proteins. They observed the above-mentioned phenotype when ric8a is deleted from microglia and neural cells (ric8a-emx1-cre or dual deletion with cre combination cx3cr1 (in microglia) and nestin (in neural cells)) but not in microglia alone or neural cells alone (whether it is in CR cells (ric8a-Wnt3a-cre), post-mitotic neurons (nex-cre or dlx5/6-cre), or in progenitors and their progeny (nestin-cre or foxg1-cre). They also show that ric8a KO mutant microglia cells stimulated in vitro by LPS exhibit an increased TNFa, IL6 and IL1b secretion compared to controls (Fig 2). They therefore injected LPS in vivo and observed the neuronal ectopia phenotype in the ric8a-cx3cr1-cre (microglial deletion) cortices at P0 (Fig 2). They suggest that ric8a KO in neuronal cells mimics immune stimulation (but we have no clue how ric8a KO in neural cells would induce immune stimulation).

      We agree we do not currently know the precise mechanisms by which mutant microglia are activated in the mutant brain. However, this does not affect the conclusion that deficiency in the Abeta monomer-regulated APP/Ric8a pathway in microglia is the primary cause of cortical ectopia in these mutants, since we have shown that genetic disruption of this pathway in microglia alone, by different means targeting different pathway components using cell type-specific cre lines, results in similar cortical ectopia phenotypes in every case. Regarding the source of the immunogens, there are several possibilities which we plan to investigate in future studies. For example, the clearance of apoptotic cells and associated cellular debris is an important physiological process, and deficits in this process have been linked to inflammatory diseases throughout life (Doran et al., Nat Rev Immunol. 2020 Apr;20(4):254-267; Boada-Romero et al., Nat Rev Mol Cell Biol. 2020 Jul;21(7):398-414). In the embryonic cortex, studies have shown that large-scale cell death takes place starting as early as E12 (Blaschke et al., Development. 1996 Apr;122(4):1165-74; Blaschke et al., J Comp Neurol. 1998 Jun 22;396(1):39-50). Studies have also shown that radial glia and neuronal progenitors play critical roles in the clearance of apoptotic cells and associated cellular debris in the brain (Lu et al., Nat Cell Biol. 2011 Jul 31;13(9):1076-83; Ginisty et al., Stem Cells. 2015 Feb;33(2):515-25; Amaya et al., J Comp Neurol. 2015 Feb 1;523(2):183-96). Moreover, Ric8a-dependent heterotrimeric G proteins have been found to specifically promote the phagocytic activity of both professional and non-professional phagocytic cells (Billings et al., Sci Signal. 2016 Feb 2;9(413):ra14; Preissler et al., Glia. 2015 Feb;63(2):206-15; Pan et al., Dev Cell. 2016 Feb 22;36(4):428-39; Flak et al., J Clin Invest. 2020 Jan 2;130(1):359-373; Zhang et al., Nat Commun. 2023 Sep 14;14(1):5706).
      Thus, it is likely that the failure of radial glia to promptly clear apoptotic cells and debris may play a role in triggering microglial activation in ric8a mutants. We have not included discussion of these possibilities since the precise mechanisms remain to be determined. Moreover, they do not affect the conclusion of the current study.

      The authors then turned their attention to APP. They observed neuronal ectopia in the marginal zone when APP is deleted in microglia (app-cx3cr1-cre) + intraperitoneal LPS injection (they did not show it, but we have to assume there would not be a phenotype without the injection of LPS) (Fig 3). (The phenotype is similar but not identical to ric8a-cx3cr1-cre + LPS. They suggest that the reason is that they had to inject 3 times less LPS due to enhanced immune sensitivity in this genetic background, but this is only a hypothesis.) After in vitro stimulation by LPS, app mutant microglia show reduced secretion of TNFa and IL6 but not IL1b (the opposite of ric8a-cx3cr1-cre microglia), while peritoneal macrophages in culture show increased secretion of TNFa, IL1, IL6 and IL23 (Fig 3 and Suppl. Fig 9).

      We have data showing that app-cx3cr1-cre mutants without LPS injection do not show ectopia and will include them in revision. We employ LPS injection precisely because we do not see a phenotype without it. We agree, and have also stated in the text, that the phenotype of the app mutants is not as severe as that of the ric8a mutant. Besides the low LPS dosage used, we also suggest that other app family members may compensate, since the ectopia in app family gene mutants reported previously were only observed in app/aplp1/2 triple knockouts, not in any of the double knockouts (Herms et al., 2004). These potential causes are also not mutually exclusive. Nonetheless, the microglia-specific app mutants clearly show ectopia upon immune stimulation, implicating a role of microglial APP in cortical ectopia formation.

      The distinct responses of ric8a and app mutant microglia to LPS result from in vitro culturing of microglia. Indeed, we have shown that, when acutely isolated macrophages are used, these mutants show changes in the same direction (both show increased cytokine secretion). The microglia used for analysis in this study have all been cultured in vitro for two weeks before assay. They have thus been under chronic stimulation through exposure to dead cells and debris in the culture dish throughout this period. Depending on the degree of perturbation to inflammation-regulating pathways, such exposures are known to significantly change microglial cytokine expression, sometimes in the opposite direction from that expected. For example, under chronic immune stimulation, while trem2+/- microglia, which are heterozygous for the anti-inflammatory Trem2, show elevated pro-inflammatory cytokine expression as expected, trem2-/- (null) microglia under the same conditions not only fail to show increases but, for some pro-inflammatory cytokines, actually show decreases in expression (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177). In several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). It is likely that in microglia, too, Ric8a-dependent heterotrimeric G proteins mediate only a subset of the signaling downstream of APP. As such, app knockout in microglia may have more severe effects than ric8a knockout on microglial immune activation and lead to changes in the opposite direction compared to ric8a knockout, as has been observed for trem2 null mutation vs heterozygosity discussed above.
This may explain the subdued TNF and IL6 secretion by cultured app mutant microglia.

      Amyloid beta (Ab) being one of the molecules binding to APP, the authors showed that Ab40 monomers (they did not test Ab40 oligomers) partially inhibit cytokines (TNFa, IL6, IL1b, MCP-1, IL23a, IL10) secretion in vitro by microglia stimulated by LPS but does not affect secretion by microglia from app-cx3cr1-cre (tested for TNFa, IL6, IL1b, IL23a, IL10) (Fig 4, Suppl fig 10) (but still does it in aplp2-cx3cr1-cre) and does not affect secretion by ric8a-cx3cr1-cre microglia (tested for TNFa and IL6 but still suppress IL1b) (Therefore here is another difference between app and ric8a KO microglia).

      We have tested the effects of Abeta40 oligomers, which induce rather than suppress microglial cytokine secretion, and will include the data in revision. As mentioned above, in several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). We assume that this is likely also true in microglia and that Ric8a-dependent heterotrimeric G proteins may mediate only a subset of the signaling downstream of APP. This may explain the difference between app and ric8a knockout in abolishing the anti-inflammatory effects of Abeta monomers on IL-1b vs TNF/IL-6. It also suggests that TNF/IL-6 and IL-1b secretion must be regulated by different mechanisms. Indeed, it is well established in immunology that the secretion of IL1b, but not of TNF or IL6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit, Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found it suppressed neuronal ectopia (Fig 5, Suppl fig 11). It is not clear whether it suppresses immune stimulation from neuronal cells or immune reaction from microglia cells.

      We agree that at present the pharmacological approaches we have taken are not able to distinguish these possibilities. However, whichever of these possibilities turns out to be the case would still implicate a role of excessive microglial activation in the formation of cortical ectopia and support the conclusion of the study. Thus, while potentially worthy of further investigation, this question does not affect the conclusion of this study. Furthermore, as mentioned, we plan to determine how ric8a mutation in neural cells induces immune activation in future studies. These results will likely enable us to adopt more specific approaches to address this question.

      Finally, the authors examined the activities of MMP2 and MMP9 in the developing cortex using gelatin gel zymography. The activity and protein levels of MMP9 but not MMP2 in the ric8a-emx1-cre cortex were claimed significantly increased (Fig 5, Suppl fig 12). Unfortunately, they did not show it in the app-cx3cr1-cre +LPS mouse. They make a connection between ric8a deletion and MMP9 but unfortunately do not make the connection between app deletion and MMP9, which is at the center of the pathway claimed to be important here). Then they injected BB94, a broad-spectrum inhibitor of MMPs or an inhibitor specific for MMP9 and 13. They both significantly suppress the number and the size of the ectopia in ric8a mutants (Fig5).

      For all the gelatin gel zymography analyses, we quantify protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and load the same amount of protein per lane. The results across lanes are thus directly comparable. The quantification clearly shows that MMP9, but not MMP2, levels are increased in the mutants (Supplemental Figure 12). The MMP2 data also provide an internal control, further supporting a specific change in MMP9. For this analysis, we focus on the ric8a-emx1-cre mutants, since the app-cx3cr1-cre +LPS animals show less severe, more localized ectopia, in most cases in only one hemisphere. Any changes in MMP9 are therefore likely to be masked, and the experiments are unlikely to yield meaningful results. On the other hand, we have clearly shown that the administration of different classes of MMP inhibitors significantly suppresses ectopia in ric8a-emx1-cre mutants. This strongly implicates a functional contribution of MMPs.

      After reading the manuscript, I still do not know how ric8a in neural cells is involved in the immune inhibition. Is it through the control of Ab monomers? In addition, the authors did not show in vivo data supporting that Ab monomers are the key players here. As the authors said, this is not the only APP interactor. Finally, I still do not know how ric8a is linked to APP in microglia in the model.

      As detailed above, there are several possibilities, including potential deficits in the clearance of apoptotic cells and associated debris, that may trigger microglial activation in ric8a-emx1-cre mutants. We will investigate these possibilities in future studies. We have not included this discussion since their roles remain to be determined. As for the role of Abeta monomers, we have indicated that we currently do not have evidence that Abeta monomers play a role in inhibiting microglia in the developing cortex. We have also indicated in the manuscript that our conclusion is that an Abeta monomer-activated microglial pathway regulates normal brain development, not that Abeta monomers themselves regulate brain development. Regarding the link between Ric8a and APP, the reviewer has missed several major lines of supporting evidence. For example, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-10). This inhibition is abolished when either the app or ric8a gene is deleted from microglia. This indicates that app and ric8a act in the same pathway activated by Abeta monomers in microglia. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokines in microglia. This inhibition is also abolished when either the app or ric8a gene is deleted from microglia. This reinforces the conclusion that app and ric8a act in the same pathway in microglia. Furthermore, cell type-specific deletion of app or ric8a from microglia in vivo also results in similar phenotypes of cortical ectopia. Together, these results strongly support the conclusion that app and ric8a act in the same pathway activated by Abeta monomers in microglia.
      This conclusion is also consistent with published findings that Ric8a-dependent heterotrimeric G proteins bind to APP and mediate subsets of APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      While several of the findings presented in this manuscript are of potential interest, there are a number of shortcomings. Here are some suggestions that could improve the manuscript and help substantiate the conclusions:

      (1) As the title suggests it, the focus is on Ab and APP functions in microglia. However, the analysis is more focused on ric8a. The connection between ric8a and APP in this study is not investigated, besides the fact that their deletion induces somewhat similar but not identical phenotypes. Showing a similar phenotype is not enough to conclude that they are working on the same pathway. The authors should find a way to make that connection between ric8a and app in the cells investigated here.

      As discussed above, the reviewer has missed several major lines of evidence showing that APP and Ric8a act in the same pathway in microglia. For example, besides the similarity of the ectopia phenotypes, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-10). These inhibitory effects are completely abolished when either the app or ric8a gene is deleted from microglia. This indicates that app and ric8a act in the same pathway activated by Abeta monomers in microglia. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokine genes in microglia. These effects are again completely abolished when either the app or ric8a gene is deleted from microglia. This further reinforces the conclusion that app and ric8a act in the same pathway in microglia. We further show that the same results hold in macrophages. Together, these results strongly support the conclusion that app and ric8a act in the same pathway in microglia. This conclusion is also consistent with published findings that Ric8a-dependent heterotrimeric G proteins bind to APP and mediate APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      (2) It would help to show the appearance of breaches in the pial basement membrane leading to neuronal ectopia, and to investigate laminin debris, cell identity, and the Wnt pathway for app-cx3cr1-cre + LPS injection as you did for ric8a-emx1-cre.

      We will provide further data on the breaches in the pial basement membrane. We have not observed any changes in cell identity or Wnt pathway activity in ric8a-emx1-cre mutants. The ectopia phenotype in the app-cx3cr1-cre + LPS animals is also less severe. It is therefore likely of limited value to examine potential changes in these areas.

      (3) As a control, it would help to show that app-cx3cr1-cre without the LPS injection does not display the phenotype.

      We have the data on app-cx3cr1-cre mutants without LPS injection, which show no ectopia, and will include the data in revision.

      (4) This would help to show the activity and protein levels of MMP9 and MMP2 and perform the rescue experiments with the inhibitors in the app-cx3cr1-cre cortex +LPS.

      As discussed above, we focus analysis on the ric8a-emx1-cre mutants since app-cx3cr1-cre +LPS animals show less severe, more localized ectopia and in most cases only in one of the hemispheres.  Determining potential changes in MMP9 levels and effects of MMP inhibitors are therefore not likely to yield useful data.  On the other hand, we have shown that MMP9 levels are increased and administration of different classes of MMP inhibitors eliminate cortical ectopia in ric8a-emx1-cre mutants.  This has strongly implicated a functional contribution of MMPs.

      (5) Is MMP9 secreted by microglia cells or neural cells?

      Our in situ hybridization data show MMP9 is most highly expressed in macrophage-like cells in the embryonic cortex, suggesting that microglia may be a major source of MMP9. We will incorporate these data in revision.

      (6) The in vitro evidence indicates that one of the multiple APP interactors, ie Ab40 monomers, is less effective in suppressing the expression of some cytokines by microglia cells mutants for ric8a (TNFa and IL6 but still suppress IL1b) or APP (TNFa, IL6, IL1b, IL23a, IL10) when compared to WT. But there are other interactors for APP. In order to support the claim, it seems crucial to have in vivo data to show that Ab40 monomers are the molecules involved in preventing the breach in the pial basement membrane.

      As addressed in detail above, we have indicated that our conclusion is that an Abeta monomer-activated microglial pathway regulates normal brain development, not that Abeta monomers themselves regulate brain development.  We currently do not have evidence that the Abeta monomers play a role in inhibiting microglia in the developing cortex.  There are candidate ligands for the pathway in the developing cortex, the functional study of which, however, is a major undertaking and beyond the scope of the current study.

      (7) In order to claim that this is specific to Ab40 monomers and not oligomers, it is necessary to show that the Ab40 oligomers do not have the same effect in vitro and in vivo. Also, an assay should be done to show that your Ab preparations are pure monomers or oligomers.

      We have tested the effects of Abeta40 oligomers, which induce instead of suppressing microglial cytokine secretion, and will include the data in revision. The protocols we use in preparing the monomers and oligomers are standard protocols employed in the field of Alzheimer’s disease research and have been optimized and validated repeatedly over the past several decades.  

      (8) Most of the cytokine secretion assays used microglia in culture. Two results draw my attention. Ric8a deletion increases TNFa and IL6 secretion after LPS stimulation of microglia in vitro, while app deletion decreases their secretion. Later, the paper shows that the decrease in IL1b induced by Ab in microglia is prevented by APP deletion but not by ric8a deletion. These two pieces of data suggest that ric8a and APP might not be in the same pathway. In addition, the phenotypes from app-cx3cr1-cre + LPS injection and ric8a-cx3cr1-cre + LPS injection are not exactly the same. This could be due to the level of LPS, as the authors suggest, or it might not be. More experiments are needed to prove they are in the same pathway.

      As discussed above, the reviewer has missed several major lines of evidence that strongly support the conclusion that APP and Ric8a act in the same pathway activated by Abeta monomers in microglia (see detailed discussion in point 1). The differential response of app and ric8a mutant microglia likely results from chronic immune stimulation during in vitro culturing, which is known to alter microglial cytokine expression (see detailed discussion in point 9 below on how chronic immune stimulation changes microglial cytokine expression). We have demonstrated this by showing that, without culturing, acutely isolated app and ric8a mutant macrophages both display elevated cytokine secretion (Figure 4). Regarding the distinct regulation of TNF/IL-6 and IL-1b by APP and Ric8a, as discussed above, in several systems Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). It is likely that this is also the case in microglia, and that Ric8a-dependent heterotrimeric G proteins mediate only a subset of the anti-inflammatory signaling activated by APP. This may explain why app, but not ric8a, mutation abolishes the inhibitory effects of Abeta monomers on IL-1b. This also suggests that the secretion of TNF/IL-6 and IL-1b must be regulated by different mechanisms. Indeed, it is well established in immunology that the secretion of IL1b, but not that of TNF or IL6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit, Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      (9) How do the authors reconcile the reduced TNFa and IL6 secretion upon stimulation of app mutant microglia with the model where app is attenuating immune response in vivo? Line 213 says that microglia exhibit attenuated immune response following chronic stimulation but I don't know if 3 hours of LPS in vitro is a chronic stimulation.

      The reviewer has misunderstood. The microglia used in this study had all been cultured in vitro for approximately two weeks before the assay. They had thus been under chronic stimulation throughout this period, being exposed to dead cells and debris in the culture dish. Depending on the degree of perturbation to inflammation-regulating pathways, such exposure is known to significantly change microglial cytokine expression, sometimes in the opposite direction from that expected. For example, under chronic immune stimulation, trem2+/- microglia, which are heterozygous for a mutation in the anti-inflammatory gene Trem2, show elevated pro-inflammatory cytokine expression as expected, whereas trem2-/- (null) microglia under the same conditions not only fail to show increases but actually show decreased expression of some pro-inflammatory cytokines (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177). As mentioned, Ric8a-dependent heterotrimeric G proteins have also been shown in several systems to bind to APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). It is likely that Ric8a-dependent heterotrimeric G proteins also mediate only a subset of the anti-inflammatory signaling activated by APP in microglia. As such, app knockout in microglia may have more severe effects on microglial immune activation than ric8a knockout, similar to the relationship between trem2 null mutation and heterozygosity discussed above. This likely explains why TNF and IL-6 secretion by cultured app mutant microglia is subdued. In contrast, we find that acutely isolated app mutant macrophages show increased cytokine secretion, which is likely more representative of the response of app mutant microglia in the absence of chronic stimulation.

      (10) Line 119: In their model, the authors suggest that there is a breach in the pial basement membrane but that the phenotype is different from the retraction of the radial fibers due to reduced adhesion. Could the authors discuss what substrate the radial fibers are attached to in their model, where the pial surface is destroyed?

      Radial glial endfeet normally bind to the basement membrane via cell surface receptors, including the integrin and dystroglycan protein complexes. We observe free radial glial endfeet at the breach sites, apparently without attachment to any basement membrane. However, we cannot exclude the possibility that residual basement membrane components are present but not detected by the methodology employed.

      (11) The authors should show that the increased cytokine secretion observed in vitro is also happening in vivo in ric8a-emx1-cre compared to WT mice and compared to ric8a-nestin-cre mice. Or when app is deleted in microglia (app-cxcr3-cre) + LPS injection compared to WT mice +LPS.

      Unfortunately, this is not technically feasible, since it is impossible to extract the extracellular (secreted) fraction of cytokines from an embryonic brain without causing cell lysis and the release of the intracellular pool. This, however, does not affect our conclusion that the Abeta monomer-regulated microglial pathway plays a key role in regulating normal brain development, since its genetic disruption, by different approaches, clearly results in brain malformation.

      (12) The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found that it suppressed neuronal ectopia (Fig 5, Suppl fig 11). Does it suppress immune stimulation from neuronal cells or immune reaction from microglia cells?

      As discussed above, we agree that, at present, the pharmacological approaches we have taken cannot distinguish between these two possibilities. However, whichever possibility is true, it does not affect our conclusion. Furthermore, we plan to determine in future studies how ric8a mutation in neural cells induces immune activation. These results will likely enable us to adopt specific approaches to address this question.

      (13) Fig 5 and Supplementary fig 12: Please show a tubulin loading control in Fig 5i as you did in suppl fig 12 d (gel zymography). Please provide a gel zymography showing side by side Control, mutant and mutant +DM/S3I treatment. The same request for the MMP9 staining. Please provide statistics for control vs mutant for suppl fig 12c and d.

      For all gelatin gel zymography experiments, we quantified protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and loaded the same amount of protein per lane, so the results are comparable across lanes. These experiments were performed several years ago, before the pandemic, and we unfortunately no longer have the samples. We will, however, provide the protein quantification information in the revision. The MMP9 staining images for the controls and mutants were also all acquired with the same microscope settings and can be directly compared. The statistics will be provided as suggested.
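      Equal-mass loading of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' protocol: it assumes that Bradford absorbance readings have already been converted to protein concentrations (µg/µL) with a standard curve, and the helper name, sample names, concentrations, and 20 µg target are all hypothetical.

```python
def loading_volumes(conc_ug_per_ul: dict[str, float], target_ug: float) -> dict[str, float]:
    """Return the lysate volume (µL) per sample that contains target_ug of total protein.

    Hypothetical helper: concentrations are assumed to come from a Bradford
    standard curve, expressed as µg of protein per µL of lysate.
    """
    volumes = {}
    for sample, conc in conc_ug_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive protein concentration for {sample!r}")
        # volume = mass / concentration, so every lane carries the same protein mass
        volumes[sample] = target_ug / conc
    return volumes


# Hypothetical example values, not data from this study.
print(loading_volumes({"control_1": 2.0, "mutant_1": 2.5}, target_ug=20.0))
```

      With these illustrative numbers, the control lysate would be loaded at 10 µL and the mutant lysate at 8 µL, each carrying 20 µg of protein.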

      (14) Please provide the name and the source of the MMP9/13 inhibitor used in this study.

      This inhibitor is MMP-9/MMP-13 inhibitor I (CAS 204140-01-2), from Santa Cruz Biotechnology. This information will be included in revision.

      (15) The results show that deletion of ric8a in both microglia and neural cells induced pial membrane breaches, but no phenotype is apparent when ric8a is deleted in microglia or neural cells alone. The results then showed that intraperitoneal injection of LPS induced the phenotype in ric8a-cxcr3-cre mutants. As a control supporting the model, it would be beneficial to show that the insult induced by LPS injection does not induce the phenotype in ric8a-foxg1-cre mice.

      We agree it may potentially be useful to show that LPS injection does not induce ectopia in ric8a-foxg1-cre mice.  Unfortunately, since the ric8a-foxg1-cre mutation shows no phenotype, we are no longer in possession of this line.

      Reviewer #1 (Recommendations For The Authors):

      -The information in the abstract and the introduction relates only to app. It is therefore very abrupt that the authors start the manuscript by studying the role of Ric8a, with no information at all about this protein or about why they want to investigate its role in microglial activation. Later in the manuscript, the authors try to link Ric8a with app to study the role of app in the inflammatory response and ectopia formation. This link is quite weak as well.

      In the last paragraph of the Introduction, we explain the use of the ric8a mutant and how it led to the discovery of the Abeta monomer-regulated pathway. We will improve the writing in the revision to make these points clearer. We will also improve the description of the potential link between Ric8a and APP, highlighting, in particular, the fact that ric8a and app pathway mutants are among a unique group of only three mouse mutants (ric8a, app/aplp1/2, and apbb1/2) that show cortical ectopia exclusively in the lateral cortex, whereas all other cortical ectopia mutants show the most severe ectopia at the midline.

      -In order to validate the mouse model, double immunofluorescence or immunofluorescence plus in situ hybridization should be performed to show that microglia express ric8a and that this expression is eliminated in Emx1-Ric8a mutant mice.

      As mentioned above, we have additional lines of evidence showing that ric8a is deleted from microglia in emx1-cre mutants. This includes data showing induction of the expression of a cre reporter in brain microglia by emx1-cre and loss of ric8a gene expression in microglia cells isolated from emx1-cre mutants.  We will include these data in revision.

      -In Supplemental Fig. 6, the authors claimed that cell proliferation is normal in Ric8a mutant mice without doing any quantification. They also quantified the angle of mitotic division of progenitors in the ventricular zone, but there are no images for the spindle orientation quantification, and no description of how they did it. In addition, this data is contrary to what has already been published in conditional Ric8a mutant mice (Kask et al., 2015). The Vimentin staining should be improved.

      We will provide quantification of cell proliferation in the revision. We will also provide details on how mitotic spindle orientation was quantified. We are not sure why the results differ from those of the other study. We were in fact anticipating deficits in mitotic spindle orientation and spent major effort on the analysis. However, based on the data, we could not draw that conclusion.

      -Analysis of MMP9 expression should be done by Western blot and not by immunofluorescence. In fact, the MMP9 expression shown in Figure 5g,h does not correspond to the RNA expression shown in gene expression atlases such as GenePaint or the Allen Brain Atlas, casting doubt on the specificity of the antibody. Mmp9 expression is quite low or absent in the cortex at E13.5-E14.5, making this protein very unlikely to be responsible for laminin degradation during development.

      We performed gelatin gel zymography for MMP2/9, which shows increased MMP9 activity in the mutant cortex. This is similar to a Western blot analysis (all lanes are loaded with the same amount of cortical lysate). The immunofluorescence staining, a different type of analysis, was designed as a complementary approach. Regarding RNA expression, please also note that MMP9 is a secreted protein, so its protein expression pattern is expected to differ from that of its RNA. We also have in situ data showing that, while MMP9 mRNA is indeed low, it is strongly expressed in macrophage-like cells, most prominently in cortical blood vessels at E12-E13 (we will include these data in the revision). We suspect that these cells are microglial lineage cells populating the embryonic cortex at this stage (see, for example, Squarzoni et al., Cell Rep. 2014 Sep 11;8(5):1271-9. doi: 10.1016/j.celrep.2014.07.042) and may be a major source of cortical MMP9. As for functional contributions, we agree that we cannot rule out roles played by other MMPs. However, the ectopia suppression data clearly indicate a key functional contribution by MMP9/13.

      For MMP9 activity, the authors should show the whole membrane with a minimum of three control and three mutant individual samples, together with the quantification.

      -The graphs should be improved, including individual values and titles of the Y axes.

      We will include these data in revision (the quantification of MMP9 activity is provided in Supplemental Figure 12d) and improve the graphs as suggested.

    1. Author response:

      We thank the reviewers for their feedback and will work to address it in our revision. We appreciate their recognition of the value of the dataset and will continue to strive to make it useful to the community.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      Dr. Santamaria's group previously utilized antigen-specific nanomedicines to induce immune tolerance in treating autoimmune diseases. The success of this therapeutic strategy has been linked to expanded regulatory mechanisms, particularly the role of T-regulatory type-1 (TR1) cells. However, the differentiation program of TR1 cells remained largely unclear. Previous work from the authors suggested that TR1 cells originate from T follicular helper (TFH) cells. In the current study, the authors aimed to investigate the epigenetic mechanisms underlying the transdifferentiation of TFH cells into IL-10-producing TR1 cells. Specifically, they sought to determine whether this process involves extensive chromatin remodeling or is driven by preexisting epigenetic modifications. Their goal was to understand the transcriptional and epigenetic changes facilitating this transition and to explore the potential therapeutic implications of manipulating this pathway. 

      The authors successfully demonstrated that the TFH-to-TR1 transdifferentiation process is driven by pre-existing epigenetic modifications rather than extensive new chromatin remodeling. The comprehensive transcriptional and epigenetic analyses provide robust evidence supporting their conclusions. 

      Strengths: 

      (1) The study employs a broad range of bulk and single-cell transcriptional and epigenetic tools, including RNA-seq, ATAC-seq, ChIP-seq, and DNA methylation analysis. This comprehensive approach provides a detailed examination of the epigenetic landscape during the TFH-to-TR1 transition. 

      (2) The use of high-throughput sequencing technologies and sophisticated bioinformatics analyses strengthens the foundation for the conclusions drawn. 

      (3) The data generated can serve as a valuable resource for the scientific community, offering insights into the epigenetic regulation of T-cell plasticity. 

      (4) The findings have significant implications for developing new therapeutic strategies for autoimmune diseases, making the research highly relevant and impactful. 

      We thank the reviewer for providing constructive feedback on the manuscript.

      Weaknesses: 

      (1) While the scope of this study lies in transcriptional and epigenetic analyses, the conclusions need to be validated by future functional analyses. 

      We fully agree with the reviewer’s suggestion. The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Functional validation is indeed the focus of our current studies, where we are carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.

      (2) This study successfully identified key transcription factors and epigenetic marks. How these factors mechanistically drive chromatin closure and gene expression changes during the TFH-to-TR1 transition requires further investigation. 

      Agreed. Please see our response to point #1 above.  

      (3) The study provides a snapshot of the epigenetic landscape. Future dynamic analysis may offer more insights into the progression and stability of the observed changes. 

      We have previously shown that the first event in the pMHCII-NP-induced TFH-TR1 transdifferentiation process involves proliferation of cognate TFH cells in the splenic germinal centers. This event is followed by immediate conversion of the proliferated TFH cells into transitional and terminally differentiated TR1 subsets. Although the snapshot provided by our single-cell studies reported herein documents the simultaneous presence of the different subsets composing the TFH-TR1 cell pathway upon the termination of treatment, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (Sole et al., 2023a). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells after treatment withdrawal, coupled with single-cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes, are likely to shed light on the progression and stability of the epigenetic changes reported herein.

      We will revise the manuscript accordingly, to address the three concerns raised by the reviewer, in the context of the ongoing studies mentioned above. 

      Reviewer #2 (Public Review): 

      Summary: 

      This study, based on the authors' previous findings that TFH cells can be converted into TR1 cells, conducted a highly detailed and comprehensive epigenetic investigation to determine whether TR1 differentiation from TFH cells is driven by epigenetic changes. Their evidence indicated that the downregulation of TFH-related genes during the TFH-to-TR1 transition depends on chromatin closure, while the upregulation of TR1-related genes does not depend on epigenetic changes.

      Strengths: 

      (1) A significant advantage of their approach lies in its detailed and comprehensive assessment of epigenetics. Their analysis covers open chromatin regions, histone modifications, and DNA methylation, and uses both single-cell and bulk techniques to validate their findings. Observations from different epigenetic perspectives mutually supported each other, lending greater credibility to their conclusions. This study effectively demonstrates that (1) the TFH-to-TR1 differentiation process is associated with massive closure of OCRs, and (2) the TR1-poised epigenome of TFH cells is a key enabler of this transdifferentiation process. Considering the extensive changes in epigenetic patterns involved in other CD4+ T lineage commitment processes, the epigenetic similarity between TFH and TR1 cells is intriguing.

      (2) They performed correlation analysis to assess the association between pMHC-NP-induced epigenetic changes and gene expression changes in TR1 cells. They have also made their raw data publicly available, providing a comprehensive epigenomic database of pMHC-NP-induced TR1 cells. This will serve as a valuable reference for future research.

      We thank the reviewer for his/her constructive feedback and suggestions for improvement of the manuscript.

      Weaknesses: 

      (1) A major limitation is that this study heavily relies on a premise from the previous studies performed by the same group on pMHC-NP-induced T-cell responses. This significantly limits the relevance of their conclusion to a broader perspective. Specifically, differential OCRs between Tet+ and naïve T cells were limited to only 821, as compared to 10,919 differential OCRs between KLH-TFH and naïve T cells (Figure 2A), indicating that the precursors and T cell clonotypes that responded to pMHC-NP were extremely limited. This limitation should be clearly discussed in the Discussion section. 

      We agree that this study focuses on a very specific, previously unrecognized pathway discovered in mice treated with pMHCII-NPs. Despite this apparent narrow perspective, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Furthermore, this pathway affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported here can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area. We will discuss the limitations and opportunities that this research provides more explicitly in a revised manuscript to provide a clearer context for the scope and applicability of our findings.

      We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (Sole et al., 2023a). However, we note that scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells). This will be clarified in a revised version of the manuscript.

      (2) This article uses peak calling to determine whether a region carries histone modifications, and claims that the regions with histone modifications in TFH and TR1 cells are highly similar. However, the authors did not discuss differences in histone modification intensities measured by ChIP-seq. For example, as shown in Figure 6C, the H3K27ac signal at IL10 in Tet+ cells showed significantly higher intensity than in KLH-TFH cells, whereas in this article it may be categorized as "possessing the same histone modification region". Discussing such differences would strengthen their conclusions.

      We appreciate your suggestion to discuss differences in histone modification intensities as measured by ChIP-seq. However, we respectfully disagree with the reviewer’s interpretation of these data.

      Our study primarily focuses on the identification of epigenetic similarities and differences between pMHCII-NP-induced tetramer+ cells and KLH-induced TFH cells relative to naive T cells. The outcome of direct comparisons of histone deposition (ChIP-seq) between these cell types is summarized in the lower part of Figure 4B and detailed in Datasheet 5. Throughout this section, we report the number of differentially enriched regions, their overlap with OCRs shared between tetramer+ TFH and tetramer+ TR1 cells based on scATAC-seq data, and the associated genes. Clearly, most of the epigenetic modifications that TR1 cells inherit from TFH cells had already been acquired by TFH cells upon differentiation from naïve T cell precursors. 

      Regarding the specific point raised by the reviewer on differences in the intensity of the H3K27Ac peaks linked to Il10 in Figure 6C, we note that the genomic tracks shown are illustrative. However, thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for H3K27Ac deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells.

      We acknowledge that peak calling alone does not account for intensity variations of histone modifications. However, our analysis includes both qualitative and quantitative assessments to ensure robust conclusions. We will edit the relevant sections of the manuscript to clarify these points and better communicate our methodology and findings to the readers.

      (3) Last, the key findings of this study are clear and convincing, but some results and figures are unnecessary and redundant. Some results are largely a mere confirmation of the relationship between histone marks and chromatin status. I propose to reduce the number of figures and text that are largely confirmatory. Overall, I feel this paper is too long for its current contents. 

      We understand this reviewer’s concern about the potential redundancy of some results and figures. The goal of including these analyses is to provide a comprehensive understanding of the intricate relationships between epigenetic features and transcriptomic differences. We believe that a detailed examination of these relationships is crucial for several reasons: (i) the breadth of the data allows for a thorough exploration of the relationships between histone marks, chromatin accessibility and transcriptional differences. This comprehensive approach helps ensure that our conclusions are robust and well-supported by the data; (ii) some of the results that may appear confirmatory are, in fact, important for validating and reinforcing the consistency of our findings across different contexts. These details intend to provide a nuanced understanding of the interactions between epigenetic features and gene expression; and (iii) by presenting a detailed analysis, we aim to offer a solid foundation for future research in this area. The extensive datasets that are presented in this paper will serve as a valuable resource for others in the field who may seek to build upon our findings.

      That said, we will carefully review the manuscript to identify and streamline any elements that may be overly redundant. We will consider consolidating figures and refining the text to ensure that the paper remains concise and focused while retaining the depth of analysis that we believe is essential.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has long been a focus of cognitive neuroscience research, and finding objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al. (2013) found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. Spatial suppression in motion perception was found to correlate with the visuo-spatial subtest of gF, and both variables were also correlated with the GABA concentration in human MT+. In addition, there was no significant relationship with the excitatory transmitter Glu. SI was also associated with functional connectivity (FC) between MT+ and several frontal cortical regions.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something?

      We thank the reviewer for pointing this out. After careful consideration, we agree with this point of view. According to the results of Melnick et al. (2013), motion surround suppression (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ has the potential to be a core node of the MD system. However, because our results address only the claim that “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated through the frontal cortex”, they are not yet sufficient to prove that hMT+ is a core node of the MD system, and we have therefore adjusted the explanatory logic of the article. Briefly, we now emphasize the role of hMT+ in removing redundant information (de-redundancy) in visuo-spatial intelligence and in improving information-processing efficiency, while de-emphasizing the significance of hMT+ within the MD system.

      (2) How was the sample size determined? Is it sufficient?

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r = 0.47). Using this r value as the correlation coefficient (ρ H1), with power set at the commonly used threshold of 0.8 and the alpha error probability at 0.05, the required sample size is 26. This ensures that our study has reasonable power to yield valid statistical results. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a sufficient dataset.
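      For readers who want to check this kind of calculation without G*Power, the sketch below uses the standard Fisher z approximation for detecting a Pearson correlation, n = ((z_{1-α/t} + z_{1-β}) / atanh(r))² + 3, where t is the number of tails. It is an approximation, not G*Power's exact bivariate-normal routine, and the tail setting is left as a parameter because the response does not state which was used; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_n(r: float, alpha: float = 0.05, power: float = 0.8, tails: int = 2) -> int:
    """Approximate sample size to detect a Pearson correlation r (Fisher z method)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)  # critical value for the chosen tail setting
    z_beta = z.inv_cdf(power)               # quantile corresponding to the desired power
    effect = math.atanh(r)                  # Fisher z transform of the expected r
    n = ((z_alpha + z_beta) / effect) ** 2 + 3
    return math.ceil(n)                     # round up to whole participants


for tails in (1, 2):
    print(tails, required_n(0.47, tails=tails))
```

      With r = 0.47, α = 0.05, and power = 0.8, this approximation gives 27 for a one-tailed test and 34 for a two-tailed test; exact methods such as G*Power's can return a slightly different value.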

      (3) In Schallmo elife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results different here?

      We thank the reviewer for pointing this out. There are several differences between the two studies:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used the left hMT+. The reasons we focus on the left hMT+ are described in our response to Reviewer 1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that TMS of left MT/V5 is more effective at causing perceptual effects (Tadin et al., 2011).

      c. Schallmo et al. (2018) used a 3 cm isotropic MRS voxel, whereas we used a 2 cm isotropic voxel. This helps us locate hMT+ more precisely and exclude more white-matter signal.
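      For scale, assuming cubic voxels with the edge lengths given in item c (and 1 cm³ = 1 mL), the nominal voxel volumes compare as follows. This is simple geometry only and ignores voxel placement and partial-volume effects.

```latex
V = s^{3}: \quad V_{3\,\mathrm{cm}} = 27\ \mathrm{cm}^{3}, \quad V_{2\,\mathrm{cm}} = 8\ \mathrm{cm}^{3}, \quad \frac{V_{3\,\mathrm{cm}}}{V_{2\,\mathrm{cm}}} = \frac{27}{8} \approx 3.4
```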

      (4) This study contains six measurements: SI, BDT, GABA in MT+ and V1, and Glu in MT+ and V1. There should therefore be 6 × 5 / 2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and Supplementary Figures 1-3. I understand that it is not necessary to include all figures, but I suggest reporting all values in one table.

      We thank the reviewer for the good suggestion. We have added a correlation matrix reporting all values (Figure Supplementary 9).

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perceptual processing (including 3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI is in the left hMT+, and we wanted the ROIs to be consistent across analyses. The use of left MT/V5 as a target was motivated by studies demonstrating that TMS of left MT/V5 is more effective at causing perceptual effects (Tadin et al., 2011).

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for the good advice. Including these results helps readers compare our findings with those of Melnick et al. We have added these analyses to the revised version (Figure 3f, g).

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically, could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al. (2013); however, those authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection, and for how it fits with the core/extended MD networks, is necessary.

      We greatly appreciate the reviewer's comments. We focused on hMT+ for the following reasons. According to Melnick et al. (2013), motion surround suppression (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. (2013), the averaged MD activation region appears to overlap with hMT+. Based on these findings, we hypothesized that hMT+ could potentially be a core component of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree. Because our results concern only visuo-spatial intelligence, they are not yet sufficient to establish that hMT+ is a core node of the MD system. We will adjust the explanatory logic of the article, emphasizing the de-redundancy role of hMT+ in visuo-spatial intelligence and its improvement of information-processing efficiency, while de-emphasizing the significance of hMT+ within the MD system.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the enthusiasm of the review.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavioral model. The Suppression Index is essential for elucidating the local inhibitory mechanisms within the behavioral model. However, we acknowledge that our initial presentation in the introduction may not have clearly articulated our hypothesis, potentially leading to misunderstandings. We have revised the introduction to better clarify these connections and ensure the relevance of the Suppression Index is comprehensively understood.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that our manuscript may have presented this construct inconsistently across sections, causing confusion. To address this, we now describe 3D visuo-spatial intelligence consistently in both the Introduction and the Discussion. However, we retained the term 'Block Design task score' in the Results section so that readers can identify which subtest was used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We have revised Figure 1a to make it less abstract and more logically structured. Figure 6 represents our theoretical framework of how hMT+ contributes to 3D visuo-spatial intelligence; we believe the elements of this framework are grounded in related theories and supported by the evidence presented in our Results and Discussion sections, which makes them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      Thank you for pointing this out. Reviewer #1 raised the same question, and we reproduce our answer here: “Our decision was informed by the findings of Melnick et al. (2013), which showed high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes (r = 0.69, 0.47, 0.49, and 0.50, respectively). It is well established that hMT+ is a sensory cortical region involved in visual perception, including 3D perception. Furthermore, motion surround suppression (SI), a specific function of hMT+, closely reflects this region's activity. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visuo-spatial intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      Thank you for pointing this out. We have carefully reread the relevant references, reconsidered the corresponding theories, and agree with these comments. Because our results address only whether “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not yet sufficient to establish that hMT+ is a core node of the MD system. We will therefore adjust the explanatory logic of the article, emphasizing the de-redundancy role of hMT+ in visuo-spatial intelligence and its improvement of information-processing efficiency, while de-emphasizing the significance of hMT+ within the MD system. In addition, regarding the potential central role of hMT+ in the MD system, we agree that research on hMT+ as a multisensory integration hub has mainly focused on developmental processes. Meanwhile, in adults, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports a role for hMT+ in multitasking, multisensory systems (Gu et al., 2006, J. Neurosci. 26(1):73–85; Fetsch et al., 2012, Nat. Neurosci. 15:146–154). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are supported by the properties of hMT+.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge that citing Born & Bradley (2005), which focuses solely on the structure and function of visual area MT, was inappropriate. However, we believe that choosing hMT+ as the region for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (2001, Annu. Rev. Neurosci. 24:203–238) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in extrastriate cortex (especially MT) than in primary visual cortex. This supports our choice and emphasizes the relevance of hMT+ in our study. We have corrected the reference in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for this suggestion. We have added the correlation between SI and BDT to the main text (Figure 3e).

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for this suggestion. We have plotted the V1 MRS voxel location (Figure supplement 1). Using the template, we quantified the voxel's coverage of V1, V2, and V3. Although the MRS voxel extends into V2 (3%) and V3 (32%), most of it lies in V1, with 65% overlap across subjects.
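      As an illustration of how per-area coverage percentages like those above can be computed, the following Python sketch counts the grid points of an MRS voxel that fall in each atlas label, normalizes over the points assigned to V1, V2, or V3, and averages the shares across subjects. The function names, the coordinate-set representation, the normalization choice, and the toy atlas are all hypothetical; the response does not specify the template or the exact overlap computation used.

```python
from collections import Counter
from statistics import mean

def label_coverage(mrs_voxels, atlas, labels=("V1", "V2", "V3")):
    """Share of MRS-voxel grid points in each label (hypothetical helper).

    mrs_voxels: iterable of (x, y, z) points inside the MRS voxel.
    atlas: dict mapping (x, y, z) -> area label.
    Shares are normalized over points that fall in any of `labels` (an assumption).
    """
    counts = Counter(atlas[p] for p in mrs_voxels if atlas.get(p) in labels)
    total = sum(counts.values())
    return {lab: (counts[lab] / total if total else 0.0) for lab in labels}

def mean_coverage(per_subject, labels=("V1", "V2", "V3")):
    """Average each area's share across subjects (hypothetical helper)."""
    return {lab: mean(s[lab] for s in per_subject) for lab in labels}

# Toy subject: 20 grid points inside the MRS voxel, labelled by a made-up atlas.
points = [(x, 0, 0) for x in range(20)]
atlas = {p: ("V1" if p[0] < 13 else "V2" if p[0] < 14 else "V3") for p in points}
cov = label_coverage(points, atlas)
print(cov)  # {'V1': 0.65, 'V2': 0.05, 'V3': 0.3}
```

Averaging per-subject shares (rather than pooling voxels) weights every subject equally, which is one reasonable reading of "overlap across subjects".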

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the author justify why V1 serves as the control region only in the MRS but not in FC (Figure 4) or the mediation analysis (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+

      We thank the reviewer for this suggestion. We have performed the V1 FC-behavior correlation analysis as a control (Figure supplement 7). Only positive correlations with frontal regions were detected, suggesting that V1 plays a role in feedforward information processing during the 3D visuo-spatial intelligence task. In contrast, hMT+, which showed specific negative correlations with frontal regions, is involved in the inhibitory mechanism. These results further emphasize the de-redundancy function of hMT+ in 3D visuo-spatial intelligence.

      Regarding the mediation analysis, because GABA/Glu concentrations in V1 were not correlated with BDT scores, there was no basis for applying a mediation analysis to V1.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We have further explained the difference between panels a and b in the revised version. Panel a shows the hMT+ FC that correlates with BDT score, which clearly involves the frontal cortex, whereas panel b shows the hMT+ FC that correlates with SI, which involves relatively fewer regions. The overlapping regions are of primary interest, as they help explain how local inhibitory mechanisms work in 3D visuo-spatial intelligence. In addition, we have revised Figure 4 to highlight the overlapping regions.

      (5) SI is not relevant to the authors' a priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem to be discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in our response to Reviewer #3, point (1). SI plays a crucial role in explaining how local inhibitory mechanisms function, at the psychological level, within the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing that the effect of the frontal cortex (BA46) on the Block Design Task is fully mediated by SI. This further underscores the significance of SI in our model.
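      To make "full mediation" concrete, here is a minimal single-mediator sketch in Python using the product-of-coefficients approach, with X, M, and Y standing for the assumed roles of frontal (BA46) FC, SI, and BDT score and placeholder data in place of real measurements. The indirect effect is a·b, and for ordinary least squares the total effect c equals the direct effect c′ plus a·b. Full mediation is usually read as a reliable indirect effect together with a direct effect c′ that does not differ significantly from zero. The significance testing (e.g., bootstrapped confidence intervals) and the serial models in Figure 5 are not reproduced here, and all function names are hypothetical.

```python
from statistics import fmean

def ols_slope(x, y):
    """Slope of y regressed on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols_two(x, m, y):
    """Coefficients (c_prime, b) of y ~ x + m, solved from centred normal equations."""
    mx, mm, my = fmean(x), fmean(m), fmean(y)
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(v * v for v in xc)
    smm = sum(v * v for v in mc)
    sxm = sum(p * q for p, q in zip(xc, mc))
    sxy = sum(p * q for p, q in zip(xc, yc))
    smy = sum(p * q for p, q in zip(mc, yc))
    det = sxx * smm - sxm * sxm
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return c_prime, b

def simple_mediation(x, m, y):
    a = ols_slope(x, m)             # path X -> M
    c = ols_slope(x, y)             # total effect X -> Y
    c_prime, b = ols_two(x, m, y)   # direct effect X -> Y, and path M -> Y given X
    return {"a": a, "b": b, "indirect": a * b, "direct": c_prime, "total": c}

# Placeholder data in which Y depends on X only through M.
x = [1, 2, 3, 4]
m = [4, 5, 8, 13]     # 3*x plus noise uncorrelated with x
y = [8, 10, 16, 26]   # exactly 2*m
print(simple_mediation(x, m, y))
```

In this toy case the whole effect of X on Y runs through M, so the direct effect is zero and the indirect effect equals the total effect.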

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results, V1 GABA showed no significant correlation with BDT score, suggesting that efficient processing in V1 may not sufficiently account for individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by introducing it in the Introduction to better frame our argument.

      Transparency Issues:

      (1) Don't think it's acceptable to make the claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We agree that this statement could cause confusion and have deleted it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We have attached the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We have revised it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We now use "hMT+" throughout the manuscript, consistent with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We have revised it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We have revised it.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The figures and tables should be substantially improved.

      We thank the reviewer for pointing this out. We have improved the quality of several figures.

      (2) Please explain the sample size, and the difference between Schallmo eLife 2018, and Melnick, 2013.

      We thank the reviewer for pointing this out. These questions were addressed in our response to the public review, and we reproduce those answers below.

      (2.1) How was the sample size determined? Is it sufficient?

      Thank you for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect size for the correlation between SI and the Perceptual Reasoning sub-ability (r = 0.47). We used this value as the correlation coefficient under H1 (ρ H1), with power set at the commonly used threshold of 0.8 and the alpha error probability at 0.05, which yielded a required sample size of 26. This ensures that our study has adequate statistical power. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 subjects to examine GABA levels in MT+ and the early visual cortex (EVC), our dataset is sufficiently large.
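      The calculation can be approximated in Python with the Fisher z formula n = ((z_α + z_β) / atanh(ρ))² + 3, where z_α is the standard-normal critical value for the chosen α and number of tails and z_β is the quantile for the desired power. This is a sketch of a common textbook approximation, not G*Power's exact bivariate-normal procedure, so results can differ by a participant or two; the function name is hypothetical. The response does not state whether a one- or two-tailed test was used, and the choice matters: for ρ = 0.47, α = 0.05, and power 0.8, the approximation gives 27 participants one-tailed and 34 two-tailed.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80, tails=2):
    """Approximate sample size to detect correlation r (Fisher z approximation)."""
    z = NormalDist()                          # standard normal
    z_alpha = z.inv_cdf(1 - alpha / tails)    # critical value for the test
    z_beta = z.inv_cdf(power)                 # quantile for the desired power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

print(n_for_correlation(0.47, tails=1))
print(n_for_correlation(0.47, tails=2))
```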

      (2.2) In Schallmo et al. (eLife, 2018), there was no correlation between GABA concentration and SI. How can the different results here be justified?

      Thank you for pointing this out. There are several differences between our study and theirs:

      a. Whereas Schallmo et al. (2018) used 3T MRS, we used 7T MRS, which allows GABA to be detected and quantified with greater accuracy.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used left hMT+. The reasons for focusing on left hMT+ are described in our response to Reviewer #1, point (6). Briefly, the choice of left MT/V5 as the target was motivated by studies demonstrating that TMS over left MT/V5 is more effective at producing perceptual effects (Tadin et al., 2011).

      c. Schallmo et al. (2018) used a 3 cm isotropic MRS voxel, whereas we used a 2 cm isotropic voxel. This allows us to localize hMT+ more precisely and to exclude more white-matter signal.
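      For reference, the difference in voxel volume implied by these two isotropic sizes is:

```latex
V_{3\,\text{cm}} = (3\,\text{cm})^3 = 27\,\text{cm}^3, \qquad
V_{2\,\text{cm}} = (2\,\text{cm})^3 = 8\,\text{cm}^3, \qquad
\frac{V_{3\,\text{cm}}}{V_{2\,\text{cm}}} = \frac{27}{8} \approx 3.4
```

      The smaller voxel therefore covers roughly 30% (8/27) of the earlier voxel's volume. The actual reduction in partial-volume contamination also depends on where the voxel is placed relative to the anatomy.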

      (3) Table 1 and Table Supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table Supplementary 2?

      (3.1) what are the main points of these values?

      Thank you for pointing this out. These correlations represent the relationships between the behavioral tasks (SI/BDT) and resting-state functional connectivity. They indicate that left hMT+ is involved in an efficient information-integration network during the BDT task. In addition, surround suppression in left hMT+ is associated with several hMT+-frontal connections. Furthermore, the regions that overlap between the two tasks suggest a shared underlying mechanism.

      (3.2) Which values do the authors want to highlight?

      Table 1 and Table Supplementary 1-3 present the preliminary analyses underlying Table 2 and Table Supplementary 4-6, so we report all values there. In Table 2 and Table Supplementary 4-6, by contrast, we highlight in bold the correlations that remained significant after correction for multiple comparisons.
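      The response does not name the multiple-comparison correction that was applied. Purely as an illustration of what "remained significant after correction" means, the sketch below implements the Benjamini–Hochberg false-discovery-rate step-up procedure in Python. Treating BH as the method is an assumption: a Bonferroni or other correction would generally mark a different set of correlations. The p-values in the example are placeholders.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return one boolean per p-value: True if rejected at false discovery rate q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

# Placeholder p-values for five FC-behavior correlations (not real data).
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20]))
```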

      (3.3) Why are only p-values shown with significance symbols in Table Supplementary 2?

      Thank you for pointing this out; this was a mistake. We have removed the significance symbols.

      (4) Line 27, it is unclear to me what is "the canonical theory".

      We thank the reviewer for pointing this out. We have changed “the canonical theory” to “the prevailing opinion”.

      (5) Throughout the paper, the authors use "MT+", I would suggest using "hMT+" to indicate the human MT complex, and to be consistent with the human fMRI literature.

      We thank the reviewer for pointing this out. We have revised them and used "hMT+" to be consistent with the human fMRI literature.

      (6) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      We thank the reviewer for pointing this out. We have included the total number of subjects at the beginning of the Results section.

      (7) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well-established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      We thank the reviewer for pointing this out. We have deleted the inappropriate sentence "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area".

      (8) There are no unit labels on any of the x- and y-axes in Figure 1. I only see the unit for Conc, which is mmol per kg wet weight.

      We thank the reviewer for pointing this out. Figure 1 is a schematic and workflow chart, so x- and y-axis labels are not needed; we believe the comment may refer to Figure 3. In Figures 3a and 3b, the MRS spectra have no standard y-axis unit, because signal amplitude depends on the individual scanner conditions, and it is widely accepted practice to omit a y-axis unit. The x-axis unit is ppm, which indicates the chemical shift of the different metabolites. In Figure 3c, the BDT represents IQ scores, which do not have a standard unit. Similarly, in Figures 3d and 3e, the Suppression Index does not have a standard unit.

      (9) Although the correlations are not significant in Figure Supplements 2 and 3, please also include the correlation line and 95% confidence interval, and report the r values and p values (i.e., in the same format as Figure 1C).

      We thank the reviewer for pointing this out. We have revised them.

      (10) There is no need to separate different correlation figures into Figure Supplementary 1-4. They can be combined into the same figure.

      We thank the reviewer for the suggestion. However, each correlation figure in the supplementary figures has its own specific topic and conclusion. The correlation figures in Supplementary Figure 1 indicate that GABA in V1 does not show any correlation with BDT and SI, illustrating that inhibition in V1 is unrelated to both 3D visuo-spatial intelligence and motion suppression processing. The correlations in Supplementary Figure 2 indicate that the excitation mechanism, represented by Glutamate concentration, does not contribute to 3D visuo-spatial intelligence in either hMT+ or V1. Supplementary Figure 3 validates our MRS measurements. Supplementary Figure 4 addresses potential concerns regarding the impact of outliers on correlation significance. Even after excluding two “outliers” from Figures 3d and 3e, the correlation results remain stable.

      (11) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      We thank the reviewer for pointing this out. It was a mistake to use this reference, and we have revised it accordingly.

      (12) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      We thank the reviewer for pointing this out. We have added a brief description of the tasks at the beginning of the Results section.

      (13) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      We thank the reviewer for the suggestion. We have included these results in Figure 3.

      (14) Figure 5 is too small. The items in plots a and b are barely visible.

      We thank the reviewer for pointing this out. We have increased the size and resolution of Figure 5.

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for improving the writing and presentation.

      I highly recommend editing the manuscript for readability and the use of the English language. I had significant difficulties following the rationale of the research due to issues with the way language was used.

      We thank the reviewer for pointing this out. We apologize for any shortcomings in our initial presentation. We have invited a native English speaker to revise our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      Heer and Sheffield used two-photon imaging to dissect the functional contributions of convergent dopamine and noradrenaline inputs to the dorsal hippocampus CA1 in head-restrained mice running down a virtual linear path. Mice were trained to collect water rewards at the end of the track, and on test days, calcium activity was recorded from dopamine (DA) axons originating in the ventral tegmental area (VTA, n=7) and noradrenaline axons from the locus coeruleus (LC, n=87) under several conditions. When mice ran laps in a familiar environment, VTA DA axons exhibited ramping activity along the track that correlated with distance to reward and, to some extent, velocity, while LC input activity remained constant across the track but correlated invariantly with velocity and time to motion onset. A subset of recordings taken when the reward was removed showed diminished ramping activity in VTA DA axons, but no changes in the LC axons, confirming that DA axon activity is locked to reward availability. When mice were subsequently introduced to a new environment, the reward-related ramping activity in the DA axons disappeared, while LC axons showed a dramatic increase in activity lasting 90 s (6 laps) following the environment switch. In the final analysis, the authors sought to disentangle LC axon activity induced by novelty from that induced by novelty-driven behavioral changes by removing periods in which animals were immobile, and established that the activity observed in the first 2 laps reflected a novelty-induced signal in LC axons.

      Strengths:  

      The results presented in this manuscript provide insights into the specific contributions of catecholaminergic input to the dorsal hippocampus CA1 during spatial navigation in a rewarded virtual environment, offering a detailed analysis at the resolution of single axons. The data analysis is thorough, and possible confounding variables and data interpretation are carefully considered.

      Weaknesses:  

      Aspects of the methodology, data analysis, and interpretation diminish the overall significance of the findings, as detailed below.  

      The LC axonal recordings are well-powered, but the DA axonal recordings are severely underpowered, with recordings taken from a mere 7 axons (compared to 87 LC axons).

      Additionally, 2 different calcium indicators with differential kinetics and sensitivity to calcium changes (GCaMP6s and GCaMP7b) were used (n=3 and n=4, respectively) and the data were pooled. This makes it very challenging to draw any valid conclusions from the data, particularly in the novelty experiment. The surprising lack of novelty-induced DA axon activity may be a false negative. Indeed, at least 1 axon (axon 2) appears to be showing a novelty-induced rise in activity in Figure 3C. Changes in activity in 4/7 axons are also referred to as a 'majority' occurrence in the manuscript, which again is not an accurate representation of the observed data.

      We appreciate the reviewer's detailed feedback regarding the analysis of VTA axons in our dataset. The relatively low sample size for VTA axons is due to their sparsity in the dCA1 region of the hippocampus and the inherent difficulty in recording from these axons. VTA axons are challenging to capture due to their low baseline fluorescence and long-range axon segments, resulting in a typical yield of only a single axon per field of view (FOV) per animal. In contrast, LC axons are more abundant in dCA1.

      To address the disparity in sample sizes between LC and VTA axons, we down-sampled the LC axons to match the number of VTA axons, repeating this process 1000 times to create a distribution. However, we acknowledge the reviewer's concern that the relatively low sample size for VTA axons might result in insufficient sampling of this population. Increasing the baseline expression of GCaMP to record from VTA axons requires several months, limiting our ability to quickly expand the sample size.
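      The down-sampling procedure described above can be sketched in Python as follows. Each iteration draws as many LC axons as there are VTA axons, without replacement, and records the mean of a per-axon statistic. Repeating this 1000 times yields a reference distribution for the VTA mean. The per-axon statistic, the placeholder values, the fixed seed, and the two-sided empirical p-value used for the comparison are all assumptions for illustration; the response does not state how the VTA value was compared with the distribution.

```python
import random
from statistics import fmean

def downsample_null(lc_values, n_vta, n_iter=1000, seed=0):
    """Means of n_iter random subsamples (without replacement) of size n_vta."""
    rng = random.Random(seed)
    return [fmean(rng.sample(lc_values, n_vta)) for _ in range(n_iter)]

def empirical_p(observed, null):
    """Two-sided empirical p: share of subsample means at least as far from the null centre."""
    centre = fmean(null)
    extreme = sum(abs(v - centre) >= abs(observed - centre) for v in null)
    return (extreme + 1) / (len(null) + 1)

# Placeholder per-axon statistics (e.g., correlation of activity with position); not real data.
gen = random.Random(1)
lc = [gen.gauss(0.0, 0.1) for _ in range(87)]
vta = [0.5 + gen.gauss(0.0, 0.1) for _ in range(9)]

null = downsample_null(lc, n_vta=len(vta))
p = empirical_p(fmean(vta), null)
print(f"VTA mean = {fmean(vta):.2f}, p = {p:.4f}")
```

Matching the subsample size to the VTA count means both groups' means carry comparable sampling noise, which is the point of down-sampling rather than comparing against all 87 LC axons at once.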

      In response to the reviewer's comments, we have added recordings from 2 additional VTA axons, increasing the sample size from 7 to 9. We re-analyzed all data from the familiar environment with n=9 VTA axons, comparing them to down-sampled LC axons as previously described. However, the additional axons were not recorded in the novel environment. We agree with the reviewer that the lack of novelty-induced DA axon activity may be a false negative. To address this, we have revised the description of our results to include the following sentence:

      “However, 1 VTA ROI showed an increase in activity immediately following exposure to novelty, indicating heterogeneity across VTA axons in CA1, and the lack of a novelty signal on average may be due to a small sample size.”

      Regarding the use of two different GCaMP constructs, we understand the reviewer's concern. We used GCaMP6s and GCaMP7b variants to determine if one would improve the success rate of recording from VTA axons. Given the long duration of these experiments and the low yield, we pooled the data from both GCaMP variants to increase statistical power. However, we recognize the importance of verifying that there are no differences in the signals recorded with these variants.

      With the addition of 2 VTA DA axons expressing GCaMP6s, we now have n=5 GCaMP6s and n=4 GCaMP7b VTA DA axons. This allowed us to compare the activity of the two sensors in the familiar environment. As shown in new Supplementary Figure 2, both sets of axons responded similarly to the variables measured: position in VR, time to motion onset, and animal velocity (although the GCaMP6s-expressing axons showed stronger correlations). Since all LC axons recorded expressed GCaMP6s, we also specifically compared VTA GCaMP6s axons to LC GCaMP6s axons (Supplementary Figure 3). Our conclusions remained consistent when comparing this subset of VTA axons to LC axons.

      Overall, our paper now includes comparisons of combined VTA axons (n=9) and separately the GCaMP6s-expressing VTA axons (n=5) with LC axons. Both datasets support our initial conclusions that VTA axons signal proximity to reward, while LC axons encode velocity and motion initiation in familiar environments.

      The authors conducted analysis on recording data exclusively from periods of running in the novelty experiment to isolate the effects of novelty from novelty-induced changes in behavior. However, if the goal is to distinguish between changes in locus coeruleus (LC) axon activity induced by novelty and those induced by motion, analyzing LC axon activity during periods of immobility would enhance the robustness of the results.  

      We appreciate the reviewer's insightful suggestion to analyze LC axon activity during periods of immobility to distinguish between changes induced by novelty and those induced by motion. This additional analysis would indeed strengthen our conclusions regarding the LC novelty signal.

      In response to this suggestion, we performed the same analysis as before, but focused on periods of immobility. Our findings indicate that following exposure to novelty, there was a significant increase in LC activity specifically during immobility. This supports the idea that LC axons produce a novelty signal that is independent of novelty-induced behavioral changes. The results of this analysis are now presented in new Supplementary Figure 5B.

      The authors attribute the ramping activity of the DA axons to the encoding of the animals' position relative to reward. However, given the extensive data implicating the dorsal CA1 in timing, and the remarkable periodicity of the behavior, the fact that DA axons could be signalling temporal information should be considered.  

      This is an insightful comment regarding the potential role of VTA DA axons in signaling temporal information. We agree that VTA DA axons could indeed be encoding temporal information, as previous work from our lab has shown that these axons exhibit ramping activity when averaged by time to reward (Krishnan et al., 2022).

      To address this, we have now examined DA axon activity relative to time to reward, as shown in new Supplementary Figure 4. Our analysis confirms that these axons ramp up in activity relative to time to reward. Given the periodicity of our mice's behavior in these experiments, as the reviewer correctly points out, we are unable to distinguish between spatial proximity to reward and time to reward. We have added a sentence to our paper highlighting this limitation and stating that further experiments are necessary to differentiate these two variables.
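      To make the "time to reward" alignment concrete, here is a minimal Python sketch, not the authors' analysis code. It assumes activity is sampled at known timestamps and that reward delivery times are logged and sorted; the function names, `bin_width`, and `max_time` are hypothetical. Each sample is assigned the time remaining until the next reward, and activity is averaged within time-to-reward bins.

```python
from bisect import bisect_left
from collections import defaultdict


def time_to_next_reward(sample_times, reward_times):
    """Seconds from each sample to the next reward (hypothetical helper).

    reward_times must be sorted ascending. Samples after the final reward
    have no upcoming reward and are returned as None.
    """
    out = []
    for t in sample_times:
        i = bisect_left(reward_times, t)  # first reward at or after t
        out.append(reward_times[i] - t if i < len(reward_times) else None)
    return out


def mean_activity_by_time_to_reward(sample_times, activity, reward_times,
                                    bin_width=0.5, max_time=10.0):
    """Average activity (e.g. dF/F) in bins of time-to-reward.

    Returns a dict mapping each bin's start (seconds before reward) to the
    mean activity of samples falling in that bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    ttrs = time_to_next_reward(sample_times, reward_times)
    for ttr, a in zip(ttrs, activity):
        if ttr is None or ttr >= max_time:
            continue  # skip samples with no upcoming reward or too far away
        b = int(ttr // bin_width) * bin_width
        sums[b] += a
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

      Because the mice's running is highly periodic, bins built this way will largely mirror distance-to-reward bins, which is exactly why the two variables cannot be separated in these data.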

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      The authors should explain and justify the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments.  

      We appreciate the reviewer's insightful comment regarding the use of a longer linear track (3m, as opposed to 2m in the DAT-cre mice) in the LC axon recording experiments. The choice of a 3m track for LC axon recordings was made to align with a previous experiment from our lab (Dong et al., 2021), in which mice were exposed to a novel 3m track while CA1 pyramidal cell populations were recorded. In that study, we detailed the time course of place field formation within the novel track. Our current hypothesis is that LC axons signal novelty, and we aimed to investigate whether the time course of LC axon activity aligns with the time course of place field formation. This hypothesis, and the potential role of LC axons in facilitating plasticity for new place field formation, is further discussed in the Discussion section of our paper.

      For the VTA axon recordings, we utilized a 2m track, consistent with another recent study from our lab (Krishnan et al., 2022), where reward expectation was manipulated, and CA1 pyramidal cell populations were recorded. By matching the track length to this prior study, we aimed to explore how VTA dopaminergic inputs to CA1 might influence CA1 population dynamics along the track under conditions of varying reward expectations.

      We acknowledge that using different track lengths for LC and VTA recordings introduces a variable that could confound direct comparisons. To address this, we normalized track length for our LC versus VTA comparison analysis. This normalization put the data from both types of axons on a common scale, allowing us to directly compare their activity patterns and to assess relative changes in activity at matched spatial bins. It reduces the likelihood that observed differences or similarities reflect differences in track length rather than intrinsic properties of the axons.
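      One way such a normalization could work is sketched below in Python. This is an illustrative assumption, not the authors' implementation: position is expressed as a fraction of track length and then binned, so a 2 m and a 3 m track each yield the same number of comparable bins. The function name and `n_bins` are hypothetical.

```python
def mean_activity_by_normalized_position(positions_cm, activity,
                                         track_length_cm, n_bins=50):
    """Average activity in spatial bins defined as fractions of track length.

    Position 0 maps to the track start and track_length_cm to the end, so
    tracks of different lengths produce n_bins directly comparable bins.
    Returns a list of length n_bins; bins with no samples are None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, a in zip(positions_cm, activity):
        frac = x / track_length_cm
        if not 0.0 <= frac <= 1.0:
            continue  # ignore samples outside the track (e.g. during teleport)
        b = min(int(frac * n_bins), n_bins - 1)  # track end goes in the last bin
        sums[b] += a
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

      For example, 60 cm on the 2 m track and 90 cm on the 3 m track both fall at 30% of the track and land in the same bin.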

      Although the experiences of the animals on the different track lengths are not identical, our observations suggest that LC and VTA axon signals are not substantially influenced by track length. LC axons are associated with velocity and a pre-motion-initiation signal, neither of which is affected by track length. VTA axons, which also correlate with velocity, can be compared to LC axon velocity signals because mice reach maximal velocity very quickly along the track, well before the end of the 2m track; the full range of velocities is therefore captured on both track lengths. While VTA axons exhibit ramping activity as they approach the reward zone (a signal potentially modulated by track length), LC axons do not show such reward-related ramping. Thus, a comparison across different track lengths is justified for this aspect of our analysis.

      To further enhance the rigor of our comparisons between axon dynamics recorded on 2m and 3m tracks, we conducted an additional analysis plotting axon activity by time to reward and actual (un-normalized) distance from reward (Supplementary Figure 4). This analysis revealed very similar signals between the two sets of axons, supporting our initial conclusions.

      We thank the reviewer for raising this important point and hope that our detailed explanation and additional analysis address their concern.

      Krishnan, L.S., Heer, C., Cherian, C., Sheffield, M.E. Reward expectation extinction restructures and degrades CA1 spatial maps through loss of a dopaminergic reward proximity signal. Nat Commun 13, 6662 (2022).

      Dong, C., Madar, A. D. & Sheffield, M.E. Distinct place cell dynamics in CA1 and CA3 encode experience in new environments. Nat Commun 12, 2977 (2021).

      Reviewer #2 (Public Review):  

      Summary:  

      The authors used 2-photon Ca2+-imaging to study the activity of ventral tegmental area (VTA) and locus coeruleus (LC) axons in the CA1 region of the dorsal hippocampus in head-fixed male mice moving on linear paths in virtual reality (VR) environments.  

      The main findings were as follows:  

      - In a familiar environment, the activity of both VTA axons and LC axons increased with the mice's running speed on the Styrofoam wheel, with which they could move along a linear track through a VR environment.  

      - VTA, but not LC, axons showed marked reward position-related activity, showing a ramping-up of activity when mice approached a learned reward position.  

      - In contrast, the activity of LC axons ramped up before the initiation of movement on the Styrofoam wheel.  

      - In addition, exposure to a novel VR environment increased LC axon activity, but not VTA axon activity.  

      Overall, the study shows that the activity of catecholaminergic axons from VTA and LC to dorsal hippocampal CA1 can partly reflect distinct environmental, behavioral, and cognitive factors. Whereas both VTA and LC activity reflected running speed, VTA, but not LC axon activity reflected the approach of a learned reward, and LC, but not VTA, axon activity reflected initiation of running and novelty of the VR environment.  

      I have no specific expertise with respect to 2-photon imaging, so cannot evaluate the validity of the specific methods used to collect and analyse 2-photon calcium imaging data of axonal activity.  

      Strengths:  

      (1) Using a state-of-the-art approach to record separately the activity of VTA and LC axons with high temporal resolution in awake mice moving through virtual environments, the authors provide convincing evidence that the activity of VTA and LC axons projecting to dorsal CA1 reflect partly distinct environmental, behavioral and cognitive factors.  

      (2) The study will help a) to interpret previous findings on how hippocampal dopamine and norepinephrine or selective manipulations of hippocampal LC or VTA inputs modulate behavior and b) to generate specific hypotheses on the impact of selective manipulations of hippocampal LC or VTA inputs on behavior.  

      Weaknesses:  

      (1) The findings are correlational and do not allow strong conclusions on how VTA or LC inputs to dorsal CA1 affect cognition and behavior. However, as indicated above under Strengths, the findings will aid the interpretation of previous findings and help to generate new hypotheses as to how VTA or LC inputs to dorsal CA1 affect distinct cognitive and behavioral functions.  

      (2) Some aspects of the methodology would benefit from clarification.  

      First, to help others to better scrutinize, evaluate, and potentially to reproduce the research, the authors may wish to check if their reporting follows the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines for the full and transparent reporting of research involving animals (https://arriveguidelines.org/). For example, I think it would be important to include a sample size justification (e.g., based on previous studies, considerations of statistical power, practical considerations, or a combination of these factors). The authors should also include the provenance of the mice. Moreover, although I am not an expert in 2-photon imaging, I think it would be useful to provide a clearer description of exclusion criteria for imaging data.

      We thank the reviewer for helping us formalize the scientific rigor of our study. There are ten ARRIVE Guidelines and we have addressed most of them in our study already. However, there is an opportunity to add detail. We have listed below all ten points and how we have addressed each one (and point out any new additions):

      (1) Experimental design - we go into great depth explaining the experimental set-up, how we used the autofluorescent blebs as imaging controls, how we controlled for different sample sizes between the two populations, and the statistical tests used for comparisons. We also carefully accounted for animal behavior when quantifying and describing axon dynamics both in the familiar and novel environments.

      (2) Sample size - we state both the number of ROIs and mice for each analysis. We have now also added the number of mice in which we observed each specific type of activity.

      (3) Inclusion/exclusion criteria - The following has now been added to the Methods section: Of the 36 NET-Cre mice injected, 15 were never imaged, either because they failed to reach behavioral criteria or because axonal expression was not visible. Of the 54 DAT-Cre mice injected, 36 were never imaged for the same reasons. Of the remaining 21 NET-Cre mice, 5 were excluded for heat bubbles, z-drift, or bleaching, while 10 of the remaining 18 DAT-Cre mice were excluded for the same reasons. These exclusions were determined by visually assessing imaging sessions and then checking the registration metrics output by suite2p, which perform a PCA on the motion-corrected images and plot the first principal component (PC). If the first PC drifted so much that no activity was apparent, the video was excluded from analysis.

      (4) Randomization - Already included in the paper is a description of random downsampling of LC axons to make statistical comparisons with VTA axons. LC axons were selected pseudo-randomly (only one axon per imaging session) to match VTA sampling statistics. This randomization was repeated 1000 times and comparisons were made against this random distribution. 
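      The downsampling procedure can be illustrated with a short Python sketch. It is a hypothetical reconstruction, not the authors' code: it assumes per-axon summary values (for example, a velocity correlation) grouped by imaging session, and that each iteration draws one LC axon from each of `n_vta` randomly chosen sessions so that no session contributes more than one axon. The observed VTA value is then compared against the resulting null distribution with a two-sided empirical p-value; that specific test is our assumption.

```python
import random
from statistics import mean


def downsampled_lc_null(lc_by_session, n_vta, n_iter=1000, seed=0):
    """Null distribution of LC means matched to the VTA sample size.

    lc_by_session maps a session ID to a list of per-axon values. Each
    iteration samples n_vta sessions without replacement and one random
    axon from each, then records the mean of those n_vta values.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    sessions = list(lc_by_session)
    if n_vta > len(sessions):
        raise ValueError("need at least one LC session per VTA axon")
    null = []
    for _ in range(n_iter):
        picked = rng.sample(sessions, n_vta)
        null.append(mean(rng.choice(lc_by_session[s]) for s in picked))
    return null


def empirical_p(observed, null):
    """Two-sided empirical p-value with a +1 correction: the fraction of
    null values at least as far from the null mean as the observed value."""
    center = mean(null)
    extreme = sum(abs(v - center) >= abs(observed - center) for v in null)
    return (extreme + 1) / (len(null) + 1)
```

      Fixing the seed makes the 1000 resamples reproducible, and restricting each draw to one axon per session mirrors the one-axon-per-session structure of the VTA dataset.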

      (5) Blinding-masking - no blinding/masking was conducted as no treatments were given that would require this. We will include this statement in the next version. 

      (6) Outcomes - We defined all outcomes measured, such as those related to animal behavior and axon signaling. 

      (7) Statistical methods - None of the reviewers had any issues regarding our description of statistical methods, which we described in great detail in this version of the paper. 

      (8) Experimental animals - We now state that DAT-Cre mice were obtained from JAX labs and NET-Cre mice were obtained from the Tonegawa lab (Wagatsuma et al., 2018). This information was absent from the initial version of the paper.

      (9) Experimental procedure - Already listed in great detail in Methods section.

      (10) Results - Rigorously described in detail for behaviors and related axon dynamics.

      Wagatsuma, A., Okuyama, T., Sun, C., Smith, L.M., Abe, K., Tonegawa, S. Locus coeruleus input to hippocampal CA3 drives single-trial learning of a novel context. Proc Natl Acad Sci USA 115, E310–E316 (2018). https://doi.org/10.1073/pnas.1714082115.

      Second, why were different linear tracks used for studies of VTA and LC axon activity (from line 362)? Could this potentially contribute to the partly distinct activity correlates that were found for VTA and LC axons?  

      We thank the reviewer for pointing this out and giving us a chance to address it directly. We have provided a detailed response to a similar comment from Reviewer 1 above.

      Third, the authors seem to have used two different criteria for defining immobility. Immobility was defined as moving at <5 cm/s for the behavioral analysis in Figure 3a, but as <0.2 cm/s for the imaging data analysis in Figure 4 (see legends to these figures and also see Methods, from line 447, line 469, line 498)? I do not understand why, and it would be good if the authors explained this.  

      This is a typo leftover from before we converted velocity from rotational units of the treadmill to cm/s. This has now been corrected.

      (3) In the Results section (from line 182) the authors convincingly addressed the possibility that less time spent immobile in the novel environment may have contributed to the novelty-induced increase of LC axon activity in dorsal CA1 (Figure 4). In addition, initially (for the first 2-4 laps), the mice also ran more slowly in the novel environment (Figure 3aIII, top panel). Given that LC and VTA axon activity were both increasing with velocity (Figure 1F), reduced velocity in the novel environment may have reduced LC and VTA axon activity, but this possibility was not addressed. Reduced LC axon activity in the novel environment could have blunted the novelty-induced increase. More importantly, any potential novelty-induced increase in VTA axon activity could have been masked by decreases in VTA axon activity due to reduced velocity. The latter may help to explain the discrepancy between the present study and previous findings that VTA neuron firing was increased by novelty (see Discussion, from line 243). It may be useful for the authors to address these possibilities based on their data in the Results section, or to consider them in their Discussion.

      We appreciate the reviewer's insightful comment regarding the potential impact of decreased velocity on novelty responses in LC and VTA axons. The decreased velocity in the novel environment could lead to a diminished novelty response in LC axons and could mask a subtle novelty signal in VTA axons. We have now included the following points in our discussion:

      “In addition, as noted above, on average we did observe a velocity-associated signal in VTA axons. When mice were exposed to the novel environment their velocity initially decreased. This would be expected to reduce the average signal across the VTA axon population relative to the higher velocity in the familiar environment. It is possible that this decrease could partially mask a subtle novelty-induced signal in VTA axons. Therefore, additional experiments should be conducted to investigate the heterogeneity of these axons and their activity under different experimental conditions during tightly controlled behavior.”

      “As discussed above, the slowing down of animal behavior in the novel environment could have decreased LC axon activity and reduced the magnitude of the novelty signal we detected during running. The novelty signal we report here may therefore underestimate its magnitude under matched behavioral conditions.”

      However, it is important to note that although VTA axons, on average, showed activity modulated by velocity in a familiar rewarded environment, this relationship was largely due to the activity of two VTA axons that were strongly modulated by velocity, indicating heterogeneity within the VTA axon population in dCA1. We have highlighted this point in the discussion. We also discuss that:

      “It is possible that some VTA DA inputs to dCA1 respond to novel environments, and the small number of axons recorded here are not representative of the whole population.”

      (4) Sensory properties of the water reward, which the mice may be able to detect, could account for reward-related activity of VTA axons (instead of an expectation of reward). Do the authors have evidence that this is not the case? Occasional probe trials, intermixed with rewarded trials, could be used to test for this possibility.  

      Mice receive their water reward through a water spout that is immobile and positioned directly in front of their mouth. Water delivery is triggered by a solenoid when the mice reach the end of the virtual track. Therefore, because the water spout is immobile and the water reward is not delivered until they reach the end of the track, there is nothing for the mice to detect during their run. We have added clarifications about the water spout to the Methods and Results sections, along with appropriate discussion points.

      Additionally, we note that the ramping activity of VTA axons is still present on the initial laps with no reward (Krishnan et al., 2022), indicating that this activity is not directly related to the presence or absence of water but is instead associated with the animal’s reward expectation.

      We thank the reviewer for raising this point and hope that these clarifications address their concern.

      Reviewer #3 (Public Review):  

      Summary:  

      Heer and Sheffield provide a well-written manuscript that clearly articulates the theoretical motivation to investigate specific catecholaminergic projections to dorsal CA1 of the hippocampus during a reward-based behavior. Using 2-photon calcium imaging in two groups of cre transgenic mice, the authors examine the activity of VTA-CA1 dopamine and LC-CA1 noradrenergic axons during reward seeking in a linear track virtual reality (VR) task. The authors provide a descriptive account of VTA and LC activities during walking, approach to reward, and environment change. Their results demonstrate LC-CA1 axons are activated by walking onset, modulated by walking velocity, and heighten their activity during environment change. In contrast, VTA-CA1 axons were most activated during the approach to reward locations. Together the authors provide a functional dissociation between these catecholamine projections to CA1. A major strength of their approach is the methodological rigor of 2-photon recording, data processing, and analysis approaches. These important systems neuroscience studies provide solid evidence that will contribute to the broader field of learning and memory. The conclusions of this manuscript are mostly well supported by the data, but some additional analysis and/or experiments may be required to fully support the author's conclusions.  

      Weaknesses:  

      (1) During teleportation between familiar to novel environments the authors report a decrease in the freezing ratio when combining the mice in the two experimental groups (Figure 3aiii). A major conclusion from the manuscript is the difference in VTA and LC activity following environment change, given VTA and LC activity were recorded in separate groups of mice, did the authors observe a similar significant reduction in freezing ratio when analyzing the behavior in LC and VTA groups separately?  

      In response to the comment regarding the freezing ratios during teleportation between familiar and novel environments, we have analyzed the freezing ratios and lap velocities of DAT-Cre and NET-Cre mice separately (Fig. 3Aiii). Our analysis shows that the mean lap velocities of both groups overlap in the familiar environment and significantly decrease on the first lap of the novel environment (Fig. 3Aiii, top). For subsequent laps, the velocities in both groups are not significantly different from the familiar environment lap velocities.

      Freezing ratios also show a statistically significant decrease on the first lap of the novel environment compared to the familiar environment in both groups (Fig. 3Aiii, bottom). In the NET-Cre mice, the freezing ratios remain significantly lower in subsequent laps, while in the DAT-Cre mice, the following laps show a similar trend without reaching statistical significance. This lack of statistical significance in the DAT-Cre mice is likely due to their already lower freezing ratios in the familiar environment. Overall, the data demonstrate similar behavioral responses in the two groups of mice during the switch from the familiar to the novel environment.

      (2) The authors satisfactorily apply control analyses to account for the unequal axon numbers recorded in the LC and VTA groups (e.g. Figure 1). However, given the heterogeneity of responses observed in Figures 3c, 4b and the relatively low number of VTA axons recorded (compared to LC), there are some possible limitations to the author's conclusions. A conclusion that LC-CA1 axons, as a general principle, heighten their activity during novel environment presentation, would require this activity profile to be observed in some of the axons recorded in most all LC-CA1 mice.

      We agree with the reviewer’s point. To address this issue, when downsampling LC axons to compare to VTA axons, we matched the sampling statistics of the VTA axons/mice by only selecting one LC axon from each mouse to match the VTA dataset.

      Additionally, we have now included the number of recording sessions and the number of mice in which we observed each type of activity. This information has been added to further clarify and support our conclusions.

      Additionally, if the general conclusion is that VTA-CA1 axons ramp activity during the approach to reward, it would be expected that this activity profile was recorded in the axons of most all VTA-CA1 mice. Can the authors include an analysis to demonstrate that each LC-CA1 mouse contained axons that were activated during novel environments and that each VTA-CA1 mouse contained axons that ramped during the approach to reward?  

      As noted above, we now report the number of mice in which each type of activity described in the paper was observed.

      (3) A primary claim is that LC axons projecting to CA1 become activated during novel VR environment presentation. However, the experimental design did not control for the presentation of a familiar environment. As I understand, the presentation order of environments was always familiar, then novel. For this reason, it is unknown whether LC axons are responding to novel environments or environmental change. Did the authors re-present the familiar environment after the novel environment while recording LC-CA1 activity?  

      While we did not vary the presentation order of familiar and novel environments, we recorded the activity of LC axons in some mice when exposed to a dark environment (no VR cues) prior to exposure to the familiar environment. Our analysis of this data demonstrates that LC axons are also active following abrupt exposure to the familiar environment.

      We have added a new figure showing this response (Supplementary Figure 5A) and expanded on our original discussion point that LC axon activity generally correlates with arousal, as this result also supports that interpretation.

      We thank the reviewer for highlighting this important consideration. It certainly helps with the interpretation regarding what LC axons generally encode.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      In addition to what has been described in the public review, I have the following recommendations:  

      The sample size of DA axon recordings should be increased with the use of a single GCaMP for valid conclusions to be made about the lack of novelty-induced activity in these axons.

      We have increased the number of VTA GCaMP6s axons in the familiar environment by including two axons that were recorded in the familiar rewarded condition. We have also conducted an analysis comparing GCaMP6s versus GCaMP7b, which is discussed in detail above.

      Regarding the concerns about valid conclusions of novelty-induced activity in VTA axons, we have added a comment in the discussion to tone down our conclusions regarding the lack of a novelty signal in the VTA axons. This valid concern is discussed in detail above.  

      The title is currently very generic, and non-informative. I recommend the use of more specific language in describing the type of behavior under investigation. It is not clear to the reviewer why 'learning' is included here.  

      Original title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during behavior and learning”

      To make it more specific to the experiments conducted here, we have changed the title to this:

      New title: “Distinct catecholaminergic pathways projecting to hippocampal CA1 transmit contrasting signals during navigation in familiar and novel environments”

      Error noted in Figure 4C legend - remove reference to VTA ROIs.  

      The reference to VTA ROIs has been removed from the figure legend.

      Reviewer #2 (Recommendations For The Authors):  

      (1) The concluding sentence of the Abstract could be more specific: which distinct types of information are reflected/'signaled'/'encoded' by LC and VTA inputs to dorsal CA1?  

      The abstract has been adjusted accordingly. The new sentence is more specific: “These inputs encode unique information, with reward information in VTA inputs and novelty and kinematic information in LC inputs, likely contributing to differential modulation of hippocampal activity during behavior and learning.”

      (2) Line 46/47: The study by Mamad et al. (2017) did not quite show that VTA dopamine input to dorsal CA1 'drives place preference'. To my understanding, the study showed that suppression of VTA dopamine signaling in a specific place caused avoidance of this place and that VTA dopamine signaling modulated hippocampal place-related firing. So, please consider rephrasing.  

      Corrected, thanks for pointing this out.

      (3) Legend to Figure 3AIII: 'Each lap was compared to the first lap in F . . .' Could you clarify if 'F' refers to the 'familiar environment?  

      The figure legend has been changed accordingly.

      (4) Line 176: '36 LC neurons' - should this not be '36 imaged axon terminals in dorsal CA1' or something along these lines?  

      This reference has been changed to “LC axon ROIs”.

      (5) Line 353: Why was water restriction started before the hippocampal window implant, if behavioral training to run for water reward only started after the implant? Please clarify.

      A sentence was added to the methods to explain that this was done to reduce bleeding and swelling during the hippocampal window implantation.  

      (6) Line 377: '. . . which took 10-14 days (although some mice never reached this threshold).' How many mice did not reach the criterion within 14 days? I think it is not accurate to say the mice 'never' reached the threshold, as they were only tested for a limited period of time.  

      We have added details of how many mice were excluded from each group and the reason why they were excluded.

      (7) Exclusion criteria for imaging data: The authors state (from line 402): 'Imaging sessions with large amounts of drift or bleaching were excluded from analysis (8 sessions for NET mice, 6 sessions for LC Mice).' What exactly were the quantitative exclusion criteria? Were these defined before the onset of the study or throughout the study?  

      Imaging sessions were first qualitatively assessed by looking for disappearance or movement of structures in the Z-plane throughout the imaging FOV. Additionally, following motion correction in suite2p, we used the registration metrics, which plot the first principal component of the motion-corrected images, to assess drift, bleaching, or heat bubbles. If this variable increased or decreased so greatly over a session that no apparent activity was visible in the first PC, the dataset was excluded. We have added these exclusion criteria to the Methods section.

      Reviewer #3 (Recommendations For The Authors):  

      Please provide a justification or rationale for having two different criteria for immobility (< 5cm/sec) and freezing (<0.2 cm/sec). If VTA and LC axon activities are different between these two velocities, please provide some commentary on this difference.  

      This is a typo left over from before we converted velocity from rotational units to cm/s. This has now been corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewing editor’s list of items remaining to be addressed followed by our responses/actions:

      (1) The order and organization of supplemental figures and tables is almost impossible to navigate. Please put them in order. 

      All the sections from the previous Supplementary files have been divided into individual Supplementary files so that each can be referenced without confusion from the text. All of the references in the body of the text and the author responses have been updated to reflect this change.

      (2) The question of sample sizes was partially addressed, with authors stating that cell culture work in iPSCs and PGCLCs was done in replicates of 3. Sertoli and granulosa cells were generated from pooled preps - how many individuals, were they littermates? 

      Sertoli and granulosa primary cultures were generated from littermates and each prep used 5 animals (males for Sertoli cells and females for granulosa cells). These changes have been added to the body of the text on pages 39 and 40.

      (3) Authors need to discuss the limitations of doing work in triplicates. Their PCA (Supplement Figure 9) reveals that in several cases samples from the same treatment were not discriminated by PC1 and/or PC2. This is especially true in e and f, the variance of which was explained by PC1 for cell type, but for which treatments showed poor discrimination by PC2. Some discussion of the limitations of sample size should be provided.

      Additional text has been added to what is now Supplementary file 15 to acknowledge this limitation imposed by the limited number of replicates (three) and the ability to resolve the differences in treatments by PCA in subplots e and f. However, we also note that the differences were sufficient to identify significant DMCs/DMRs/DEGs.

      Reviewer 2 also noted a potential weakness that “exposures are more complicated in a whole organism than in an isolated cell line.”

      We note that in our revised manuscript we included wording noting that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      Reviewer #1 (Public Review): 

      Critiques/Comments: 

      (1) A problem with in vitro work is that homogeneous cell lines/cultures are, by nature, absent from the rest of the microenvironment. The authors need to discuss this. 

      Addressed on pages: 24-25 – We have added two sentences to the second paragraph of the Discussion section acknowledging this concern, but also pointing out that in vitro models of this sort provide an experimental advantage: they facilitate deconvolution of the extensive complexity of the intact animal. Nevertheless, we acknowledge that findings obtained in an in vitro model system ultimately require validation to ensure they accurately recapitulate functions that occur in the intact animal in vivo.

      In response to Reviewer 2’s stated weakness of our study that “The weakness includes the fact that exposures are more complicated in a whole organism than in an isolated cell line,” please note that this added text includes the statement that despite the advantages of using an in vitro approach to deduce underlying molecular mechanisms, results of such in vitro studies “ultimately warrant validation of results discerned from studies of in vitro models to ensure they also reflect functions ongoing in the more complex and heterogeneous environment of the intact animal in vivo.” Thus we have endeavored to acknowledge the reviewer’s point.

      (2) What are n's/replicates for each study? Were the same or different samples used to generate the data for RNA sequencing, methylation beadchip analysis, and EM-seq? This clarification is important because if the same cultures were used, this would allow comparisons and correlations within samples.  

      Addressed on pages: 39-45 and in new Supplementary file 15 – Additional text has been added to the Methods section to indicate that all samples from the iPSC and PGCLC cell culture models came from a single XY iPS cell line aliquoted into replicates, and that all primary cultures, including Sertoli and granulosa cells, were generated from pooled tissue preps from mice and then aliquoted into replicates. All experiments in the study were performed on three replicates. Because this experimental design did indeed allow for comparisons among samples, we have added a new Supplementary file 15, which displays PCA plots showing the clustering of control and treatment datasets, respectively, as well as the separation between the clusters representing each experimental condition.

      (3) In Figure 1, it is interesting that the 50 uM BPS dose mainly resulted in hypermethylation whereas 100 uM appears to be mainly hypomethylation. (This is based on the subjective appearance of graphs). The authors should discuss and/or present these data more quantitatively. For example, what percentage of changes were hypo/hypermethylation for each treatment? How many DMRs did each dose induce? For the RNA-seq results, again, what were the number of up/down-regulated genes for each dose?  

      Addressed on pages: 6-7 and in new Supplementary files 1-3 – The experiment shown in Figure 1 was designed to 1) serve as proof of principle that cells maintained in culture could be susceptible to EDC-induced epimutagenesis at all, 2) determine whether any response observed would be dose-dependent, and 3) identify a minimally effective dose of BPS to be used for the remaining experiments in this study (which we identified as 1 μM). We agree that it is interesting that the 50 µM dose of BPS induced predominantly hypermethylation changes whereas the 1 µM and 100 µM doses induced predominantly hypomethylation changes, but we are not in a position to offer a mechanistic explanation for this outcome at this time. The data obtained satisfied all of the initial objectives of this experiment: they demonstrated that exposure of cells in culture to BPS can indeed induce DNA methylation epimutations, that this occurs in a dose-dependent manner, and that a dose as low as 1 µM BPS is sufficient to induce epimutagenesis. That said, in response to the reviewer’s request, we have now added text on pages 6-7, referring to new Supplementary files 1-3, that indicates the total numbers of DMCs, DMRs, and DEGs detected in response to each dose of BPS shown in Figure 1, stratified into hyper- and hypomethylation epimutations and up- and down-regulated DEGs. While investigating the mechanistic basis for the difference between the responses induced by the 50 µM dose and the 1 and 100 µM doses of BPS was beyond the scope of this study, we find this result reminiscent of the “U-shaped” response curves often observed in toxicology studies. Importantly, this result demonstrates the high resolution and specificity of analysis facilitated by our in vitro cell culture model system.

      (4) Also in Figure 1, were there DMRs or genes in common across the doses? How did DMRs relate to gene expression results? This would be informative in verifying or refuting expectations that greater methylation is often associated with decreased gene expression.  

      Addressed on pages: 6-7 and in new Supplementary files 1-6 – In general, we observed a coincidence between changes in DNA methylation and changes in gene expression (Supplementary files 1-3). Regarding the reviewer’s question about the extent of common DMRs and DEGs across all doses, we found only 3 overlapping DMRs conserved across all doses tested, but an average overlap of 51.25% in DMCs and 80.45% in DEGs among iPSCs exposed to the different doses of BPS shown in Figure 1. In addition, within each dose of BPS tested in iPSCs, many DMCs overlapped the promoters or gene bodies of DEGs (Supplementary file 5). Specifically within gene promoters, we observed that hypermethylated DMCs were associated with decreased gene expression and hypomethylated DMCs with increased gene expression (Supplementary file 6).

      (5) In Figure 2, was there an overlap in the hypo- and/or hyper-methylated DMCs? Please also add more description of the data in 2b to the legend including what the dot sizes/colors mean, etc. Some readers (including me) may not be familiar with this type of data presentation. Some of this comes up in Figure 4, so perhaps allude to this earlier on, or show these data earlier.  

      Addressed on pages: 8-9 and in new Supplementary file 4 – Although we observed an average overlap of 11.05% in DMCs between pairs of cell types, we did not observe any DMCs shared among all four cell types. Indeed, this limited overlap of DMCs among different cell types exposed to BPS was the primary motivation for the analysis described in Figure 2. Thus, instead of focusing solely on direct overlap between specific DMCs, we examined similarities among the cell types in the occurrence of epimutations within different annotated genomic regions. To describe this more clearly, we have added text to page 9. We have also added detail to the legend for Figure 2 on page 8 to explain the dot sizes and colors: dot size indicates the relative number of differentially methylated probes detected within each annotated genomic region, and dot color indicates the calculated enrichment score, which reflects the relative abundance of epimutations within that region. The enrichment score is calculated by iterating down the list of DMCs, increasing a running-sum statistic when a DMC falls within the annotated genomic region of interest and decreasing it when a DMC does not. The magnitude of each increment depends on the relative occurrence of DMCs within that annotated genomic region.
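      The running-sum enrichment score described above can be illustrated with a minimal sketch. It assumes the classical unweighted GSEA-style scheme, in which each DMC inside the region of interest raises the sum by 1/N_hit, each DMC outside it lowers the sum by 1/N_miss, and the score is the maximum deviation of the running sum from zero. The function name and inputs are hypothetical, and this is not the exact pipeline used in the study.

```python
def running_sum_enrichment(ranked_dmcs, region_dmcs):
    """Return the enrichment score: the running sum's maximum deviation from zero.

    ranked_dmcs: DMC identifiers in ranked order (hypothetical input).
    region_dmcs: set of DMC identifiers lying in the annotated region of interest.
    Assumption: unweighted GSEA-style steps (+1/N_hit for hits, -1/N_miss otherwise).
    """
    n_hit = sum(1 for d in ranked_dmcs if d in region_dmcs)
    n_miss = len(ranked_dmcs) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need DMCs both inside and outside the region")
    hit_step = 1.0 / n_hit    # increment size scales with how many DMCs fall in the region
    miss_step = 1.0 / n_miss  # decrement for DMCs outside the region
    running, best = 0.0, 0.0
    for d in ranked_dmcs:
        running += hit_step if d in region_dmcs else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the signed value of the largest deviation
    return best


print(running_sum_enrichment(["c1", "c2", "c3", "c4", "c5"], {"c1", "c2"}))  # prints 1.0
```

      A positive score means region DMCs cluster near the top of the ranked list, and a negative score means they cluster near the bottom. With these step sizes, the running sum always returns to zero at the end of the list.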

      (6) iPSCs were derived from male mice MEFs, and subsequently used to differentiate into PGCLCs. The only cell type from an XX female is the granulosa cells. This might be important, and should be mentioned and its potential significance discussed (briefly).  

      Addressed on page: 29 – We have added a new paragraph just before the final paragraph of the Discussion section in which we acknowledge that most of the cell types analyzed during our study were XY-bearing “male” cells and that the manner in which XX-bearing “female” cells might respond to similar exposures could differ from the responses we observed in XY cells. However, we also noted that our assessment of XX-bearing granulosa cells yielded results very similar to those seen in XY Sertoli cells suggesting that, at least for differentiated somatic cell types, there does not appear to be a significant sex-specific difference in response to exposure to a similar dose of the same EDC. That said, we also acknowledged that in cell types in which dosage compensation based on X-chromosome inactivation is not in place, differences between XY- and XX-bearing cells could accrue.

      (7) EREs are only one type of hormone response element. The authors make the point that other mechanisms of BPS action are independent of canonical endocrine signaling. Would authors please briefly speculate on the possibility that other endocrine pathways including those utilizing AREs or other HREs may play a role? In other words, it may not be endocrine signaling independent. The statement that the differences between PGCLCs and other cells are largely due to the absence of ERs is overly simplistic.  

      Addressed on page: 11 and in a new Supplementary file 8 – Previous reports have indicated that BPS does not have the capacity to bind the androgen receptor (Pelch et al., 2019; Yang et al., 2024). However, there have been reports indicating that BPS can interact with other endocrine receptors, including PPARγ and RXRα, which play roles in lipid accumulation and may be linked to obesity phenotypes (Gao et al., 2020; Sharma et al., 2018). To address the reviewer’s comment, we assessed the expression of a panel of hormone receptors, including PPARγ, RXRα, and AR, in each of the cell types examined in our study; these results are now shown in a new Supplementary file 8. We show that, in addition to not expressing either estrogen receptor (ERα or ERβ), germ cells also do not express any of the other endocrine receptors we tested, including AR, PPARγ, and RXRα. We now note that these results support our suggestion that the induction of epimutations we observed in germ cells in response to BPS exposure appears to reflect disruption of non-canonical endocrine signaling. We also note that non-canonical endocrine signaling is well established (Brenker et al., 2018; Ozgyin et al., 2015; Song et al., 2011; Thomas and Dong, 2006). Thus, we feel that the suggestion that the effects of BPS exposure could reflect disruption of either canonical or non-canonical signaling in any cell type is well justified, and that our data suggest both effects occurred in the cells examined in our study, as stated in the text of our manuscript.

      (8) Interpretation of data from the GO analysis is similarly overly simplistic. The pathways identified and discussed (e.g. PI3K/AKT and ubiquitin-like protease pathways) are involved in numerous functions, both endocrine and non-endocrine. Also, are the data shown in Figure 6a from all 4 cell types? I am confused by the heatmap in 6c, which genes were significantly affected by treatment in which cell types?  

      Addressed on pages: 19-21 – Per the reviewer’s request, we have added text to indicate that Figure 6a indeed shows data from all four cell types examined. We have also modified the text to clarify that Figure 6c displays the expression of other G protein-coupled receptors that are expressed at similar, if not higher, levels than either ER in all cell types examined, and that have been shown to be capable of binding either 17β-estradiol or BPA in rat models. As the reviewer suggests, this indicates that a wide variety of distinct pathways and/or functions could potentially be affected by exposure to an EDC such as BPS. Thus, we have attempted to acknowledge the reviewer’s primary point that BPS may interact with a variety of receptors or other factors involved in many different pathways and functions. Importantly, this illustrates a strength of our model system: it can be used to identify potentially affected target pathways that can subsequently be pursued further as appropriate.

      (9) In Figure 7, what were the 138 genes? Any commonalities among them? 

      Addressed on page: 22 and in new Supplementary files 13 and 14 – We have added a new supplemental Excel file (Supplementary file 13) that lists the 138 overlapping conserved DEGs that were not reprogrammed/corrected during the transition from iPSCs to PGCLCs. In addition, we have added new text on page 22 and a new Supplementary file 14 displaying KEGG analysis of pathways associated with these 138 retained DEGs. We find that these genes are primarily involved in cell cycle and apoptosis pathways, which, interestingly, have potential links to cancer development, a process often associated with disruptions in chromatin architecture.

      (10) The Introduction is very long. The last paragraph, beginning line 105, is a long summary of results and interpretations that better fit in a Discussion section.

      Addressed on page: 6 – We have now significantly reduced the length and scope of the final paragraph of the Introduction per the reviewer’s recommendation.

      (11) Provide some details on husbandry: e.g. were they bred on-site? What food was given, and how was water treated? These questions are to get at efforts to minimize exposure to other chemicals.  

      Addressed on page: 37 – We have added text detailing that all mice used in the project were bred on site, that water was non-autoclaved conventional reverse-osmosis (RO) water, and that mice used in this study were fed 5V5R extruded feed, which is highly controlled for isoflavone content and certified for use in estrogen-sensitive animal protocols.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript uses cell lines representative of germ line cells, somatic cells, and pluripotent cells to address the question of how the endocrine-disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors, and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions, whereas those in the germ cell type were predominantly in gene promoters. 

      Strengths: 

      The strengths of the paper include the use of various cell types to address the sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation. 

      Weaknesses: 

      The weaknesses include the lack of reporting of replicates, superficial bioinformatic analysis, and the fact that exposures are more complicated in a whole organism than in an isolated cell line. 

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors. 

      Reviewer #2 (Recommendations For The Authors): 

      Overall, this is an intriguing paper but more transparency in the replicates and methods and a more rigorous bioinformatic treatment of the data are required. 

      Specific comments: 

      (1) End of abstract "These results suggest a unique mechanism by which an EDC-induced epimutated state may be propagated transgenerationally following a single exposure to the causative EDC." This is overly speculative for an abstract. There is only epigenetic inheritance following mitosis or differentiation presented in this study. There is no meiosis and therefore no ability to assess multi- or transgenerational inheritance. 

      Addressed on page: 2 – We have modified the text at the end of the abstract to more precisely reflect the conclusions supported by our data. In our view, the ability of induced epimutations to transcend meiosis per se is not as relevant to the mechanism of transgenerational inheritance as their ability to transcend the major waves of epigenetic reprogramming that normally occur during development of the germ line. In this regard, the transition from pluripotent iPSCs to germline PGCLCs has been shown to recapitulate at least the first portion of normal germline reprogramming, and our data now provide novel insight into the fate of induced epimutations during this process. Specifically, we show that the prevalence of epimutations was conserved during the iPSC → germ cell transition, but that very few (< 5%) of the specific epimutations present in the BPS-exposed iPSCs were retained when those cells were induced to form PGCLCs. Rather, we observed apparent correction of a large majority of the initially induced epimutations during this transition, accompanied by the apparent de novo generation of novel epimutations in the PGCLCs. Based on other recent reports in the literature, we suggest that this results from BPS exposure inducing changes in the chromatin architecture of the exposed iPSCs, such that when the normal germline reprogramming mechanism is imposed on this disrupted chromatin template, many existing epimutations are corrected and many novel epimutations are generated. This observation has the potential to explain the long-standing question of why the prevalence of epimutations persists across multiple generations despite the epigenetic reprogramming that occurs in each generation. Nevertheless, as noted above, we have modified the text at the end of the abstract to temper this interpretation, given that it is still somewhat speculative at this point.

      (2) Doses used in the experiments. One needs to be careful when stating that the dose used is "below FDA's suggested safe environmental level established for BPA" because a different bisphenol is being used here (BPA vs BPS) and the safe level is that which the entire organism experiences. It is likely that cell lines experience a higher effective dose.  

      Addressed on pages: 3, 5, and 26 – We now note explicitly that our reference to an EPA-recommended “safe dose” of BPA applies to humans and/or intact animals. Changes to this effect have been made in the second and sixth paragraphs of the Introduction section. In addition, we have added text at the end of the fourth paragraph of the Discussion section acknowledging that, as the reviewer suggests, the same dose of an EDC could exert greater effects on cells in a homogeneous culture than on the same cell type within an intact animal, given the potential for mitigating metabolic effects in the latter. However, we also note that our demonstrated ability to quantify the effects of such exposures in terms of the numbers of epimutations (DMCs or DMRs) induced could be used in future studies to address this question, by comparing the effects of a specific dose of a specific EDC on a specific cell type exposed either in a homogeneous culture or within an intact animal.

      (3) Figure 1: In the dose response, what was the overlap in DMCs and DEGs among the 3 doses? Are the responses additive, synergistic, or completely non-overlapping? This is an important point that should be addressed. 

      Addressed on pages: 6-7 and in Supplementary files 1-5 – Please see our response to Reviewer 1 critique #4 above, where we address similar concerns. While we do find overlap among the different doses with respect to the DMCs, DMRs, and DEGs displayed in Figure 1, we found the effect to be only partially additive rather than synergistic in any apparent manner. The fold increase in DMCs, DMRs, and DEGs from a dose of 1 μM to a dose of 50 μM ranged from 2.5x to 4.4x, well below the 50x increase that would be expected from a strictly additive effect, and the effect increased even less, if at all, from 50 μM to 100 μM BPS. Finally, as now noted in the Discussion section on page 25, we conclude that these results display a limited dose-dependent effect that was partially additive but plateaued at the highest doses tested.
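      The comparison with a strictly additive effect can be made explicit with a short calculation. This assumes, as the response does, that a strictly additive (dose-proportional) response would scale with dose, so moving from 1 μM to 50 μM predicts a 50-fold increase. The helper name is hypothetical and only illustrates the arithmetic; it is not part of the study's analysis.

```python
def fraction_of_proportional(observed_fold, dose_low, dose_high):
    """Return the observed fold change as a fraction of the fold change that a
    strictly dose-proportional ("additive") response would predict."""
    expected_fold = dose_high / dose_low  # e.g. 50 / 1 = 50x
    return observed_fold / expected_fold


# Fold increases reported in the response for 1 uM -> 50 uM BPS.
for fold in (2.5, 4.4):
    share = fraction_of_proportional(fold, 1, 50)
    print(f"{fold}x observed = {share:.1%} of the 50x expectation")
# prints:
# 2.5x observed = 5.0% of the 50x expectation
# 4.4x observed = 8.8% of the 50x expectation
```

      On this reading, the observed increases reach only about 5% to 9% of a dose-proportional expectation.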

      (4) Methods: How many times was each exposure performed on a given cell type? This information should be in the figure legends and methods. In the case of multiple exposures for a given line, do the biological replicates agree? 

      Addressed on pages: 39-45 and in new Supplementary file 15 – Please see our response to Reviewer 1 critique #2, where we address similar concerns with newly added text and analysis. We now note repeatedly on pages 39-45 that each analysis was conducted on three replicate samples, and we display the similarity among those replicates graphically in a new Supplementary file 15.

      (5) DNA methylation analyses. Very little analysis is presented on the BeadChip array other than hypermethylated/hypomethylated and genomic regions of DMCs. What is the range of methylation changes? Does it vary between hypo vs. hyper DMCs? How many array experiments were performed (biological replicates) and what stats were used to determine the DMCs? Are there DMCs in common among the various cell types? As an example of a more meaningful analysis, one can plot the %5mC over a given array for comparisons between control and treated cell types. For more granularity, the %5mC can be presented according to the element type (enhancers vs promoters). 

      Addressed on pages: 10 and 39-45 and in new Supplementary files 1-5 and 15 – Please see our response to Reviewer 1 critique #2 above, where we address similar concerns regarding the number of biological replicates used in this study. DMCs on the Infinium array were identified using mixed linear models. This general supervised learning framework identifies CpG loci at which differential methylation is associated with known control vs. treated covariates. CpG probes were classified as differentially methylated when the difference between treatment and control samples met both p-value and FDR significance thresholds (≤ 0.05) for each cell type analyzed. Across all samples, the median delta-beta values ranged from 0.0059 to 0.0278 for hypermethylated DMCs and from -0.0179 to -0.0033 for hypomethylated DMCs. As noted above, we did observe overlap in DMCs between cell types: on average, 11.05% of DMCs overlapped between two or more cell types, but no DMCs were shared among all four cell types. We have added text on page 9 and new Supplementary files 1-5 to more clearly describe that this limited direct overlap of DMCs was the underlying motivation for the analysis described in Figure 2. Finally, the enrichment dot plots shown in Figure 2 provide the information the reviewer requested regarding the %5mC observed at different annotated genomic element types.
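      The dual p-value/FDR criterion described above can be sketched as follows. The response does not name the multiple-testing procedure, so this sketch assumes the Benjamini-Hochberg FDR adjustment, a common choice for array-wide CpG tests. The mixed-linear-model step that produces the per-probe p-values is not reproduced, and the probe identifiers and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_dmcs(probe_pvalues, alpha=0.05):
    """Keep probes whose raw p-value and BH-adjusted p-value are both <= alpha.

    probe_pvalues: dict of probe id -> p-value (assumed to come from an upstream model).
    """
    probes = list(probe_pvalues)
    pvals = [probe_pvalues[p] for p in probes]
    qvals = benjamini_hochberg(pvals)
    return [p for p, pv, qv in zip(probes, pvals, qvals) if pv <= alpha and qv <= alpha]


pvals = {"cg01": 0.001, "cg02": 0.02, "cg03": 0.04, "cg04": 0.5}
print(call_dmcs(pvals))  # prints ['cg01', 'cg02']
```

      Here cg03 passes the raw p-value cutoff but fails after adjustment (q ≈ 0.053). Because a BH-adjusted value is never smaller than its raw p-value, the FDR criterion is the one that actually constrains the call in this sketch.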

      (6) The investigators correlate the number of DMCs in a given cell type with the presence of estrogen receptors. Does the correlation extend to the methylation difference (delta beta) at the statistically different probes?

      Addressed in a new Supplementary file 7 – We have added a new Supplementary file 7 providing data that address this question. In brief, the delta betas of probes enriched at enhancer regions and associated with relative proximity to ERE elements in Sertoli cells, granulosa cells, and iPSCs appear very similar to those of DMCs not located within these enriched regions. However, when we compared the two data sets with a goodness-of-fit test (a two-sample Kolmogorov-Smirnov test), these relatively small differences were, in fact, statistically significant. These significant differences appear to indicate higher variability among the delta betas associated with hypomethylation, but not hypermethylation, changes at DMCs associated with enhancers, potentially suggesting a greater tendency for BPS exposure to induce hypomethylation rather than hypermethylation changes, at least in these specific regions.
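      The two-sample Kolmogorov-Smirnov comparison mentioned above rests on a simple statistic: the largest vertical gap between the empirical cumulative distributions of the two delta-beta sets. A minimal stdlib sketch of that statistic follows. The function name and inputs are hypothetical; in practice a library routine such as SciPy's scipy.stats.ks_2samp also supplies the p-value used to judge significance.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the maximum absolute difference
    between the empirical CDFs of the two samples (e.g. two delta-beta sets)."""
    a, b = sorted(sample_a), sorted(sample_b)
    n_a, n_b = len(a), len(b)
    d = 0.0
    # The ECDFs only change at observed values, so checking each one suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / n_a  # fraction of sample a that is <= x
        cdf_b = bisect_right(b, x) / n_b  # fraction of sample b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # prints 0.5
```

      Because D is sensitive to differences in spread as well as in location, two sets with nearly identical medians but different variability can still yield a significant result.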

      (7) Methylation changes relative to EREs are presented in multiple figures. Are other sequences enriched in the DMCs? 

      Addressed in a new Supplementary file 11 – We profiled the genomic sequence within 500 bp of cell type-specific enriched DMCs that were associated with either enhancer regions in Sertoli, granulosa, or iPS cells or transcription factor binding sites in PGCLCs to identify overrepresented motif sequences. We then compared the identified motifs against the JASPAR database to find transcription factors that could potentially bind these regions. Interestingly, we found that the two most common motifs across all cell types were associated with either the chromatin remodeling transcription factor HMG1A or the pluripotency factor KLF4.

      (8) Please present a correlation plot between the methylation differences and the adjacent DEGs. Again, the absence of consideration of the absolute changes in methylation and gene expression minimizes the impact of the data. 

      Addressed on pages 6, 7, and 17 and in a new Supplementary file 6 – We analyzed the relationship between DMCs in DEG promoter regions and the corresponding change in expression of each DEG. Our data support an inverse relationship, with up-regulated genes showing decreased methylation in promoter regions and down-regulated genes showing increased methylation in promoter regions, although there were some exceptions to this relationship.

      (9) EM-Seq is mentioned in Figure 7 and in the material and methods. Where is it used in this study? 

      Addressed on page 22 – We now note in the text on page 22 that EM-seq was used in the experiments assessing the propagation of BPS-induced epimutations during the iPSC → EpiLC → PGCLC cell state transitions, to obtain higher-resolution, whole-epigenome data on changes in DNA methylation.

      References

      Brenker C, Rehfeld A, Schiffer C, Kierzek M, Kaupp UB, Skakkebæk NE, Strünker T. 2018. Synergistic activation of CatSper Ca2+ channels in human sperm by oviductal ligands and endocrine disrupting chemicals. Hum Reprod 33:1915–1923. doi:10.1093/humrep/dey275

      Gao P, Wang L, Yang N, Wen J, Zhao M, Su G, Zhang J, Weng D. 2020. Peroxisome proliferator-activated receptor gamma (PPARγ) activation and metabolism disturbance induced by bisphenol A and its replacement analog bisphenol S using in vitro macrophages and in vivo mouse models. Environ Int 134. doi:10.1016/j.envint.2019.105328

      Ozgyin L, Erdos E, Bojcsuk D, Balint BL. 2015. Nuclear receptors in transgenerational epigenetic inheritance. Prog Biophys Mol Biol. doi:10.1016/j.pbiomolbio.2015.02.012

      Pelch KE, Li Y, Perera L, Thayer KA, Korach KS. 2019. Characterization of Estrogenic and Androgenic Activities for Bisphenol A-like Chemicals (BPs): In Vitro Estrogen and Androgen Receptors Transcriptional Activation, Gene Regulation, and Binding Profiles. Toxicol Sci 172:23–37. doi:10.1093/toxsci/kfz173

      Sharma S, Ahmad S, Khan MF, Parvez S, Raisuddin S. 2018. In silico molecular interaction of bisphenol analogues with human nuclear receptors reveals their stronger affinity vs. classical bisphenol A. Toxicol Mech Methods 28:660–669. doi:10.1080/15376516.2018.1491663

      Song K-H, Lee K, Choi H-S. 2011. Endocrine Disrupter Bisphenol A Induces Orphan Nuclear Receptor Nur77 Gene Expression and Steroidogenesis in Mouse Testicular Leydig Cells. Endocrinology 143:2208–2215. doi:10.1210/endo.143.6.8847

      Thomas P, Dong J. 2006. Binding and activation of the seven-transmembrane estrogen receptor GPR30 by environmental estrogens: A potential novel mechanism of endocrine disruption. J Steroid Biochem Mol Biol 102:175–179. doi:10.1016/j.jsbmb.2006.09.017

      Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. 2024. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 58:2817–2829. doi:10.1021/acs.est.3c09779

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Strengths:

      The authors have generated a novel transgenic mouse line to specifically label mature differentiated oligodendrocytes, which is very useful for tracing the final destiny of mature myelinating oligodendrocytes. Also, the authors carefully compared the distribution of three progenitor cre mouse lines and suggested that Gsh-cre also labeled dorsal OLs, contrary to the previous suggestion that it only marks LGE-derived OPCs. In addition, the authors analyzed the relative contributions of OLs derived from three distinct progenitor domains in other forebrain regions (e.g. Pir, ac). Finally, the new transgenic mouse lines and established multiple combinatorial genetic models will facilitate future investigations of the developmental origins of distinct OL populations and their functional and molecular heterogeneity.

      Weaknesses:

      Since OpalinP2A-Flpo-T2A-tTA2 only labels mature oligodendrocytes but not OPCs, the authors cannot suggest that the lack of LGE/CGE-derived OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation (lines 118-119). It remains possible that LGE/CGE-derived OPCs migrate into the cortex but are later eliminated.

      We are glad that the reviewer appreciates our work and are grateful for the positive comments and the constructive suggestion. We agree with the reviewer that our methodology by itself cannot determine whether the lack of LGE/CGE-derived OLs in the neocortex is caused by competitive postnatal elimination. That is why we cited a parallel study by Li et al. (ref [17] in the original manuscript; ref [19] in the revised manuscript), in which in utero electroporation (IUE) failed to label LGE-derived OL lineage cells in both embryonic and early postnatal brains. Although they did not directly explore CGE using IUE, their fate mapping results using Emx1-Cre; Nkx2.1-Cre; H2B-GFP at P0 and P10 revealed a very low percentage of LGE/CGE-derived OL lineage cells. The lack of adult labeling in our study, together with the lack of developmental labeling in theirs, prompted us to hypothesize that the lack of LGE/CGE-derived OLs in the neocortex is less likely caused by competitive postnatal elimination and more likely due to limited production and/or allocation. In the revised manuscript, we have expanded the discussion to explain this point more clearly.

      Reviewer #2 (Public Review):

      [...] Strengths:

      The strength and novelty of the manuscript lie in the elegant tools generated and used, which have the potential to accurately resolve the question of the contribution of different progenitor zones to telencephalic regions.

      We are glad that the reviewer appreciates our work and are grateful for the overall positive comments.

      Weaknesses:

      (1) Throughout the manuscript (with one exception, lines 76-78), the authors quantified OL densities instead of contributions to the total OL population (as a % of ASPA for example). This means that the reader is left with only a rough estimation of the different contributions.

      We thank the reviewer for this constructive suggestion. We have replaced the density quantification (Figure 2F and 3D in the original manuscript) with contributions to the total OL population (% of ASPA) (Figure 2J and 2N in the revised manuscript).

      (2) All images and quantifications have been confined to one level of the cortex and the potential of the MGE and the LGE/CGE to produce oligodendrocytes for more anterior and more posterior cortical regions remains unexplored.

      The quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. We apologize for not having stated and presented this information clearly enough, and for any confusion it may have caused. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (lines 199-200) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      (3) Hence, the statement that "In summary, our findings significantly revised the canonical model of forebrain OL origins (Figure 4A) and provided a new and more comprehensive view (Figure 4B)." (lines 111-112) is not really accurate, as the findings are neither new nor comprehensive. Published manuscripts have already shown that (a) cortical OLs are mostly generated from the cortex [Tripathi et al 2011 (https://doi.org/10.1523/JNEUROSCI.6474-10.2011), Winker et al 2018 (https://doi.org/10.1523/JNEUROSCI.3392-17.2018) and Li et al (https://doi.org/10.1101/2023.12.01.569674)] and (b) MGE-derived OLs persist in the cortex [Orduz et al 2019 (https://doi.org/10.1038/s41467-019-11904-4) and Li et al 2024 (https://doi.org/10.1101/2023.12.01.569674)]. Extending the current study to different rostro-caudal regions of the cortex would greatly improve the manuscript.

      As explained in the response to comment (2), our original quantifications included different rostro-caudal regions of the cortex. In the revised manuscript, we have added more schematics and representative images in the Supplementary Figure 2 for better illustration to resolve the concern of comprehensiveness.

      We thank the reviewer for listing and summarizing highly relevant published studies along with the parallel study by Li et al. submitted to eLife. We apologize for the omission of the first two references in our original manuscript and have cited them in appropriate places (ref [10] and ref [11] in the revised manuscript). However, we believe these works do not compromise the novelty and significance of our work for the following reasons:

      (1) Tripathi et al. 2011 (ref [10] in the revised manuscript) analyzed OL lineage cells in the corpus callosum and the spinal cord, but not in the cortex and anterior commissure. Their analysis was performed in juvenile mice (P12/13), not in adulthood. Most importantly, their analysis of ventrally derived OL lineage cells relied on lineage tracing using Gsh2Cre, which in fact also labels OLs derived from Gsh2+ dorsal progenitors. In contrast, we analyzed mature OLs in the cortex, corpus callosum and anterior commissure of 2-month-old adult mice. We used intersectional and subtractive strategies to label OLs derived from dorsal, LGE/CGE and MGE/POA origins. Our strategy distinguished the two ventral lineages (LGE/CGE vs. MGE/POA) and avoided mixed labeling of OLs from ventral and dorsal Gsh2+ progenitors.

      (2) Winkler et al. 2018 (ref [11] in the revised manuscript) analyzed OLs derived from dorsal progenitors but only quantified those in the gray matter and the white matter of the somatosensory cortex. Their quantification relied on co-staining with Olig2/Sox10, and therefore included both oligodendrocyte precursor cells (OPCs) and OLs. In contrast, we analyzed mature OLs from three origins and quantified not only neocortical regions (Mo and SS) but also an archicortical region (Pir). Our analysis revealed that although dorsally derived OLs dominate the neocortex, ventrally derived OLs, especially the LGE/CGE-derived ones, dominate the piriform cortex.

      (3) Orduz et al. 2019 (ref [7] in the original manuscript and the revised manuscript) mainly focused on POA-derived OLs in the somatosensory cortex. Although they performed a limited analysis of MGE/POA-derived OPCs at postnatal days 10 and 19, they did not quantify MGE/POA-derived OLs in terms of their density, contribution to the total OL population or spatial distribution in the cortex. In contrast, we systematically quantified these aspects to demonstrate that MGE/POA-derived OLs make a small but sustained contribution to the cortex, with a distribution pattern distinct from that of OLs derived from the dorsal origin.

      (4) Li et al. 2024 (ref [17] in the original manuscript and [19] in the revised manuscript) is a parallel study submitted to eLife. Our independent findings and theirs complement each other nicely. Using different techniques and experiments but some shared genetic mouse models, we both found that the LGE/CGE makes a minimal contribution to neocortical OLs. Their analysis at prenatal and early postnatal stages, together with our analysis in the adult brain, paints a more comprehensive picture of cortical oligodendrogenesis. The uniqueness of our work is that we systematically quantified all three origins and uncovered their differential contributions to the neocortex, piriform cortex, corpus callosum and anterior commissure.

      In summary, our work developed novel strategies to faithfully trace OLs from the three different origins and performed a systematic analysis in the adult brain. Our data uncovered their differential contributions to the neocortex, piriform cortex and the two commissural white matter tracts, which differ significantly not only from the canonical view but also from previous studies in the aspects discussed above. We therefore believe our findings do significantly revise the canonical model of forebrain OL origins and provide a new and more comprehensive view.

      Reviewer #3 (Public Review):

      [...] Intriguingly, by using an indirect subtraction approach, they hypothesize that both Emx1-negative and Nkx2.1-negative cells represent the progenitors from the lateral/caudal ganglionic eminences (LC), and conclude that neocortical OLs are not derived from the LC region. The authors claim that Gsh2 is not exclusive to progenitor cells in the LC region (PMID: 32234482). However, Gsh2 is highly enriched in the LC during early embryonic development. The small population of Gsh2-positive cells in the late embryonic cortex could originate from, or migrate from, Gsh2-positive cells in the LC at earlier stages (PMID: 32234482). Consequently, the possibility that cortical OLs derive from Gsh2+ progenitors in the LC cannot be conclusively ruled out. Notably, a population of OLs migrating from the ventral to the dorsal cortical region was detected after eliminating dorsal progenitor-derived OLs (PMID: 16436615).

      The indirect subtraction data for LC progenitors drawn from the OpalinFlp-tdTOM reporter in Emx1-negative and Nkx2.1-negative cells in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line present some caveats that could influence their conclusion. The extent of activity from the two Cre lines in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mice remains uncertain. The OpalinFlp-tdTOM expression could occur in the presence of either Emx1Cre or Nkx2.1Cre, raising questions about the contribution of the individual Cre lines. To clarify, the authors should compare the tdTOM expression from each individual Cre line, OpalinFlp::Emx1Cre::RC::FLTG or OpalinFlp::Nkx2.1Cre::RC::FLTG, with the combined OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line. This comparison is crucial as the results from the combined Cre lines could appear similar to only one Cre line active.

      Overall, the authors provided intriguing findings regarding the origin and fate of oligodendrocytes from different progenitor cells in embryonic brain regions. However, further analysis is necessary to substantiate their conclusion about the fate of LC-derived OLs convincingly.

      We thank the reviewer for these thoughtful comments. We agree with the reviewer that the presence of Gsh2-positive cells in the late embryonic cortex by itself cannot rule out the possibility that they originate from, or migrate from, Gsh2-positive cells in the LC at earlier stages. Staining dorsal-lineage intermediate progenitors for Gsh2, or performing intersectional lineage tracing using Gsh2Cre along with a dorsal-specific Flp driver, would provide more direct evidence on this issue. Nonetheless, as our lineage tracing of LGE/CGE-derived OLs did not employ Gsh2Cre, the doubt about the identity of Gsh2+ cortical progenitors should not affect the interpretation of our data.

      Regarding the subtractive LCOL labeling strategy used in our study, there may have been a misunderstanding. As stated in our manuscript (lines 59-61) and reiterated by the reviewer, OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG labels OLs derived from progenitors that express neither Emx1Cre nor Nkx2.1Cre. Because these two progenitor pools do not overlap, the effects of the two Cre lines are purely additive. If there is any concern about efficiency and specificity, it would be inadequate Cre-mediated recombination leading to mislabeling of dOLs or MPOLs as LCOLs (i.e., OLs derived from Emx1- or Nkx2.1-expressing progenitors that were not successfully “subtracted” and therefore “wrongly” retained RFP expression). Therefore, the number of bona fide LGE/CGE-derived OLs could only be smaller than, never larger than, the number of RFP+ LCOLs labeled by our subtractive strategy, even if either Cre line did not work efficiently enough. In any case, this would not affect our conclusion that LGE/CGE-derived OLs make a minimal contribution to the neocortex, as the “ground truth” contribution of the LGE/CGE could only be smaller, not larger, than what we observed with the current strategy.
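      The efficiency argument above can be summarized as a simple inequality. This is a sketch under stated assumptions, not a formal model from the study: the symbols are introduced here for illustration only, and it assumes (i) that the Emx1Cre and Nkx2.1Cre lineages do not overlap, as stated above, and (ii) that neither Cre line recombines in LGE/CGE progenitors. Among Opalin-Flp-recombined OLs, let N_d, N_M and N_LC be the true numbers of dorsally, MGE/POA- and LGE/CGE-derived OLs, and let ε_E and ε_N (each between 0 and 1) be the hypothetical recombination efficiencies of Emx1Cre and Nkx2.1Cre within their own lineages. The number of RFP+ cells scored as LCOLs is then:

```latex
% Hypothetical notation; assumes disjoint Cre lineages and no Cre activity in LGE/CGE progenitors
\[
N_{\mathrm{RFP}} \;=\; N_{\mathrm{LC}} \;+\; (1-\varepsilon_E)\,N_{d} \;+\; (1-\varepsilon_N)\,N_{M}
\;\ge\; N_{\mathrm{LC}}, \qquad 0 \le \varepsilon_E,\ \varepsilon_N \le 1 .
\]
```

      Equality holds only when both Cre lines recombine completely (ε_E = ε_N = 1); under these assumptions any inefficiency can only inflate the RFP+ count, so the observed LCOL fraction is an upper bound on the true LGE/CGE contribution.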

      In support of our conclusion, a parallel study by Li et al. 2024 (ref [17] in the original manuscript; ref [19] in the revised manuscript) also provided independent experimental evidence that “any contribution of oligodendrocyte precursors to the developing cortex from the lateral ganglionic eminence is minimal in scope” (quoted from its eLife assessment). In addition, in their revision, they performed Gsh2 immunostaining in P0 Emx1Cre::HG-loxP mice and found that nearly all Gsh2+ cells in the cortical SVZ were derived from the Emx1+ lineage. We are glad that this additional evidence further clarifies the case, but we want to emphasize that the subtractive strategy we took was purposefully designed to avoid the potential uncertainty of Gsh2Cre and to label LGE/CGE-derived OLs more faithfully. Therefore, the validity of our conclusion about the fate of LC-derived OLs is independent of the question of the identity of Gsh2+ cortical progenitors and stands on its own.

      We hope that these explanations have adequately addressed the reviewer’s concerns. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In Figures 2C, 2D, 2E and 3D, the authors should provide counts of labelled cells as a % of ASPA+ cells. This will give an accurate picture of the contribution of the different progenitor regions to OLs.

      The graphs in Figure 2F are unnecessary since they are simply repeats of C-E but re-arranged.

      We thank the reviewer for the valuable suggestions. These two recommendations are related, so we made the following changes. We replaced the density quantification in Figures 2F and 3D with % of ASPA (Figures 2J and 2N in the revised manuscript) to give an accurate picture of the contribution of the different progenitor regions to OLs, as suggested by the reviewer. We retained the density counts in Figures 2C-E (Figures 2G-I in the revised manuscript). Together with the quantifications of rostral-caudal and laminar distributions presented in Supplementary Figure 2, these data demonstrate that OLs from different origins display distinct spatial distribution patterns.

      At what ages were the quantifications performed in all the figures?

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section of the revised manuscript.

      In 2D, and 3B the GFP should have been activated but the authors do not show it or quantify it presumably because GFP would flood the sections in the presence of Emx1Cre. Nevertheless, since eGFP is shown in the diagram in 2B, the authors should mention why they chose not to show it.

      We thank the reviewer for the helpful comment and the suggestion. We have modified the schematic in Figure 2B and added explanation in the figure legend (line 308-313). We also added a schematic in Supplementary Figure 1A along with images of GFP channel in Supplementary Figure 1D (line 338-350).

      All the main figures and supplementary figures are too small to see properly.

      We are sorry that the images were severely compressed in the combined manuscript file during the conversion step of the initial submission. We apologize for the compromised image quality and re-uploaded full-size figures as individual files on BioRxiv soon after receiving the reviews. For the revised manuscript, we have also taken care to upload full-size, high-resolution figures as individual files to ensure their quality of presentation.

      Supplementary Figure 2E is unnecessary and perhaps misleading the reader that cortical-derived OLs have a preference for the lower layers whereas the distribution may simply reflect the distribution of OLs in the cortex.

      We thank the reviewer for the helpful comment and the suggestion. We have removed this panel and replaced it with quantifications of relative laminar distributions of the total (ASPA+) OLs along with those from the three different origins (Supplementary Figure 2G in the revised manuscript). Indeed, the preference for the lower layers of dorsally-derived OLs mirrored the distribution of total OLs in the cortex, while the MGE/POA-derived OLs deviate significantly from others and exhibit higher preference towards layer 4.

      Quantification of labelled cells as a % of ASPA should also be performed in Supplementary Figure 3.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included quantifications of labelled cells as % of ASPA for both OpalinFlp::Emx1Cre::Ai65 and OpalinFlp::Nkx2.1Cre::Ai65 (Figures 2J and 2N). The sum of these two data sets is equivalent to that of OpalinFlp::Emx1Cre::Nkx2.1Cre::Ai65 shown in Supplementary Figure 3; we therefore did not perform additional quantifications, to avoid redundant effort.
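      The additivity argument can be stated explicitly. As a sketch, assuming that the Emx1Cre and Nkx2.1Cre lineages are disjoint and that comparable regions are sampled in each genotype, and writing P_X (hypothetical notation introduced here) for the percentage of ASPA+ OLs labeled in genotype X:

```latex
% Hypothetical notation; assumes disjoint Cre lineages and matched sampling across genotypes
\[
P_X = 100 \times \frac{\#(\mathrm{RFP}^{+}\,\mathrm{ASPA}^{+})}{\#(\mathrm{ASPA}^{+})},
\qquad
P_{\mathrm{Emx1Cre::Nkx2.1Cre}} \;=\; P_{\mathrm{Emx1Cre}} + P_{\mathrm{Nkx2.1Cre}} .
\]
```

      Because the genotypes come from different animals, this identity holds for expected values rather than section by section.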

      Imaging and quantification should be extended to more posterior regions of the cortex to find out whether the contribution is different from the areas already examined.

      We thank the reviewer for the suggestion on imaging and apologize for the confusion about the range of quantification. As explained in the response to comment (2) of weakness, the quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors should provide Opalin reporter expression data across various brain regions at different developmental stages to clarify the expression pattern of the reporter.

      We appreciate the reviewer’s comment. We chose to perform all quantifications in adult mice because Opalin is a well-established marker of differentiated OLs and the recombinase-dependent reporter expression is cumulative and irreversible. Any non-specific labeling at an earlier developmental stage would therefore be retained and manifested at the timepoint we examined. In other words, the fact that we detected no non-specific labeling in the current dataset, only labeling confined to mature OLs, indicates that no non-OL labeling was present at earlier timepoints. As shown in Figure 1D-F, reporter expression activated by the Opalin driver is highly specific to OLs in all analyzed brain regions. This is further corroborated by results from combinatorially labeled samples (Figure 2 and Supplementary Figure 2), in which only OLs, and no other cell types, were labeled in all analyzed brain regions. Following the reviewers’ suggestions, we have added representative images of more rostral and more caudal cortical regions (Supplementary Figure 2B-D), which also showed highly specific OL labeling.

      (2) In Figure 1D, please specify the developmental stage of the mice used for staining.

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (line 199-200) of the revised manuscript.

      (3) The authors should clarify whether the Opalin reporter is expressed in OPCs and astrocytes at developmental stages such as P0, P7 and P30.

      We appreciate the reviewer’s comment, but as explained in our response to comment (1), Opalin is a well-established marker of differentiated OLs and is not expressed in OPCs or astrocytes. As shown in Figure 1D-E, reporter expression is confined to CC1+ differentiated OLs, with no colocalization with Sox9 (an astrocyte marker). In support of this observation, only ASPA+ differentiated OLs, and no OPCs or astrocytes, were labeled in any of the combinatorial lineage tracing samples generated by combining this line with progenitor-Cre lines. In addition to marker staining, we did not observe any RFP+ cells with OPC or astrocyte morphology. Because the recombinase-dependent reporter expression is cumulative and irreversible, the absence of non-specific labeling in the adult brain retrospectively demonstrates the specificity of Opalin-Flp at earlier developmental stages.

      (4) In Figure 1E, authors should address why the efficiency of the tdTomato line is notably lower compared to that of H2B-GFP and whether the stability of reporters could impact the conclusions drawn.

      The difference in reporting efficiency is mainly caused by differences inherent to the two reporting systems. The TRE-RFP reporter is derived from Ai62, composed of a Tet response element and tdTomato inserted into the T1 TIGRE locus. The tdTomato expression is driven by tTA-TRE transcriptional activation. The HG-loxP reporter is derived from HG-Dual, composed of a CAG promoter, a frt-flanked STOP cassette, and H2B-GFP inserted into the Rosa26 locus. The H2B-GFP expression is driven by CAG promoter after Flp-mediated removal of the STOP cassette. A Flp-dependent tdTomato reporter designed in the same way as the HG-FRT reporter would have similar efficiency. In fact, the RC::FLTG reporter can be viewed as such a reporter in the absence of Cre, which did show similarly high efficiency as HG-FRT and supported efficient subtractive labeling of LGE/CGE-derived OLs. We apologize for a typo in the title of the Y-axis of the right panel in the original Figure 1F which may have caused potential misunderstanding. The “RFP+CC1+/CC1” should be “XFP+CC1/CC1”. We have corrected this mistake and revised the figure legend for clearer description of the data (Line 293-302 in the revised manuscript).

      (5) In Figure 2, please clarify the developmental stage of the mice used for staining. Authors should present the eGFP image in addition to tdTOM.

      We apologize for the omission of the age information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (lines 199-200) of the revised manuscript. We thank the reviewer for the suggestion regarding the eGFP image and have presented it in Supplementary Figure 1 of the revised manuscript.

      (6) in Figure 2D, authors should display the eGFP image alongside the tdTomato image. It is difficult to assess the efficiency of Emx-Cre and Nkx2.1-Cre.

      We thank the reviewer for the suggestion regarding the eGFP image and have presented it in Supplementary Figure 1D of the revised manuscript. We chose to present it in a supplementary figure instead of the main figure for two reasons. First, following reviewer #2’s suggestion, we added ASPA staining in the green channel along with quantifications of RFP+ cells as % of ASPA in Figure 2 of the revised manuscript. Second, as pointed out by reviewer #2, GFP would flood the sections in the presence of Emx1Cre and could be quite distracting if shown together with RFP.

      We were not entirely sure what the reviewer meant by “assess the efficiency of Emx-Cre and Nkx2.1-Cre”, but we believe that the quantifications of RFP+ cells as % of ASPA clarify the contribution of each origin to the total OLs (Figures 2J and 2N in the revised manuscript).

      (7) Figure 3 depicts the entire brain, replicating the image presented in Figure 2. It would be beneficial to consolidate Figures 2 and 3, as they showcase identical brain scans of different regions.

      We thank the reviewer for the constructive suggestion and have consolidated Figures 2 and 3 in the original manuscript into Figure 2 in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      [...] Strengths: 

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq. 

      Weaknesses: 

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method. 

      Thank you for your thoughtful and constructive review of our manuscript. We appreciate your recognition of the strengths of our work and the potential impact of our modified PETRI-seq protocol on the field of bacterial single-cell RNA-seq. We are grateful for the opportunity to address your concerns and improve the clarity and accessibility of our manuscript.

      We acknowledge your feedback regarding the compressed writing style and lack of technical details, which were constrained by the requirements of the Short Report format in eLife. We will address these issues in our revised manuscript as follows:

      (1) Expanded methodology section: We will provide a more comprehensive description of our experimental procedures, including detailed protocols for the ribosomal depletion step and data analysis pipeline. This will enable readers to better understand and potentially replicate our methods.

      (2) Clarification of technical evaluations: We will elaborate on the specifics of our evaluations, including the criteria used for assessing the efficiency of ribosomal depletion and the methods employed for identifying and characterizing subpopulations within the E. coli biofilm model.

      (3) Data availability: We apologize for the oversight in not making our processed data readily available. We have deposited all relevant datasets, including raw and source data, in appropriate public repositories (GEO number: GSE260458) and will provide clear instructions for accessing these data in the revised manuscript.

      (4) Supplementary information: To maintain the concise nature of the main text while providing the necessary details, we will include additional supplementary information. This will cover extended methodology, detailed statistical analyses, and comprehensive data tables to support our findings.

      (5) Discussion of limitations: We will include a more thorough discussion of the potential limitations of our modified protocol and areas for future improvement.

      We believe these changes will significantly improve the clarity and reproducibility of our work, allowing readers to better evaluate the merits of our method.

      Reviewer #2 (Public Review): 

      [...] Strengths: 

      The introduced rRNA depletion method is highly efficient: after depletion in E. coli, over 90% of reads are derived from mRNA. The method is ready to use with existing PETRI-seq libraries, which is a large advantage given that no other rRNA depletion methods have been published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI, which causes the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected.

      Weaknesses: 

      As the manuscript is currently written, it is hard to place the findings about PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E. coli diesterases regulate this second messenger, i.e. https://journals.asm.org/doi/full/10.1128/jb.00604-15).

      There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps examining the rest of the genes in cluster 2 of the biofilm sample could help explain the observed association.

      Thank you for your thoughtful and constructive review of our manuscript. We are pleased that the reviewer recognizes the value and efficiency of our rRNA depletion method for PETRI-seq, as well as its potential impact on the field. We would like to address the points raised by the reviewer and provide additional context and clarification regarding the function of PdeI in c-di-GMP regulation.

      We acknowledge that c-di-GMP’s role in biofilm development and its heterogeneous distribution in bacterial biofilms are well studied. We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI was predicted to be a phosphodiesterase responsible for c-di-GMP degradation. This prediction is based on sequence analysis showing that PdeI contains an intact EAL domain, which is known to degrade c-di-GMP. However, PdeI also contains a divergent GGDEF domain, which is typically associated with c-di-GMP synthesis. This dual-domain architecture suggests the potential for complex regulatory roles. As previously reported, knockout of the major phosphodiesterase PdeH in E. coli leads to the accumulation of c-di-GMP. Furthermore, a point mutation in PdeI's divergent GGDEF domain (G412S) in this PdeH knockout strain decreased c-di-GMP levels, implying that the wild-type GGDEF domain of PdeI helps maintain or increase c-di-GMP levels in the cell. Additionally, PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Taken together with our experimental results demonstrating that PdeI is a membrane-associated protein, these features lead us to predict that PdeI functions as a sensor that integrates environmental signals with c-di-GMP production under complex regulatory mechanisms. The experimental evidence, together with the domain analysis, suggests that PdeI could contribute to c-di-GMP synthesis, challenging the notion that it functions solely as a phosphodiesterase. Moreover, our single-cell experiments showed a positive correlation between PdeI expression levels and c-di-GMP levels (Fig. 2J). HPLC LC-MS/MS analysis further confirmed that PdeI overexpression (induced by arabinose) upregulated c-di-GMP levels (Fig. 2K). Importantly, in this HPLC LC-MS/MS analysis, we compared the PdeI overexpression strain with the wild-type MG1655 strain, thereby excluding the influence of other genes in cluster 2.
      In summary, while PdeI is predicted to be a phosphodiesterase based on its sequence and the presence of an EAL domain, the additional presence of a divergent GGDEF domain and our experimental evidence suggest that PdeI functions to upregulate c-di-GMP levels. These findings support the hypothesis that PdeI may have both synthetic and regulatory roles in c-di-GMP metabolism.

    1. Author response

      Reviewer #1 (Public Review):

      […] Weaknesses:

      This work explores an interesting question on the regulation of MyoD+ progenitors by SRSF2 and the defects in skeletal muscle differentiation when this process fails, but it spreads out in many directions rather than focusing on the key defects. A number of approaches are used, but they lack a robust mechanistic analysis of the defects in muscle differentiation. Specifically, the role of SRSF2 in splicing appears to be a misfit here and does not explain the primary defects in the migration of MyoD+ progenitors. There are concerns about the scRNA-seq, and many of the transcripts discussed in a muscle biology context are not expressed in muscle cells. Focusing on the main defects and providing additional experimental evidence to distinguish between fusion defects, precocious differentiation and reduced differentiation will strengthen this work.

      (1) The analysis of RNA-seq data (Figure 2) is limited, and it is unclear how it relates to the work presented in this MS. The GO enrichment analysis combines up- and down-regulated DEGs, making it difficult to understand the impact in each direction. Stac2 is a predominantly neuronal isoform (Stac3 is the muscle isoform), and the Symm gene is not found in the HGNC or other databases. Could the authors provide the approved name for this gene? The premise of this work is based on defects in ECM processes resulting in the mis-targeting of the muscle progenitors to nonmuscle regions. Which ECM proteins are differentially expressed?

      The GO enrichment analysis (Figure 2B) indicates that genes involved in skeletal muscle construction and function were significantly dysregulated, with both up-regulated and down-regulated genes observed, consistent with the phenotype analysis presented in Figure 1.

      We agree with the reviewer’s comment that Stac3 is the predominant muscle isoform, with high expression in skeletal muscle tissues, while Stac2 is expressed at low levels in these tissues. Therefore, we decided to delete the Stac2 data from Figure 2C and will modify the text accordingly. We apologize for our errors.

      In response to the reviewer's comment regarding the Symm gene not being found in the HGNC or other databases, we carefully re-examined the genes presented in Figure 2C. We discovered that one of the genes is actually Synm, which encodes synemin, an intermediate filament protein. We will correct this in the manuscript.

      scRNA-seq analysis revealed defects in ECM processes in SRSF2-deficient myoblasts, which we believe resulted in the mis-targeting of muscle progenitors to non-muscle regions. However, directly comparing RNA-seq results from whole muscle tissues with scRNA-seq results is challenging.

      (2) Could the authors quantify the muscle progenitors dispersed in nonmuscle regions before their differentiation? In which nonmuscle tissues are MyoD+ progenitors seen? Most of the tdT staining in the enlarged sections appears to be punctate, without any nuclear staining in these cells (Figure 3B, D, E-F). Could the authors provide high-resolution images? Also, in the diaphragm cross-sections of mutants, tdT labeling appears to be missing in some areas within the myofibers that the authors define as cavities (marked by white arrows, Figure 3H). Could this polarized localization of tdT be contributing to specific defects?

      tdT staining revealed a substantial presence of MyoD-derived cells distributed beyond the muscle regions, as shown in Figure 3B. Quantifying the number of MyoD+ progenitors dispersed in non-muscle regions would not be meaningful.

      tdT+ cells also include those that previously expressed MyoD but have since differentiated into myotubes and myofibers, which is why much of the tdT staining is not nuclear.

      MyoD+ cells deficient in SRSF2 either undergo apoptosis or premature differentiation. Consequently, tdT staining in SRSF2-KO muscles showed many irregularities in the muscle fibers.

      (3) Is there a difference in tdT levels between the MyoD+ muscle progenitors that are mis-targeted and those that are present in the muscle tissues?

      tdT+ cells include those that previously expressed MyoD but have since differentiated into myotubes and myofibers, which are no longer MyoD+. Additionally, tdT+ cells include those currently expressing MyoD, which are MyoD+ cells.

      The fiber differences between WT and SRSF2-KO mice are easily discernible through tdT staining (Figure 2D and 3D); however, comparing the levels of tdT staining between the two groups would not be meaningful.

      (4) scRNA-seq is unsuitable for myotubes and myofibers because their size excludes them from microfluidic capture. Could the authors explain the basis for choosing scRNA-seq over snRNA-seq in this work? How are SKM defined in the scRNA-seq data in Figure 4? As the myofibers are small in KO mice, could the increased level of late differentiation markers be due to enrichment of these small myotubes/myofibers in the scRNA-seq data? A different approach, such as ISH/IF with myogenic markers at E9.5-10.5, may be able to resolve whether these markers are prematurely induced.

      SRSF2 is highly expressed in proliferative myoblasts, but its levels decline once differentiation begins. In our study, we used Myod1-Cre to delete the SRSF2 gene and performed scRNA-seq analysis to examine the effects of SRSF2 deletion on the proliferation and differentiation of MyoD+ cells. Our analysis revealed that SRSF2 deletion caused proliferation defects and premature differentiation of MyoD+ cells (Figure 5G), leading to myofiber abnormalities.

      We determined that snRNA-seq analysis is not suitable for our study.

      Additionally, skeletal muscle cells (SKM) were defined based on the expression of skeletal muscle markers, as shown in Figure 4C.

      (5) TNC is a marker for tenocytes and is absent in skeletal muscle cells. The authors mention a downregulation of TNC in the KO SKM-derived clusters, which suggests contamination by tenocytes in the control cells. Despite the downregulation of multiple ECM genes shown by the scRNA-seq data, laminin staining of the ECM in KO muscles (Figure 3) appears similar to that in controls.

      Tenascin-C (Tnc) is also a member of the extracellular matrix (ECM) family. scRNA-seq analysis revealed that multiple ECM genes were downregulated in SRSF2-KO myoblasts; however, this does not indicate that laminin was downregulated in SRSF2-KO muscles.

      (6) The expression of many fusion genes, such as myomaker and myomerger, is reduced in KO, suggesting a primary fusion defect rather than a primary differentiation defect. Many mature myofiber proteins show increased expression in disease states, which suggests a compensatory mechanism. The authors need to provide additional experimental evidence supporting precocious differentiation as the primary defect.

      Our analysis revealed that the deletion of SRSF2 caused premature differentiation of MyoD+ cells (Figure 5G), leading to abnormal myofiber formation. SRSF2 is highly expressed in proliferative myoblasts, but its expression declines quickly in myotubes. Therefore, because SRSF2 is already expressed at low levels in myotubes, it is unlikely that its loss in these cells caused a primary fusion defect.

      (7) The fusion defects in KO are also evident after siRNA knockdown of SRSF2 and Aurka in C2C12 cells, where the knockdown cultures mostly contain mononucleated myocytes. Also, a fusion index needs to be provided.

      SRSF2 knockdown and Aurka knockdown caused differentiation defects, including fusion defects. We quantified the percentages of both MyoG+ and MHC+ cells in the differentiation assay.

      (8) The last section, on the role of SRSF2 in splicing, appears to be a misfit in this study. The authors describe the Bin1 isoforms in centronuclear myopathy, but exon 17 is not involved in myopathy. Is exon 17 exclusion seen in other diseases or splicing studies?

      Our study is the first to report that exon 17 inclusion of Bin1 is regulated by SRSF2. Specifically, the knockdown of Bin1 exon 17 caused severe differentiation defects in C2C12 myoblasts. The involvement of Bin1 exon 17 in myopathy requires further validation using clinical samples.

      Reviewer #2 (Public Review):

      […] Weaknesses: Although unbiased sequencing methods were used, their findings that SRSF2 serves as a transcriptional regulator and functions in alternative splicing are not novel. The introduction and discussion are not clearly written. The authors did not raise clear scientific questions in the introduction. The last paragraph is merely a copy of the abstract. The discussion mainly repeats the results without a clear discussion.

      While the role of SRSF2 as a transcriptional regulator involved in alternative splicing events is not novel, the specific SRSF2-regulated alternative splicing events and targeted genes in skeletal muscle have not been reported in other publications. We believe our interpretation of the data and comparison with related published studies are well presented in the Discussion section.

    1. Author response:

      Answers to Reviewer #1 (Public Review):

      (1) Tonic and phasic components in Figure 1 are not clear.

      We will reformulate Figure 1A to show how the tonic and phasic components were measured. As this point was also raised by Reviewer #2 (Comment 3), we will explicitly clarify this in the Methods section. We will modify the color scheme to improve clarity.

      (2) Labeling of traces in Figure 4.

      We will add labels to traces informing which sensory pathways were stimulated to produce each response.

      (3) Optic tectum instead of optical tectum.

      We apologize for the error. We will replace “optical tectum” with “optic tectum” as also suggested by Reviewer #2.

      Answers to Reviewer #2 (Public Review):

      (1) Complexity of tectum upstream circuitry (Comments 1 and 2).

      Processing of visual information is certainly a major role of the tectum, but it is true that it also receives sensory inputs from other structures, including other sensory pathways. We will acknowledge this complexity in our revised manuscript and incorporate the reviewer's suggestions for heading titles.

      (2) Figure 1 and associated text. 

      As mentioned in our provisional answer to point 1 from Reviewer #1, we will reformulate Figure 1A and clarify how tonic and phasic responses were calculated.

      (3) Figure 3 and associated text.

      We will perform the analysis suggested by the reviewer and move calculations to the Methods section as requested.

      (4) Figure 5C and lines 398-410.

      We will consider omitting Figure 5C or clearly stating its value in the context of the rest of the data and our previous behavioral experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors explore mechanisms through which T-regs attenuate acute pain using a heat sensitivity paradigm. Analysis of available transcriptomic data revealed expression of the proenkephalin (Penk) gene in T-regs. The authors explore the contribution of T-reg Penk to the resolution of heat sensitivity.

      Strengths:

      Investigating the potential role of T-reg Penk in the resolution of acute pain is a strength.

      Weaknesses:

      The overall experimental design is superficial and lacks sufficient rigor to draw any meaningful conclusions.

      We hope that the reviewer will reconsider this severe criticism after examining the updated manuscript and results.

      For instance:

      (1) There were no TAM controls. What is the evidence that TAM does not alter heat-sensitive receptors?

      The impact of TMX on heat perception is not the object of this study. Nevertheless, heat sensitivity in WT controls (blue dots) appears slightly diminished after TMX administration (Figure 5A), suggesting that heat-sensitive receptors are moderately altered by TMX per se. This reduction is much more pronounced in LOX mice. Thus, although TMX may play a marginal role in heat sensitivity by itself, the results show a much more pronounced effect of TMX in LOX than in WT mice, supporting a role for Treg-derived Penk in heat sensitivity.

      (2) There are no controls demonstrating that recombination actually occurred. How do the authors know a single dose of TAM is sufficient?

      These results are now presented in Figure S4. A 70% reduction in Penk mRNA is observed in Treg after a single administration of TMX.

      (3) Why was only heat sensitivity assessed? The behavioral tests are inadequate to derive any meaningful conclusions. Further, why weren't the behavioral data plotted longitudinally?

      The longitudinal data are presented in Figure S5A. New behavioral tests have been performed, and the results are now shown in Figure S5E-H. Importantly, the heat sensitivity phenotype was observed in two independent laboratories with two different tests.

      Reviewer #2 (Public Review):

      Summary:

      The present study addresses the role of enkephalins, which are specifically expressed by regulatory T cells (Treg), in sensory perception in mice. The authors used a combination of transcriptomic databases available online to characterize the molecular signature of Treg. The proenkephalin gene Penk is among the most enriched transcripts, suggesting that Treg plays an analgesic role through the release of endogenous opioids. In addition, in silico analysis suggests that Penk is regulated by the TNFR superfamily, which was experimentally confirmed. Using flow cytometry analysis, the authors then show that Penk is mostly expressed in Treg of the skin and colon, compared to other immune cells. Finally, genetic conditional excision of Penk, selectively in Treg, results in heat hypersensitivity, as assessed by behavior analysis.

      Strengths:

      The manuscript is clear and reveals a previously unappreciated role of enkephalins, as released by immune cells, in sensory perception. The rationale in this manuscript is easy to follow, and conclusions are well supported by data.

      Weaknesses:

      The sensory deficit of Penk cKO appears to be quite limited compared to control littermates.

      Reviewer #3 (Public Review):

      Summary:

      Aubert et al. investigated the role of PENK in regulatory T cells. Through the mining of publicly available transcriptome data, the authors confirmed that PENK expression is selectively enriched in regulatory but not conventional T cells. Further data mining suggested that OX40 and 4-1BB, as well as BATF, can regulate PENK expression in Tregs. The authors generated fate-mapping mice to confirm selective PENK expression in Tregs and activated effector T cells in the colon and spleen. Interestingly, transgenic mice with conditional deletion of PENK in Tregs exhibited hypersensitivity to heat, which the authors attributed to heat hyperalgesia.

      Strengths:

      The generation of transgenic mice with conditional deletion of PENK in Foxp3+ cells and of PENK fate-mapping mice is novel and can potentially yield significant findings. The identification of upstream signals that regulate PENK is interesting but unlikely to be the main reason why PENK is predominantly expressed in Tregs, as both BATF and TNFR are expressed in effector T cells.

      Weaknesses:

      There is a lack of direct evidence and detailed analysis of Tregs in the control and transgenic mice to support the authors' hypothesis. PENK was previously reported to be expressed in skin Tregs and to play a significant role in regulating skin homeostasis; this should be considered as an alternative mechanism that may explain the changed sensitivity to heat observed in the paper.

      We now provide a detailed analysis of Treg with or without Penk, from their immunosuppressive functions to their colocalization with sensory neurons in the skin, supporting their function as natural analgesics. The alternate hypothesis relative to skin homeostasis is now clearly presented and discussed.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Most of my comments should be addressable in a revised manuscript but will require additional analysis.

      Major:

      - According to flow cytometry analysis, Penk is expressed mostly in Treg of the skin and colon. What may account for such restricted expression? Where could Treg-released enkephalins act?

      We have now rephrased the paper to emphasize the known role of Batf in tissue Treg differentiation. We believe that the Batf dependency of Penk expression is the reason why tissue Treg are more enriched in Penk than Treg from lymphoid organs. This is now clearly discussed.

      We also provide a new figure (Figure S1) showing that Batf and its cofactors AP1 and IRF4 have been reported to bind to Penk regulatory regions. Altogether, the role of Batf in tissue Treg differentiation would explain why tissue Treg, such as those of the colon and skin, are particularly enriched in Penk. This is now clearly stated in the revised manuscript.

      To determine where Treg-released enkephalins act, we performed immunostaining in the skin and observed that Treg could colocalize with sensory neurons (new Figure 5D). This observation raises the hypothesis that Treg-released enkephalins could act locally on sensory neurons.

      - Which mechanism can underlie heat hypersensitivity in Penk cKO mice? Which sensory neurons are involved? Are other sensory modalities affected, such as mechanical sensitivity?

      As stated above, we show that Treg can be in close contact with CGRP-producing thermosensory neurons (Figure 5D). We have also tested many other stimuli, both innocuous and noxious, and did not detect significant differences; these data are presented in supplementary Figure S5. Whether enkephalins produced by Treg can change the stimulation threshold of various nerve fibers is currently being investigated by electrophysiology.

      - No control is provided to ensure that Penk is selectively excised in Treg cells in cKO mice.

      We have performed additional experiments with fluorescent probes to document Penk mRNA expression in cKO mice. The results on the specific expression of Penk mRNA in various subsets post-TMX are shown in a supplementary figure S4.

      - The authors acknowledge that Penk from Treg was previously studied in an animal model of inflammatory pain. However, the role these endogenous opioids play is unclear, especially since the authors discovered that enkephalins are likely released continuously at steady state. This is not sufficiently discussed in the narrative, which, surprisingly, does not separate the results from the discussion.

      The results and discussion are now separated in two sections.

      Minors:

      - Replace "Fox3 1" with "Fox31" (line 31), "functions 15" with "functions15" (line 43), "BATF 19" with "BATF19" (line 85).

      - Text mentions Figure S4 (line 125), which is most likely S3.

      Reviewer #3 (Recommendations For The Authors):

      Given that the most significant finding of this paper is based on the heat-induced pain model, there is surprisingly little analysis of Tregs in this context. The authors analyzed spleen and colon Tregs at steady state; it is unclear whether any of these Tregs are directly involved in pain sensitivity. Skin Tregs or other Tregs relevant to this model should be analyzed in control and Lox mice. This is particularly relevant as PENK expression was previously reported in skin Tregs and plays a significant role in skin homeostasis (Yamazaki et al 2020 PNAS). Does PENK conditional deletion alter Treg frequencies, numbers, and immune suppressive function? Not even spleen or colon Tregs were compared between control and Lox mice.

      We now provide evidence showing unaltered immunosuppressive functions of Treg in the absence of Penk (Figure 4) and, more importantly, unaffected proportions of skin Treg in mice lacking Penk in Treg, at the very site of heat stimulation (Figure 5B-C). We also observed an unaffected representation of Treg in the spleen and lymph nodes, but we do not feel that these data are necessary to interpret the results.

      Given the role of PENK in skin Tregs, could the observed effect in Figure 4 be due to altered skin homeostasis rather than sensitivity to pain?

      The reviewer is referring to a paper in which Penk in skin Treg plays a role in UV-damaged keratinocytes in vivo (Shime et al., 2020, PNAS). To our knowledge, a role for Penk produced by skin Treg in keratinocyte homeostasis at steady state is currently unknown. Nevertheless, this hypothesis is now clearly stated and discussed in the manuscript.

      The authors stated that only after 7 days post tamoxifen treatment was heat hyperalgesia observed: deletion of PENK in Treg but not Tconv should be confirmed: is deletion only complete after 7 days or is the effect observed due to indirect effects of altered "normal" Treg function?

      We have performed a kinetic analysis of Penk deletion at D3, D7 and D30 post-TMX. The results show a specific deletion of Penk in Treg at all time points, so we combined all the time points for the representation of the results (Figure S4). As for the indirect effects of “altered” normal function, we now provide a new figure (Figure 4) showing that Penk-deficient Treg are not impaired in their suppressive function in vitro and in vivo.

      Actual protein/peptide production of enkephalins by Tregs should be confirmed. It is also unclear which peptide(s) can be secreted and presumably responsible for the changes in heat sensitivity.

      This is a very interesting question, which we attempted to address with a MENK ELISA, but we were unable to reproduce the results. An ongoing project will use mass spectrometry to fully characterize the peptides produced by Treg and activated Tconv.

      The analysis of PENK regulation in Tregs is interesting, despite being entirely based on data mining. BATF is a pioneer factor expressed by all activated effector T cells. While the connection between BATF and PENK may explain why the authors observed PENK expression chiefly in activated effectors and Tregs, BATF cannot be the reason why PENK is "predominantly" expressed by Tregs. Similarly, 4-1BB and OX40 can be induced on effector T cells. Is PENK under the control of Foxp3? There are many publicly available datasets on Foxp3/IL-2-dependent Treg signatures with which this can be addressed.

      We now provide a supplementary figure (Figure S1) showing a compilation of ChIP-seq studies for various transcription factors in various T cell subsets, together with a list of all the TFs that have been reported to bind in the regulatory regions of Penk. In agreement with our hypothesis, BATF, FOXP3, IRF4 and several others are present in that list. Further work, beyond the scope of this report, is needed to decipher the exact contribution of each of those TFs to the regulation of Penk in Treg vs activated Tconv.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work identified new NMD inhibitors and tested them for cancer treatment, based on the hypothesis that inhibiting NMD could lead to the production of cancer neoantigens from the stabilized mutant mRNAs, thereby enhancing the immune system's ability to recognize and kill cancer cells. Key points of the study include:

      • Development of an RNA-seq based method for NMD analysis using mixed isogenic cells that express WT or mutant transcripts of STAG2 and TP53 with engineered truncation mutations.

      • Application of this method in a drug screen, which identified several potential NMD inhibitors.

      • Demonstration that one of the identified compounds, LY3023414, inhibits NMD by targeting the SMG1 protein kinase in the NMD pathway in cultured cells and mouse xenografts.

      • Due to the in vivo toxicity observed for LY3023414, the authors developed 11 new SMG1 inhibitors (KVS0001-KVS0011) based on the structures of the known SMG1 inhibitor SMG1i-11 and the SMG1 protein itself.

      • Among these, KVS0001 stood out for its high potency, excellent bioavailability, and low toxicity in mice. Treatment with KVS0001 caused NMD inhibition and increased presentation of neoantigens on MHC-I molecules, resulting in the clearance of cancer cells in vitro by co-cultured T cells and cancer xenografts in mice by the immune system.

      These findings support the strategy of targeting the NMD pathway for cancer treatment and provide new research tools and potential lead compounds for further exploration.

      Strengths:

      The RNA-seq-based NMD analysis, using isogenic cell lines with specific NMD-inducing mutations, represents a novel approach for the high-throughput identification of potential NMD modulators or genetic regulators. The effectiveness of this method is exemplified by the identification of a new activity of AKT1/mTOR inhibitor LY3023414 in inhibiting NMD.

      The properties of KVS0001 described in the manuscript as a novel SMG1 inhibitor suggest its potential as a lead compound for further testing the NMD-targeting strategies in cancer treatment. Additionally, this compound may serve as a useful research tool.

      The results of the in vitro cell killing assay and in vivo xenograft experiments in both immuno-proficient and immune-deficient mice indicate that inhibiting NMD could be a viable therapeutic strategy for certain cancers.

      Weaknesses:

      The authors did not address the potential effects of NMD/SMG1 inhibitors on RNA splicing. Given that the transcripts of many RNA-binding proteins are natural targets of NMD, inhibiting NMD could significantly alter splicing patterns. This, in turn, might influence the outcomes of the RNA-seq-based method for NMD analysis and result interpretation.

      This is a very important comment that highlights a key aspect of NMD and potentially exciting downstream studies. We did not systematically assess RNA splicing in our work, as we are not sure whether inhibition of NMD would induce cancer-specific splicing that would allow for tumor targeting. It is well established that NMD can impact splicing, including modulating cryptic exon expression, but finding targetable tumor-specific antigens and assessing their antigenicity constitutes a study in its own right. Our own data in Figure 4C-F support this, as a point mutation near a splice site in TP53 strongly induced NMD, which was subsequently blocked by KVS0001 treatment. We feel that a systematic analysis of this effect is outside the scope of this manuscript. We’ve incorporated a comment into our discussion highlighting this limitation, and we certainly find the idea of mining RNA-splicing changes an exciting next endeavor.

      While the RNA-seq-based approach offers several advantages for analyzing NMD, the effects of NMD/SMG1 inhibitors observed through this method should be confirmed using established NMD reporters. This step is crucial to rule out the possibility that mutations in STAG2 or TP53 affect NMD in cells, as well as to address potential clonal variations between different engineered cell lines.

      This is possible, but we want to highlight that all hits from the screen were confirmed in a separate cell line with different clones. While this does not rule out effects on NMD due to the STAG2 and TP53 mutations, the final lead compound was also tested on different endogenous transcripts, both indel-containing and normal transcripts controlled by NMD (i.e., ATF4), in multiple species (human and mouse). Importantly, many of these assays employed the non-mutated transcripts from heterozygous mutant cells to ensure that cis-acting NMD was being measured and to control for any trans-acting splicing or other unanticipated biochemical effects.

      The results from the SMG1/UPF1 knockdown and SMG1i-11 experiments presented in Figure 3 correlate with the effects seen for LY3023414, but they do not conclusively establish SMG1 as the direct target of LY3023414 in NMD inhibition. An epistatic analysis with LY3023414 and SMG1-knockdown is needed.

      This is a great comment, and it is in line with the recent push to confirm drug targets by chemical probes or by knockout followed by loss of further effect upon application of the drug in question. We attempted to knock out SMG1 in multiple cell lines used in this study, including RPE1, MCF10A, NCI-H358 and LS180, and were unable to obtain clones with biallelic out-of-frame indels. We were able to obtain multiple clones with in-frame indels. Based on our results and those in the publicly available DepMap database, we suspect this gene is likely essential, making a simple knockout unfeasible. While this uncertainty is important to keep in mind, we feel it does not detract from the reporting of a novel, mechanistically agnostic NMD screen and of a novel in vivo-active NMD inhibitor.

      Reviewer #2 (Public Review):

      Summary:

      Several publications during the past years provided evidence that NMD protects tumor cells from being recognized by the immune system by suppressing the display of neoantigens, and hence NMD inhibition is emerging as a promising anti-cancer approach. However, the lack of an efficacious and specific small-molecule NMD inhibitor with suitable pharmacological properties is currently a major bottleneck in the development of therapies that rely on NMD inhibition. In this manuscript, the authors describe their screen for identifying NMD inhibitors, which is based on isogenic cell lines that either express wild-type or NMD-sensitive transcript isoforms of p53 and STAG2. Using this setup, they screened a library of 2658 FDA-approved or late-phase clinical trial drugs and had 8 hits. Among them they further characterized LY3023414, showing that it inhibits NMD in cultured cells and in a mouse xenograft model, where it, however, was very toxic. Because LY3023414 was originally developed as a PI3K inhibitor, the authors claim that it inhibits NMD by inhibiting SMG1. While this is most likely true, the authors do not provide experimental evidence for this claim. Instead, they use this statement to switch their attention to another previously developed SMG1 inhibitor (SMG1i-11), of which they design and test several derivatives. Of these derivatives, KVS0001 showed the best pharmacological behavior. It upregulated NMD-sensitive transcripts in cultured cells and the xenograft mouse model and two predicted neoantigens could indeed be detected by mass spectrometry when the respective cells were treated with KVS0001. A bispecific antibody targeting T cells to a specific antigen-HLA complex led to increased IFN-gamma release and killing of cancer cells expressing this antigen-HLA complex when they were treated with KVS0001. 
Finally, the authors show that tumor growth of renal (RENCA) and lung (LLC) cancer cells was significantly inhibited in immunocompetent mice treated with KVS0001. Overall, this establishes KVS0001 as a novel and promising anti-cancer drug that, by inhibiting SMG1 (and therewith NMD), increases neoantigen production in cancer cells and reveals them to the body's immune system as "foreign".

      Strengths:

      The novelty and significance of this work consists in the development of a novel and - judging from the presented data - very promising NMD inhibiting drug that is suitable for applications in animals. This is an important advance for the field, as previous NMD inhibitors were not specific, lacked efficacy, or were very toxic and hence not suitable for animal application. It will be still a long way with many challenges ahead towards an efficacious NMD inhibitor that is safe for use in humans, but KVS0001 appears to be a molecule that bears promise for follow-up studies. In addition, while the idea of inhibiting NMD to trigger neoantigen production in cancer cells and so reveal them to the immune system has been around for quite some time, this work provides ample and compelling support for the feasibility of this approach, at least for tumors with a high mutational burden.

      Main weaknesses:

      There is a disconnect between the screen and the KVS0001 compound, that they describe and test in the second part of the manuscript since KVS0001 is a derivative of the SMG1 inhibitors developed by Gopalsamy et al. in 2012 and not of the lead compound identified in the screen (LY3023414). Because of high toxicity in the mouse xenograft experiments, the authors did not follow up LY3023414 but instead switched to the published SMG1i-11 drug of Gopalsamy and colleagues, a molecule that is widely used among NMD researchers for NMD inhibition in cultured cells. Therefore, in my view, the description of the screen is obsolete, and the paper could just start with the optimization of the pharmacological properties of SMG1i-11 and the characterization of KVS0001. Even though the screen is based on an elegant setup and was executed successfully, it was ultimately a failure as it didn't reveal a useful lead compound that could be further optimized.

      This is a helpful observation from an outside perspective. From our point of view, we were alerted to targeting SMG1 only because of the previously reported off-target effects of LY3023414 on SMG1 and the lack of a plausible explanation for how PIK3CA inhibition could efficiently inhibit NMD. We do feel that the screen is worth including for two reasons. First, it offers an unbiased approach for querying the entire NMD pathway for vulnerabilities that could be targeted. The library chosen was quite small, so the screen itself could be useful to others with larger libraries to test. Second, it helped identify SMG1 as the ideal target for NMD disruption. While targeting SMG1 is not novel, we felt it highlighted why we chose to develop KVS0001. To address this reviewer’s comment, we’ve included a couple of sentences in the results and discussion strengthening the point that the screen provided an unbiased approach to finding the best target in the pathway to disrupt NMD and elaborating on the transition from LY3023414 and the screen to the development of KVS0001.

      Additional points:

      - Compared to SMG1i-11, KVS0001 seems less potent in inhibiting SMG1 (higher IC50). It would therefore be important to also compare the specificity of both drugs for SMG1 over other kinases at the applied concentrations (1 uM for SMG1i-11, 5 uM for KVS0001). The Kinativ Assay (Fig. S13) was performed with 100 nM KVS0001, which is 50-fold less than the concentration used for functional assays and hence not really meaningful. In addition, more information on the pharmacokinetic properties and toxicology of KVS0001 would allow a better judgment of the potential of this molecule as a future therapeutic agent.

      We agree that the Kinativ assay may have poorly represented the activity of KVS0001 at the bioactive concentration. We have now added Kinativ data at 1 µM, the highest concentration we were able to run, to Figure S13.

      - On many figures, the concentrations of the used drugs are missing. Please ensure that for every experiment that includes drugs, the drug concentration is indicated.

      We apologize for this oversight and have added all drug concentrations on the appropriate plots.

      - Do the authors have an explanation for why LY3023414 has a much stronger effect on the p53 than on the STAG2 nonsense allele (Figure 1B, S8), whereas emetine upregulates the STAG2 nonsense alleles more than the p53 nonsense allele (Figure S5). I find this curious, but the authors do not comment on it.

      This is an interesting observation. The short answer is that we are not sure. The speculative answer is that it is related to the distinctly different mechanisms of action of the two inhibitors (see comments from the reviewing editor below).

      - While it is a strength of the study that the NMD inhibitors were validated on many different truncation mutations in different cell lines, it would help readers if a table or graphic illustration was included that gives an overview of all mutant alleles tested in this study (which gene, type of mutation, in which cell type). In the current version, this information is scattered throughout the manuscript.

      This is an excellent suggestion. We have included a new Table S1 that details each cell line and the genes studied in each.

      - Lines 194 and 302: That SMG1i-11 was highly insoluble in the hands of the authors is surprising. It is unclear why they used variant 11j, since variant 11e of this inhibitor is widely used among NMD researchers and readily dissolves in DMSO.

      As this referee notes, SMG1i-11 is soluble in DMSO in our hands as well, which enabled us to use it for our in vitro work. Unfortunately, the concentrations of DMSO required to dissolve the compound at concentrations suitable for in vivo work were too high to use safely in mice under our animal protocols. We also attempted to use ethanol, which did dissolve SMG1i-11 but led to a significant amount of toxicity in both the drug and vehicle control arms.

      - Line 296: The authors claim that they were able to show that LY3023414 inhibited the SMG1 kinase, which is not true. To show this, they would have for example to show that LY3023414 prevents SMG1-mediated UPF1 phosphorylation, as they did for KVS0001 and SMG1i-11 in Fig. 3F. Unless the authors provide this data, the statement should be deleted or modified.

      We’ve modified this statement as requested by the referee, now saying we suspected SMG1 was the target based on previously published work.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      Your paper has been assessed by two reviewers with expertise in the NMD field. They both find the identification and characterization of a new potent and selective inhibitor of the SMG1 NMD kinase with in vivo activity to represent a significant advance in the field, and one that could ultimately be of value as the basis for a novel cancer therapy. However, as you will see both reviewers have concerns about whether the SMG1 inhibitor screen you developed belongs in the paper because it was not used to identify the KVS0001 inhibitor, which instead was generated based on a previously published set of SMG1 inhibitors, and because the NMD inhibitor that did emerge from your screen, LY3023414, was not shown to be a direct inhibitor of SMG1 kinase activity. While it is an elegant screen, during the revision of the paper you could consider streamlining the manuscript by emphasizing how the screening assay was used to validate KVS0001, and bolstering the characterization of the new KVS0001 NMD inhibitor by conducting the proposed additional experiments.

      Each of the reviewers raises additional points that should be addressed in a revised version.

      The reviewing editor has two additional points:

      (1) While emetine inhibits NMD, it is not really a direct NMD inhibitor, as implied, but rather a potent protein synthesis elongation inhibitor that acts by binding to the E-site of the 40S ribosomal subunit, and is therefore, like anisomycin, another protein synthesis inhibitor, working indirectly to inhibit NMD. This should be acknowledged in the section where emetine is first used as an "NMD inhibitor".

      This has been included in the indicated section at the referee’s request.  

      (2) To establish that the observed phenotypic effects of KVS0001 are due to on-target inhibition of SMG1, the authors could generate and express an SMG1 point mutant that is resistant to KVS0001 inhibition, which could be based on the SMG1 catalytic domain structure that the authors originally used to design KVS0001. Inhibitor-resistant kinase mutants are the gold standard for demonstrating that the biological consequences of a novel protein kinase inhibitor are due to on-target effects. Admittedly, because SMG1 is such a huge protein, this may be technically challenging and is likely beyond the scope of the present paper.

      We agree with the reviewing editor on all counts: this would be an ideal experiment to run, but it is also beyond the scope of the present paper. As indicated in our discussion above with reviewer 1, SMG1 knockout was not possible in our hands, and we suspect this may be because the gene is essential. An inhibitor-resistant mutant could overcome this issue and provide an ideal model for testing the target of KVS0001. Unfortunately, finding such a mutant would likely require a significant amount of trial and error to obtain a resistant mutant that does not lose SMG1 function. In addition, the large size of SMG1 creates technical challenges for such experiments. Given the anticipated amount of work, we believe this would be better accomplished in future studies.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors did not mention a new SMG1 inhibitor and its effects described in Cheruiyot et al, Cancer Res 2019 (PMID: 34215620).

      A comment regarding this discovery and its implications for our work was added to the discussion.

      (2) There is an inconsistency between the manuscript text and methods sections regarding the time of drug treatment (16 hours vs 14 hours) in the HTS screen.

      This has been double-checked in our notebook and corrected to 16 h, which is the correct incubation time. Thank you for identifying this clerical oversight.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 61: The references to NMD reviews are very old (refs 20 and 21). I suggest citing more recent, up-to-date reviews instead.

      Two additional references, one from 2016 and another from 2023, have been added to increase support for this statement in the introduction.

      (2) Figure S1: Shouldn't the caption of the right panel (TP32 data) say "clone 221" rather than "clone 22"?

      This has been fixed.

      (3) Figure S18: Please indicate on the y-axis that you are displaying RPKM for p53.

      This has been fixed.

      (4) Figures 4D and S19: Please indicate concentrations used for all drugs.

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Joint Public Review:

      Summary:

      Brauns et al. work to decipher the respective contributions of active versus passive forces to cell shape changes during germ band elongation. Using a novel quantification tool of local tension, their results suggest that epithelial convergent extension results from internal forces.

      Reading this summary, and the eLife assessment, we realized that we failed to clearly communicate important aspects of our findings in the first version of our manuscript. We therefore decided to largely restructure and rewrite the abstract and introduction to emphasize that:

      ● Our analysis method identifies active vs passive contributions to cell and tissue shape changes during epithelial convergent extension

      ● In the context of Drosophila germ band extension, this analysis provides evidence for a major role for internal driving forces rather than external pulling forces from neighboring tissue regions (posterior midgut), thus settling a question that has been debated due to apparently conflicting evidence from different experiments.

      ● Our findings have important implications for local, bottom-up self-organization vs top-down genetic control of tissue behaviors during morphogenesis.

      Strengths:

      The approach developed here, tension isogonal decomposition, is original, and the authors demonstrate that comprehensive data on tissue mechanics can be extracted from this type of analysis.

      They present an elegant diagram that quantifies how active and passive forces interact to drive cell intercalations.

      The model qualitatively recapitulates the features of passive and active intercalation for a T1 event.

      Regions of high isogonal strains are consistent with the proximity of known active regions.

      We think this statement is somewhat ambiguous and does not summarize our findings precisely. A more precise statement would be that high isogonal strain identifies regions of passive deformation, which is caused by adjacent active regions.

      They define a parameter (the LTC parameter) which encompasses the geometry of the tension triangles and allows the authors to define a criterion for T1s to occur.

      The data are clearly presented, going from cellular scale to tissue scale, and integrating modeling approaches to complement the thoughtful description of tension patterns.

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      We fully agree that a full tissue scale model is crucial to support the claims about tissue scale self-organization we make in the discussion. However, the full analysis of such a model is beyond the scope of the present manuscript. We have therefore split off that analysis into a companion manuscript (Claussen et al. 2023). In this paper, we show that the key results of the tissue-scale analysis of the Drosophila embryo, in particular the order-to-disorder transition associated with slowdown of tissue flow, are reproduced and rationalized by our model.

      We now refer more closely to this companion paper to point the reader to the results presented there.

      Major points:

      (1) The authors mention that from their analysis, they can predict what is the tension threshold required for intercalations in different conditions and predict that in Snail and Twist mutants the T1 tension threshold would be around √2. Since movies of these mutants are most probably available, it would be nice to confirm these predictions.

      This is an excellent suggestion. We have included an analysis of a recording of a Snail mutant, which is presented in the new Figures 4 and S6. As predicted, we find that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished. Further, in the absence of isogonal deformation, T1 transitions indeed occur at a critical tension of approx. √2, as predicted by our model. Both of these results provide important experimental evidence for our model and for isogonal strain as a reliable indicator of external forces.

      (2) While the formalism is very elegant and convincingly makes sense of the data presented in the paper, it is not clear whether the claims are compatible with previous experimental observations. In particular, it has been reported in different papers (including Collinet et al. NCB 2015, Clement et al. Curr Biol 2017) that affecting the initial myosin polarity or the rate of T1s does not affect tissue-scale convergent extension. An analysis or discussion of the Tor phenotype (no extension despite myosin anisotropy) and the Eve/Runt phenotype (extension without myosin anisotropy) would be welcome, as these seem to contradict an extension mostly driven by myosin anisotropy.

      We are happy to read that the referees find our approach elegant and convincing. The referees correctly point out that we have failed to clearly communicate how our findings connect to the existing literature on Drosophila GBE. Indeed, the conflicting results reported in the literature on what drives GBE – internal forces (myosin anisotropy) or external forces (pulling by the posterior midgut) – were a motivation for our study. We have extensively rewritten the introduction, results section (“Isogonal strain identifies regions of passive tissue deformation”), and discussion (“Internal and external contributions to germ band extension”) in response to the referee’s request.

      In brief, distinguishing active internal vs passive external driving of tissue flow has been a fundamental open question in the literature on morphogenesis. Our tension-isogonal decomposition now provides a way to answer this question on the cell scale, by identifying regions of passive deformation due to external forces. As we now explain more clearly, our analysis shows that germ band extension is predominantly driven by internal tension dynamics, and not pulling forces from the posterior midgut.

      We put this cell-scale evidence into the context of previous experimental observations on the tissue scale, namely genetic mutants (fog, torso-like, scab, corkscrew, ksr) in which posterior midgut invagination is disrupted (Muenster et al. 2019, Smits et al. 2023). In these mutants, the germ band buckles, forming ectopic folds, or twists into a corkscrew shape as it extends, pointing towards a buckling instability characteristic of internally driven extensile flows.

      To address the apparently conflicting evidence from Collinet et al. 2015, we carried out a quantitative re-analysis of the data presented in that reference (see new SI section 3 and Fig. S11). The results support the conclusion that the majority of GBE flow is driven internally, thus resolving the apparent conflict.

      Lastly, as far as we understand, Clement et al. 2017 appears to be compatible with our picture of active T1 transitions. Clement et al. report that the actin cortex, when loaded by external forces, behaves visco-elastically with a relaxation time of the order of minutes, in line with our model for emerging interfaces post T1.

      We again thank the referees for prompting us to address these important issues and believe that including their discussion has significantly strengthened our manuscript.

      Recommendations for the authors:

      Minor points:

      - Fig 2: The authors should state in the main text at which scale the inverse problem is solved (the intercalating quartet, if I understood the methods correctly), and they should explain and justify this choice (why not compute the inverse at a larger scale?).

      We have rephrased the first sentence of the section “Cell scale analysis” to clarify that we use local tension inference. This local inference is informative about the relative tension of one interface to its four neighbors. The focus on this local level is justified because we are interested in local cell behaviors, namely rearrangements. Tension inference is also most robust on the local level, since this is where force balance, the underlying physical determinant of the link between mechanics and geometry, resides. In global tension inference, spurious large scale gradients can appear when small deviations from local force balance accumulate over large distances. We have added a paragraph in SI Sec. 1.4 to explain these points.

      - Fig 2: How should one interpret the observation that tension after passive intercalation (amnioserosa) is higher than before? In Fig 2E, tension has not yet converged on the plot; what happens after 20 minutes?

      Recall that the inferred tension is the total tension on an interface. While on contracting interfaces, the majority of this tension will be actively generated by myosin motors, on extending interfaces there is also a contribution carried by passive crosslinkers. The passive tension can be effectively viewed as viscous dissipation on the elongating interface as crosslinkers turn over (Clement et al. 2017). Note that this passive tension is explicitly accounted for in the model presented in Fig. 5. Notably, it is crucial for the T1 process to resolve in a new extending junction. In the amnioserosa, the tension post T1 remains elevated because the amnioserosa is continually stretched by the convergence of the germ band. The tension hence does not necessarily converge back to 1. However, our estimates for the tension after 20 mins post T1 are very noisy because most of the T1s happen relatively late in the movie (past the 25 min mark) and therefore there are only a few T1s where we can track the post-T1 dynamics for more than 20 mins.

      We have added a brief explanation of the high post-T1 tension at the end of the section entitled “Relative tension dynamics distinguishes active and passive intercalations”. Further, we have moved up the section describing the minimal model right after the analysis of the relative tension during intercalations. We believe that this helps the reader better understand these findings before moving on to the tension-isogonal decomposition which generalizes them to the tissue scale.

      Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape, 2) tension shape, and 3) isogonal shape works exactly. A more detailed explanation and a clearer illustration of what a quartet is and its labels could help.

      We have added a more detailed explanation in the main text. See our response to the longer question regarding this point below.

      -What exactly defines the boundary curve in figure 3E? How is it computed?

      We have added a sentence in the caption for Fig. 3E explaining that the boundary curve is found by solving Eq. (1) with l set to zero for the case of a symmetric quartet. We have also added a brief explanation immediately below Eq. (1) pointing out that this equation defines the T1 threshold in the space of local tensions T_i in terms of the isogonal length l_iso.

      - The authors should consider incorporating some details described in the SI file into the main text to clarify some points, as long as the accessible style of the manuscript can be kept. The points mentioned below may also be clarified in the SI document. The specific points that could be elaborated are: Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape, 2) tension shape, and 3) isogonal shape works exactly. A more detailed explanation and a clearer illustration of what a quartet is and its labels could help. The mapping to Maxwell-Cremona space is fine, but which subset is the quartet? For a set of 4 cells with two shared vertices and a junction, aren't there 5 different tension vectors? Are we talking about two closed force triangles? Separately, how exactly do you decompose the deformation (of 4 full cell shapes or a subset?) into isogonal and non-isogonal parts? What is the least squares fit done over, and is this system underdetermined? Is this statistically averaged or computed per quartet and then averaged?

      We thank the referees for pointing us to unclear passages in our presentation. We hope that our revisions have resolved the referee’s questions. As described above, we have clarified the tension-isogonal decomposition in the main text. We have also revised the corresponding SI section (1.5) to address the above questions. A sketch of the quartet with labels is found in SI Fig. S7A which we now refer to explicitly in the main text.

      We always consider force-balance configurations, i.e. closed force triangles. Therefore in the “kite” formed by two adjacent tension triangles, only three tension vectors are independent.

      The decomposition of deformation is performed as follows: For each of the four cells, the center of mass c_i is calculated. Next, tension inference is performed to find the two tension triangles with tension vectors T_ij. There are now three independent centroidal vectors c_j - c_i and three corresponding independent tension vectors T_ij. We define the isogonal deformation tensor I_quartet as the tensor that maps the centroidal vectors to the tension vectors. In general, this is not possible exactly, because I_quartet has only three independent components but there are six equations; the system is therefore overdetermined rather than underdetermined.
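      The per-quartet fit described above can be sketched as a small linear least-squares problem. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that I_quartet is a symmetric 2x2 tensor (giving the three independent components mentioned above), that each centroidal vector c_j - c_i is paired with its tension vector T_ij, and that the fit minimizes the squared mismatch between T_ij and I_quartet (c_j - c_i). The function name fit_isogonal_tensor is hypothetical.

```python
import numpy as np


def fit_isogonal_tensor(centroid_vecs, tension_vecs):
    """Least-squares fit of a symmetric 2x2 tensor I such that T ~ I @ c.

    Hypothetical helper for illustration only.
    centroid_vecs, tension_vecs: arrays of shape (3, 2); row k holds one
    centroidal vector c_j - c_i and its matching tension vector T_ij.
    Returns the fitted tensor and the norm of the residual.
    """
    C = np.asarray(centroid_vecs, dtype=float)
    T = np.asarray(tension_vecs, dtype=float)
    # Parametrize I = [[a, b], [b, d]], so I @ c = (a*cx + b*cy, b*cx + d*cy).
    rows, rhs = [], []
    for (cx, cy), (tx, ty) in zip(C, T):
        rows.append([cx, cy, 0.0])  # x-component: a*cx + b*cy
        rhs.append(tx)
        rows.append([0.0, cx, cy])  # y-component: b*cx + d*cy
        rhs.append(ty)
    A = np.array(rows)  # 6 equations, 3 unknowns (a, b, d)
    y = np.array(rhs)
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    a, b, d = params
    I = np.array([[a, b], [b, d]])
    residual = float(np.linalg.norm(A @ params - y))
    return I, residual
```

In this sketch, the residual indicates how far a given quartet departs from an exact linear mapping of centroidal onto tension vectors; because there are six scalar equations for three unknowns, a nonzero residual is expected for measured data.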

      The plots in Fig. 3C, C’ are obtained by performing this decomposition for each intercalating quartet individually. The data is then aligned in time and ensemble averages are calculated for each timepoint.

      For tissue-scale analysis in Fig. 6, the decomposition is performed for individual vertices (i.e. the corresponding centroidal and tension triangles) and then averaged locally to find the isogonal strain fields shown in Fig. 6B, B’.

      - Line 468: "Therefore, tissue-scale anisotropy of active tension is central to drive and orient convergent-extension flow [10, 57, 59, 60]." Authors almost never mention the contribution of the PMG to tissue extension. Yet it is known to be crucial (convergent extension in Tor mutants is very much affected). Please discuss this point further.

      The referees raise an important point: as discussed in our response to major point (2), we now explicitly discuss the role of internal (active tension) and external (PMG pulling) forces during germ band extension. Please see our response to major point (2) for the changes we made to the manuscript to address this.

      In particular, we now explain that in mutants where PMG invagination is impaired (fog, torso-like, torso, scab, corkscrew), the germ band buckles out of plane or extends in a twisted, corkscrew fashion (Smits et al. 2023). This shows that the germ band generates extensile forces largely internally. In torso mutants, the now stationary PMG acts as a barrier that blocks germ band extension; the germ band buckles in response.

      The role of PMG invagination hence lies not in creating pulling forces to extend the germ band, but rather in “making room” to allow for its orderly extension. As shown by the genetic mutants just discussed, the synchronization of PMG invagination and GBE is crucial for successful gastrulation.

      -Typos:

      Line 74: how are intercalations are

      Line 84: vertices vertices

      Line 233: very differently

      Line 236: are can

      Line 390: energy which is the isogonal mode must

      Line 1585: reveals show

      Line 603: area

      Line 618: in terms of on the

      We have fixed these typos.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses: 

      - Only one mutant (YafK) is used to make the conclusion. 

      The aim of the study is to determine the effect of the hydrolysis of the PG→Lpp bond on the dynamics of the tethering of Lpp to PG. Since YafK is the only enzyme catalyzing this reaction, it is appropriate to compare the wild-type strain to an isogenic yafK deletion mutant. Nonetheless, we carefully consider this comment and will investigate the dynamics of the tethering of Lpp to PG in mutants deficient in the production of the L,D-transpeptidases responsible for tethering Lpp to PG.

      Additional kinetic analyses were performed on strains relying on a single L,D-transpeptidase for Lpp tethering to PG. Escherichia coli produces three L,D-transpeptidases catalyzing the tethering of Lpp to PG (YbiS, YcfS, and ErfK). The corresponding genes were deleted from the chromosome of strain BW25113, thus generating strain BW25113Δ3. Plasmids encoding each one of these three enzymes were independently introduced into BW25113Δ3. Qualitatively, LC-MS analyses revealed similar kinetics for the four Tri-KR isotopologues purified from wild-type strain BW25113 and from the three BW25113Δ3 derivatives producing a single plasmid-encoded L,D-transpeptidase (YbiS, YcfS, or ErfK) under the control of the rhamnose-inducible promoter (Prha) of plasmid pHV30 (Voedts et al. EMBO J. 2021 40:e108126, doi: 10.15252/embj.2021108126) (see panel A of Author response image 1 below). Briefly, and as indicated in the first version of the main text, the old→new Tri→KR isotopologue was synthesized first. The new→new isotopologue was not detected 5 min after the medium switch. These results indicate that newly synthesized PG disaccharide-peptide subunits and Lpp are independently incorporated into the expanding PG polymer. The proportion of the new→old isotopologue exceeded that of the old→new isotopologue at around 40 min (for the strain producing ErfK) or 20 min (for the strains producing YbiS or YcfS). This is the hallmark of the activity of the YafK hydrolase, which liberates existing (old) Lpp that can be tethered to newly synthesized disaccharide-peptide subunits, thereby generating the new→old isotopologue. In the absence of the YafK hydrolase, the relative proportion of the new→old isotopologue is lower, since this isotopologue can only result from the tethering of preexisting free forms of Lpp to newly synthesized disaccharide-peptide units. 
The contribution of YafK to variations in the relative abundance of the four isotopologues was also investigated by combining the relative abundances of isotopologues containing either old versus new KR (panel B) or old versus new PG stem peptide (panel C) moieties. As discussed in the first version of the manuscript for strains BW25113 and BW25113ΔyafK, this analysis revealed that the existing (old) disaccharide-tripeptide moieties in the Tri→KR isotopologues disappear more rapidly than the existing (old) KR moieties, owing to the hydrolysis of the old→old Tri-KR isotopologue by YafK. These results indicate that the mode of tethering of Lpp to PG and the dynamic equilibrium between the PG-tethered and free forms of Lpp are similar for the YbiS, YcfS, and ErfK L,D-transpeptidases. Quantitatively, we also noticed that the overall decrease in the relative abundance of all Tri→KR isotopologues containing existing (old) moieties was slower for the strains producing only ErfK, YbiS, or YcfS than for the wild-type and ΔyafK strains. This could be accounted for by an increase in the generation time of the former group of three strains. This is a limitation of our study because it precludes comparing the evolution of a particular isotopologue across several strains, as performed in Fig. 3 for strains BW25113 and BW25113ΔyafK. For this reason, we prefer to present these data in the rebuttal rather than in the manuscript. Indeed, presenting the data in the main text would require introducing a new mode of presentation (variations in the relative abundance of all four isotopologues in the same strain; see figure below) in addition to the variations in the relative abundance of any one of the four isotopologues between strains (Fig. 3). 
Introducing this additional mode of presentation would unnecessarily complicate the manuscript, because the data obtained with mutants producing a single L,D-transpeptidase (ErfK, YbiS, or YcfS) confirmed the data obtained with the wild-type strain producing all three L,D-transpeptidases.
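      Combining the four isotopologue abundances into old versus new KR (panel B) and old versus new PG stem peptide (panel C) fractions amounts to summing over the other moiety. The Python sketch below assumes the labeling convention used above, in which the first label refers to the PG stem and the second to the Lpp KR moiety (so new→old denotes new PG tethered to old Lpp). The function name and the abundance values in the example are hypothetical placeholders, not measured data.

```python
def moiety_fractions(abundance):
    """Collapse the four Tri-KR isotopologues into old/new fractions per moiety.

    Hypothetical helper for illustration only.
    abundance maps (pg_label, kr_label) -> relative abundance, where each
    label is "old" or "new". Values are normalized, so any scale works.
    """
    total = sum(abundance.values())
    frac = {key: value / total for key, value in abundance.items()}
    # KR (Lpp) moiety: sum over both PG labels.
    old_kr = frac[("old", "old")] + frac[("new", "old")]
    # PG stem peptide moiety: sum over both KR labels.
    old_pg = frac[("old", "old")] + frac[("old", "new")]
    return {"old_KR": old_kr, "new_KR": 1.0 - old_kr,
            "old_PG": old_pg, "new_PG": 1.0 - old_pg}


# Hypothetical placeholder values for a single time point (not data from this study).
example = {("old", "old"): 0.50, ("old", "new"): 0.20,
           ("new", "old"): 0.25, ("new", "new"): 0.05}
print(moiety_fractions(example))
```

In this framing, the YafK-dependent release and re-tethering of old Lpp described above shows up as the old PG fraction falling faster over time than the old KR fraction.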

      Author response image 1.

      MS-based kinetic analysis of Lpp tethering to PG.

      -Time points to analyse Tri-KR isotopologues in Wt (0,10,20,40,60 min) and yafK mutant (0,15, 25, 40, 60 min) are not the same. 

      The purpose of the experiments is to compare the kinetics of formation and hydrolysis of the PG→Lpp bond in the WT versus ΔyafK strains. Comparison of the kinetics is therefore possible even though the kinetics are not based on the exact same time points. Nonetheless, we will reproduce the kinetics experiment (see also answers to Reviewer 2) and use the same time points in these additional experiments.

      We have performed additional analyses to provide kinetic data for at least three biological repeats and for the same periods of incubation after the medium switch (0, 10, 20, 40, and 60 min). The full set of data, including means and standard deviations, appears in the additional Table S1. We have also updated Fig. 3 with the means calculated from these additional values. The conclusions of the first version of the manuscript are fully supported by the additional data requested by the reviewer. We have also revised Fig. 4 based on the full set of data appearing in Table S2.

      Reviewer #2 (Public Review): 

      Weaknesses: 

      - However, the authors make a few other conclusions from their data whose logic is harder to follow, or in which it is harder to feel confident based on the existing data. They claim that their 5-time point kinetic data indicate that new Lpp is not substantially added to lipid II before it is added to the peptidoglycan, and that instead Lpp is attached primarily to old peptidoglycan. I believe that this conclusion comes from the comparison of Figs 3A and 3C, where it appears that new Lpp is added to old peptidoglycan a few minutes before new Lpp is added to new peptidoglycan. However, the very small difference in the timing of this result, the minimal number of time points, and the complete lack of any presentation of calculated error in any of the data make this conclusion very tenuous. In addition, the authors conclude that Lpp is not significantly attached to septal peptidoglycan. The logic behind this conclusion appears to be based on the same data, but the authors do not provide a quantitative model to support this idea.  

      The reviewer is correct in stating that we claim that Lpp is not substantially added to lipid II before incorporation of the disaccharide-pentapeptide subunit into the expanding PG network. This conclusion is based on the paucity of PG-Lpp covalent adducts containing light PG and Lpp moieties at the earliest time points. To substantiate this finding more thoroughly, we will reproduce the kinetic experiments with more early time points. The paucity of the new→new PG-Lpp isotopologues also implies that Lpp might not be extensively tethered to septal peptidoglycan, since the latter is assembled from newly synthesized PG (see our previous publication Atze et al. 2021 and references therein). Quantitatively, septal synthesis accounts for roughly one third of total PG synthesis. It is therefore expected that tethering of Lpp to septal PG would represent one third of the total number of newly synthesized Lpp molecules tethered to PG. We therefore proposed that the paucity of new→new PG-Lpp isotopologues at early time points of the kinetics implies that Lpp is preferentially tethered to the side wall. This is only one of several conclusions that we reach in the present study, and we were very careful in the wording of our results. 

      We would first like to stress that our claim that Lpp is primarily attached to old peptidoglycan rather than to lipid II is indeed supported by the results presented in the first version of the manuscript. In fact, the opposite mechanism, i.e. Lpp linking to lipid II, as established for the linking of proteins to PG by sortases in Gram-positive bacteria, would result in the exclusive tethering of newly synthesized Lpp to newly synthesized PG stems (Fig. 3). This is clearly not the case, since the new→new isotopologues are present in small amounts 10 min after the medium switch and are not detectable at 5 min (data appearing in Table S1 and new mass spectra added to Supplementary file 1). Instead, our data indicate that newly synthesized Lpp is tethered to existing PG. Thus, the relevant comparison is not the absolute value of the delay in the appearance of isotopologues in Figs 3A and 3C, as suggested by the reviewer. Rather, the relevant comparison should consider the following two modes of Lpp tethering to PG: (i) tethering of Lpp to lipid II versus (ii) tethering of Lpp to existing PG independently of the insertion of new subunits into the expanding PG. The former mode implies the exclusive formation of new→new isotopologues, which were not detected at early time points. The latter mode implies the prevalent formation of old→new isotopologues, which were indeed preponderant at early time points. Thus, our analysis clearly eliminates the first mode of Lpp tethering to PG (tethering of Lpp to lipid II) and validates the second one (tethering of Lpp to existing PG). As stated in our answers to Reviewer 1, we have generated additional repeats, and the full set of data, including means and SD values, appears in the additional Supplementary Tables S1 and S2.
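      To make the contrast between the two modes concrete, the following toy model may help. It is a hypothetical sketch, not the analysis used in the study: it assumes steady exponential growth with doubling time T, Lpp synthesis proportional to biomass, no PG turnover, and, for mode (ii), that each new Lpp molecule is linked to a stem drawn at random from the PG present at the moment of tethering (septal versus side-wall location and YafK hydrolysis are ignored). Under these assumptions, the expected fraction of new→new adducts among adducts carrying new Lpp after a time t is 1 for mode (i), and 1 - kt/(exp(kt) - 1), with k = ln 2/T, for mode (ii). The function name and the doubling time used below are illustrative only.

```python
import math


def new_new_fraction(t_min: float, doubling_min: float, mode: str) -> float:
    """Expected fraction of new->new adducts among adducts carrying new Lpp.

    Hypothetical toy model: exponential growth, Lpp synthesis proportional to
    biomass, no PG turnover. Under "existing_pg", each new Lpp is linked to a
    stem drawn at random from the PG present at the moment of tethering.
    """
    if mode == "lipid_ii":
        # Tethering to lipid II: new Lpp can only meet new stems.
        return 1.0
    if mode != "existing_pg":
        raise ValueError(f"unknown mode: {mode}")
    if t_min <= 0:
        return 0.0
    k = math.log(2) / doubling_min
    x = k * t_min
    # Synthesis-weighted average of the new-PG fraction 1 - exp(-k s) over [0, t].
    return 1.0 - x / math.expm1(x)


# Illustrative time points after the medium switch; 30 min is an assumed doubling time.
for t in (5, 10, 20):
    print(t, new_new_fraction(t, 30.0, "lipid_ii"), new_new_fraction(t, 30.0, "existing_pg"))
```

      In this toy model, mode (ii) predicts that new→new adducts start near zero and rise gradually (about 31% of new-Lpp adducts after one full doubling time), whereas mode (i) predicts that they are the only adducts carrying new Lpp from the first time point onward.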

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      - All major reactions catalysed by L,D-transpeptidases must be studied using the labeling-mass spec technique and compared with YafK to strengthen the conclusions.

      As described above (Figure 1), we explored the dynamics of Lpp tethering in mutants producing a single L,D-transpeptidase.

      - Experiments on the effect of YafK on the bacterial envelope and production of vesicles should be included to support the claims.

      We have analyzed the extent of outer membrane vesicle (OMV) formation both in the wild type strain and in each one of the mutant strains characterized in this study by using a procedure described in detail in one of our previous publications (Hugonneau-Beaufet et al. Microbiol Spectr. 2023 11:e0521722, doi: 10.1128/spectrum.05217-22). Figure 2 below shows that loss of Lpp or of its tethering to PG, following deletion of genes encoding L,D-transpeptidases ErfK, YbiS, and YcfS, results in the formation of OMVs, as revealed by the presence of the maltose-binding protein (MBP, 42 kDa) in the corresponding spent culture medium (as detected by immunoblotting). The RNA polymerase subunit RpoA (36 kDa), used as a control, was not detected in these spent culture media, indicating that loss of either Lpp alone or of ErfK, YbiS, and YcfS together was not associated with bacterial lysis. This analysis also showed that production of ErfK, YbiS, or YcfS alone was sufficient to prevent formation of OMVs. Finally, deletion of YafK, as expected, did not lead to OMV formation. These confirmatory results are beyond the scope of the manuscript, which focuses on the dynamics of Lpp tethering to PG rather than on the role of that tethering in envelope stability.

      Author response image 2.

      Figure 2. Immuno-detection of OMV formation.

      Reviewer #2 (Recommendations For The Authors): 

      - Why so much background about previous results in the abstract? Previous results don't seem required for understanding the description of new results here. Maybe put a sentence about importance at the end, instead.

      The background information is important for two reasons. First, it is important to stress that the method used to determine the structure and dynamics of the isotopologues is novel and was validated in various ways, including the modeling of isotopic clusters, in a previous study (https://doi.org/10.7554/eLife.72863). Since the current study is an extension of this previous report, it is relevant to introduce the type of information that can be obtained by this approach. Second, it is also important to stress that kinetic analyses of the incorporation of disaccharide-peptide units into the expanding peptidoglycan have been reported previously (https://doi.org/10.7554/eLife.72863). In the current study, we focused on the mode of Lpp-to-PG tethering in the context of PG expansion, which therefore had to be introduced.

      - Abstract: tethering of lpp to septal pg is limited by what? Limited to what? Wording not clear.

      The unclear sentence has been rephrased. Revised version: “Newly synthesized septum PG appears to contain small amounts of tethered Lpp.”

      - The figure legend for fig 1b - I only see one red double arrow?

      Black double arrows indicate the position of glycosidic bonds cleaved by the muramidases. Their size was increased so that they appear more distinctly in the image.

      - Fig 3 and Fig 4- these should be shown with error. 

      The full set of data with means and standard deviations appear in Supplementary Tables S1 and S2.

      - This new-> old, old-> new annotation is confusing. Is the PG fragment or the lpp old or new? Are you distinguishing between which part is old and new by the ordering? Or, could either the PG fragment or the lpp be old to be annotated as old-> new? I think you are trying to explain it in the figure 3CD legend, but it could be presented more clearly. When you say respectively, do you mean that old->new means old muropeptide, new lpp? And new-> old means new muropeptide and old lpp? Why not just use the same annotation system you use in fig 2? Or, use subscripts to indicate old and new?

      The isotopologue designation is correct and appropriate for describing the products of transpeptidation catalyzed both by PBPs and L,D-transpeptidases. This nomenclature for transpeptidation products was introduced in the 1970s (see Schleifer and Kandler 1972 Bacteriological Reviews 36:407-477). In this bond designation, the acyl donor and the acyl acceptor appear on the left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond. For the Tri→KR isotopologues, the peptide stem acts as the acyl donor whereas Lpp acts as the acyl acceptor. There is therefore no ambiguity in the annotation. This also applies to the old→new-type annotation, i.e., an old (existing) PG stem linked to a new (neosynthesized) Lpp. In the figures, we used a color code to identify old (red) and new (purple) moieties in Tri→KR. Since a color code cannot be used in the main text, we used the old→new-type annotation. A sentence has been added at the end of the legend to Fig. 1b to introduce this nomenclature: “Please note that we used the standard nomenclature for transpeptidation products in which the acyl donor and the acyl acceptor appear left and right, respectively, separated by an arrow to indicate the CO-to-NH polarity of the amide bond”.

      - Pg 5 - first paragraph. I'm struggling with the logic of your conclusion that lpp is not attached to lipid II - it seems that this conclusion is based on the timing of the appearance of the hybrid isotopes. You say you would expect the new-new ones to appear quickly, but how quickly would you expect that, and why? You do see new-new ones appearing fairly quickly, in 20 minutes, so I don't understand the logic of why that timing excludes the lipid II modification model. Please elaborate further.

      See answer above to reviewer 2 and analysis of samples collected shortly after the medium switch (Table S1). See also the revised version of Supplementary file 1 that shows mass spectra for peptidoglycan extracted 5 min after the medium switch.

      - The conclusion about tethering of lpp to septal PG also appears to be somewhat tenuous, which the authors concede when they use the word "might" in the section of the results. However, the language in the abstract is more definitive. Please tone down the language in the abstract, or provide more evidence to support this conclusion. At the least, you could add a little discussion of the numbers. At a given time in mixed culture, how much PG is being constructed at the septum? How does that percentage line up with the rate of PG label loss vs the rate of lpp label loss?

      - Pg 5, bottom paragraph. I don't know what you mean by "there was no loss of old->old in the ∆yafK strains," when just a sentence above you described the decrease.

      The data of the MS analyses are presented as the relative abundance of isotopologues. If the old→old Tri→KR isotopologue present at the medium shift were not hydrolyzed by YafK, its absolute amount would remain constant over time. However, its relative abundance would decrease by 50% in one generation because the total amount of the Tri→KR muropeptide doubles in one generation (as does that of every bacterial constituent). In Fig. 3B, we indeed observed that the relative amount of the old→old isotopologue is about 50% after one generation in the ΔyafK mutant, indicating the persistence of the isotopologue. In contrast, production of YafK in strain BW25113 results in a larger decrease in the abundance of this isotopologue (in the order of 90%).

      To make this concept more explicit, we expanded the reasoning in the relevant paragraph of the revised manuscript.
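      The dilution argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and it assumes that the old→old isotopologue makes up the whole Tri→KR pool at the medium shift, that the total pool doubles exactly once per generation, and that no isotopologue is removed other than by the process being tested.

```python
def diluted_relative_abundance(generations: float) -> float:
    """Relative abundance of an isotopologue whose absolute amount stays
    constant while the total pool doubles once per generation (assumed)."""
    return 2.0 ** (-generations)


def absolute_fraction_remaining(relative_abundance: float, generations: float) -> float:
    """Fraction of the initial absolute amount still present, given a measured
    relative abundance (initial relative abundance taken as 1.0) and assuming
    the total pool doubled once per generation."""
    return relative_abundance * 2.0 ** generations


# A stable, unhydrolyzed old->old isotopologue after one generation:
print(diluted_relative_abundance(1))          # 0.5
# Converting that value back: the absolute amount is unchanged.
print(absolute_fraction_remaining(0.5, 1))    # 1.0
```

      A value of 1.0 corresponds to the persistence observed in the ΔyafK mutant; relative abundances well below 0.5 after one generation imply net removal of the isotopologue.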

      - Pg 6 - I don't understand how you are drawing a conclusion about the proteolytic degradation of lpp from these data. Please clarify your reasoning.

      In the analysis presented in Fig. 4, we investigated the relative abundance of old and new Lpp based on the relative abundance of old and new KR moieties in all four Tri→KR isotopologues. As stated in the preceding answer, the relative abundance of old KR moieties should be 50% after one generation if no degradation of Lpp occurs. This is observed both for BW25113 (Fig. 4A) and for the ΔyafK mutant (Fig. 4B), thus supporting our claim that Lpp is not degraded. In contrast, the relative abundance of the old Tri moiety is lower than 50% for the wild type strain (Fig. 4C) but not for the ΔyafK mutant (Fig. 4D). This reflects the fact that YafK hydrolyzes the PG-Lpp bond and that Lpp released by this reaction can be cross-linked to neo-synthesized PG stems. Please note that, in this reaction, the substrate is a tetrapeptide donor stem (Fig. 1C).

    1. Author response:

      Reviewer #1 (Public Review):

      [...] Strengths:

      This study provides direct evidence that the chronic activation of dopamine neurons is toxic and gives rise to neurodegeneration. In addition, the authors achieved the chronic activation of dopamine neurons using water application of clozapine-N-oxide (CNO), a method not commonly employed by researchers. This approach may offer new insights into pathophysiological alterations of dopamine neurons in Parkinson's disease. The authors also utilized state-of-the-art spatial gene expression analysis, which can provide valuable information for other researchers studying dopamine neurons. Although the authors did not elucidate the mechanisms underlying dopaminergic neuronal and axonal death, they presented a substantial number of intriguing ideas in their discussion, which are worth further investigation.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Many claims raised in this paper are only partially supported by the experimental results. So, additional data are necessary to strengthen the claims. The effects of chronic activation of dopamine neurons are intriguing; however, this paper does not go beyond reporting phenomena. It lacks a comprehensive explanation for the degeneration of dopamine neurons and their axons. While the authors proposed possible mechanisms for the degeneration in their discussion, such as differentially expressed genes, these remain experimentally unexplored.

      We thank the reviewer for this review. We do believe that the manuscript has a mechanistic component, as the central experiments involve direct manipulation of neuronal activity, and we show an increase in calcium levels and gene expression changes in dopamine neurons that coincide with the degeneration. However, we agree that deeper mechanistic investigation would strengthen the conclusions of the paper. We have planned several important revisions, including the addition of CNO behavioral controls, manipulation of intracellular calcium using isradipine, additional transcriptomics experiments and further validation of findings. We anticipate that these additions will significantly bolster the conclusions of the paper.

      Reviewer #2 (Public Review):

      [...] Strengths:

      This is an exciting and important paper.

      The paper compares mouse transcriptomics with human patient data.

      It shows that selective degeneration can occur across the midbrain dopaminergic neurons even in the absence of a genetic, prion, or toxin neurodegeneration mechanism.

      We thank the reviewer for these insightful comments.

      Weaknesses:

      Major concerns:

      (1) The lack of a CNO-positive, DREADD-negative control group in the behavioral experiments is the main limitation in interpreting the behavioral data. Without knowing whether CNO on its own has an impact on circadian rhythm or motor activity, the certainty that dopaminergic hyperactivity is causing these effects is lacking.

      This is an important point. Although we show that CNO does not produce degeneration of DA neuron terminals, we do not exclude a contribution to the behavioral changes. We agree that this behavioral control is necessary, and will address it in revision with a CNO-only running wheel cohort.

      (2) One of the most exciting things about this paper is that the SNc degenerates more strongly than the VTA when both regions are, in theory, excited to the same extent. However, it is not perfectly clear that both regions respond to CNO to the same extent. The electrophysiological data showing CNO responsiveness is only conducted in the SNc. If the VTA response is significantly reduced vs the SNc response, then the selectivity of the SNc degeneration could just be because the SNc was more hyperactive than the VTA. Electrophysiology experiments comparing the VTA and SNc response to CNO could support the idea that the SNc has substantial intrinsic vulnerability factors compared to the VTA.

      We agree that additional electrophysiology conducted in the VTA dopamine neurons would meaningfully add to our understanding of the selective vulnerability in this model, and will complete these experiments in revision.

      (3) The mice have access to a running wheel for the circadian rhythm experiments. Running has been shown to alter the dopaminergic system (Bastioli et al., 2022) and so the authors should clarify whether the histology, electrophysiology, fiber photometry, and transcriptomics data are conducted on mice that have been running or sedentary.

      We will explicitly clarify which mice had access to a running wheel in our revision. Briefly, mice for histology, electrophysiology, and transcriptomics all had access to a running wheel during their treatment. The mice used for photometry underwent about 7 days of running wheel access approximately 3 weeks prior to the beginning of the experiment. The photometry headcaps sterically prevented mice from having access to a running wheel in their home cage.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Rademacher and colleagues examined the effect on the integrity of the dopamine system in mice of chronically stimulating dopamine neurons using a chemogenetic approach. They find that one to two weeks of constant exposure to the chemogenetic activator CNO leads to a decrease in the density of tyrosine hydroxylase staining in striatal brain sections and to a small reduction of the global population of tyrosine hydroxylase positive neurons in the ventral midbrain. They also report alterations in gene expression in both regions using a spatial transcriptomics approach. Globally, the work is well done and valuable and some of the conclusions are interesting. However, the conceptual advance is perhaps a bit limited in the sense that there is extensive previous work in the literature showing that excessive depolarization of multiple types of neurons associated with intracellular calcium elevations promotes neuronal degeneration. The present work adds to this by showing evidence of a similar phenomenon in dopamine neurons.

      We thank the reviewer for the careful and thoughtful review of our manuscript.

      While extensive depolarization and associated intracellular calcium elevations promote degeneration generally, we emphasize that the process we describe is novel. Indeed, prior studies delivering chronic DREADDs to vulnerable neurons in models of Alzheimer’s disease did not report an increase in neurodegeneration, despite seeing changes in protein aggregation (e.g. Yuan and Grutzendler, J Neurosci 2016, PMID: 26758850; Hussaini et al., PLOS Bio 2020, PMID: 32822389). Further, a critical finding from our study is that in our paradigm, this stressor does not impact all dopamine neurons equally, as the SNc DA neurons are more vulnerable than the VTA, mirroring the selective vulnerability characteristic of Parkinson’s disease. This is consistent with a large body of literature showing that SNc dopamine neurons are less capable of handling large energetic and calcium loads than neighboring VTA neurons, and the finding that chronically altered activity is sufficient to drive this preferential loss is novel.

      In addition, we are not aware of prior studies that have chronically activated DREADDs to produce neurodegeneration. Other studies have shown that acute excitotoxic stressors can produce neuronal degeneration, but the chronic increase in activity is central to our approach.

      In terms of the mechanisms explaining the neuronal loss observed after 2 to 4 weeks of chemogenetic activation, it would be important to consider that dopamine neurons are known from a lot of previous literature to undergo a decrease in firing through a depolarization-block mechanism when chronically depolarized. Is it possible that such a phenomenon explains much of the results observed in the present study? It would be important to consider this in the manuscript.

      As discussed in greater detail in the results section below, our data suggest that this may not be a prominent feature of our model. However, we cannot rule out a contribution of depolarization block, and will expand on the discussion of this possibility in the revised manuscript.

      The relevance to Parkinson's disease (PD) is also not totally clear because there is not a lot of previous solid evidence showing that the firing of dopamine neurons is increased in PD, either in human subjects or in mouse models of the disease. As such, it is not clear if the present work is really modelling something that could happen in PD in humans.

      We completely agree that evidence of increased dopamine neuron activity from human PD patients is lacking and the existing data are difficult to interpret without human controls. However, as we outline in the manuscript, multiple lines of evidence suggest that the activity level of dopamine neurons almost certainly does change in PD. Therefore, it is very important that we understand how changes in the level of neural activity influence the degeneration of DA neurons. In this paper we examine the impact of increased activity. Increased activity may be compensatory after initial dopamine neuron loss, or may be an initial driver of death (Rademacher & Nakamura, Exp Neurol 2024, PMID: 38092187). Beyond what is already discussed in the manuscript, additional support for increased activity in PD models includes:

      - Elevated firing rates in asymptomatic MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488)

      - Increased frequency of spontaneous firing in patient-derived iPSC dopamine neurons and primary mouse dopamine neurons that overexpress synuclein (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060)

      - Increased spontaneous firing in dopamine neurons of rats injected with synuclein preformed fibrils compared to sham (Tozzi et al., Brain 2021, PMID: 34297092)

      We will include and further discuss these important examples in our revision.

      Similarly, in future studies, it will also be important to study the impact of decreasing DA neuron activity. There will be additional levels of complexity to accurately model changes in PD, which may differ between subtypes of the disease, the disease stage, and the subtype of dopamine neuron. Our study models the possibility of chronically increased pacemaking, and interpretation of our results will be informed as we learn more about how the activity of DA neurons changes in humans in PD. We will discuss and elaborate on these important points in the revision.

      Comments on the introduction:

      The introduction cites a 1990 paper from the lab of Anthony Grace as support of the fact that DA neurons increase their firing rate in PD models. However, in this 1990 paper, the authors stated that: "With respect to DA cell activity, depletions of up to 96% of striatal DA did not result in substantial alterations in the proportion of DA neurons active, their mean firing rate, or their firing pattern. Increases in these parameters only occurred when striatal DA depletions exceeded 96%." Such results argue that an increase in firing rate is most likely to be a consequence of the almost complete loss of dopamine neurons rather than an initial driver of neuronal loss. The present introduction would thus benefit from being revised to clarify the overriding hypothesis and rationale in relation to PD and better represent the findings of the paper by Hollerman and Grace.

      We agree that the findings of Hollerman and Grace support compensatory changes in dopamine neuron activity in response to the loss of dopamine neurons, rather than informing whether increased activity can also be an initial driver of dopamine neuron loss. We will clarify this point in our revision. In addition, the results of other studies on this point are mixed: a 50% reduction in dopamine neurons did not alter firing rate or bursting (Harden and Grace, J Neurosci 1995, PMID: 7666198; Bilbao et al., Brain Res 2006, PMID: 16574080), while a 40% loss was found to increase firing rate and bursting (Chen et al., Brain Res 2009, PMID: 19545547), and larger reductions alter burst firing (Hollerman & Grace, Brain Res 1990, PMID: 2126975; Stachowiak et al., J Neurosci 1987, PMID: 3110381). Importantly, even if compensatory, such late-stage increases in dopamine neuron activity may contribute to disease progression and drive a vicious cycle of degeneration in surviving neurons. In addition, we do not know how the threshold of dopamine neuron loss and altered activity may differ between mice and humans, and PD patients do not present with clinical symptoms until ~30-60% of nigral neurons are lost (Burke & O’Malley, Exp Neurol 2013, PMID: 22285449; Shulman et al., Annu Rev Pathol 2011, PMID: 21034221).

      Other lines of evidence support the potential role of hyperactivity in disease initiation, including increased activity before dopamine neuron loss in MitoPark mice (Good et al., FASEB J 2011, PMID: 21233488), increased spontaneous firing in patient-derived iPSC dopamine neurons (Lin et al., Acta Neuropath Comm 2021, PMID: 34099060), and increased activity observed in genetic models of PD (Bishop et al., J Neurophysiol 2010, PMID: 20926611; Regoni et al., Cell Death Dis 2020, PMID: 33173027).

      It would be good that the introduction refers to some of the literature on the links between excessive neuronal activity, calcium, and neurodegeneration. There is a large literature on this and referring to it would help frame the work and its novelty in a broader context.

      We agree that a discussion of hyperactivity, calcium, and neurodegeneration would benefit the introduction. While we briefly discuss calcium and neurodegeneration in the discussion, we will expand on this literature in both the introduction and discussion sections. We will carefully review and contextualize our work within existing frameworks of calcium and neurodegeneration (e.g. Surmeier & Schumacker, J Biol Chem 2013, PMID: 23086948; Verma et al., Transl Neurodegener 2022, PMID: 35078537). We believe that the novelty of our study lies in 1) a chronic chemogenetic activation paradigm via drinking water, 2) demonstrating selective vulnerability of dopamine neurons as a result of altering their activity/excitability alone, and 3) comparing mouse and human spatial transcriptomics.

      Comments on the results section:

      The running wheel results of Figure 1 suggest that the CNO treatment caused a brief increase in running on the first day after which there was a strong decrease during the subsequent days in the active phase. This observation is also in line with the appearance of a depolarization block.

      The authors examined many basic electrophysiological parameters of recorded dopamine neurons in acute brain slices. However, it is surprising that they did not report the resting membrane potential, or the input resistance. It would be important that this be added because these two parameters provide key information on the basal excitability of the recorded neurons. They would also allow us to obtain insight into the possibility that the neurons are chronically depolarized and thus in depolarization block.

      We do report the input resistance in Supplemental Figure 1C, which was unchanged in CNO-treated animals compared to controls. We did not report the resting membrane potential because many of the DA neurons were spontaneously firing. However, in the revision we will report the initial membrane potential on first breaking into the cell for the whole cell recordings, which did not vary between groups. This value is still influenced by action potential activity, but it is the timepoint in the recording least affected by dialysis of the neuron with the internal solution. We observed increased spontaneous action potential activity ex vivo in slices from CNO-treated mice (Figure 1D); thus, at least under these conditions, these dopamine neurons are not in depolarization block. We also did not see strong evidence of changes in other intrinsic properties of the neurons with whole cell recordings (e.g. Figure S1C). Overall, our electrophysiology experiments are not consistent with the depolarization block model, at least not due to changes in the intrinsic properties of the neurons. Although our ex vivo findings cannot exclude a contribution of depolarization block in vivo, we do show that CNO-treated mice removed from their cages for open field testing continue to show a strong trend toward increased activity for approximately 10 days (Figure S1E). This finding is also consistent with increased activity of the DA neurons. We will add discussion of these important considerations in the revision.

      It is great that the authors quantified not only TH levels but also the levels of mCherry, co-expressed with the chemogenetic receptor. This could in principle help to distinguish between TH downregulation and true loss of dopamine neuron cell bodies. However, the approach used here has a major caveat in that the number of mCherry-positive dopamine neurons depends on the proportion of dopamine neurons that were infected and expressed the DREADD, and this could very well vary between different mice. It is very unlikely that the virus injection infected 100% of the neurons in the VTA and SNc. This could for example explain in part the mismatch between the number of VTA dopamine neurons counted in panel 2G when comparing TH and mCherry counts. Also, I see that the mCherry counts were not provided at the 2-week time point. If the mCherry had been expressed genetically by crossing the DAT-Cre mice with floxed fluorescent reporter mice, the interpretation would have been simpler. In this context, I am not convinced of the benefit of the mCherry quantifications. The authors should consider either removing these results from the final manuscript or discussing this important limitation.

      We thank the reviewer for this insightful comment, and we agree that this is a caveat of our mCherry quantification. Quantitation of the number of mCherry+ DA neurons specifically informs the impact on transduced DA neurons, and mCherry appears to be less susceptible to downregulation than TH. As the reviewer points out, it carries the caveat that there is some variability between injections. Nonetheless, we believe that it conveys useful complementary data. As suggested, we will discuss this caveat in our revision. Note that mCherry was not quantified at the two-week timepoint because there is no loss of TH+ cells at that time.

      Although the authors conclude that there is a global decrease in the number of dopamine neurons after 4 weeks of CNO treatment, the post-hoc tests failed to confirm that the decrease in dopamine number was significant in the SNc, the region most relevant to Parkinson's. This could be due to the fact that only a small number of mice were tested. An "n" of just 4 or 5 mice is very small for a stereological counting experiment. As such, this experiment was clearly underpowered at the statistical level. Also, the choice of the image used to illustrate this in panel 2G should be reconsidered: the image suggests that a very large loss of dopamine neurons occurred in the SNc and this is not what the numbers show. A more representative image should be used.

      We agree that the stereology experiments were performed on relatively small numbers of animals. Combined with the small effect size, this may have contributed to the post-hoc tests showing a trend of p=0.1 for both the TH and mCherry dopamine cell counts in the SN at 4 weeks. As part of the planned experiments for our revision, we will perform an additional stereologic analysis to further assess the loss of SNc dopamine neurons. We will also review and ensure the images are representative.

      In Figure 3, the authors attempt to compare intracellular calcium levels in dopamine neurons using GCaMP6 fluorescence. Because this calcium indicator is not quantitative (unlike ratiometric sensors such as Fura2), it is usually used to quantify relative changes in intracellular calcium. The present use of this probe to compare absolute values is unusual and the validity of this approach is unclear. This limitation needs to be discussed. The authors also need to refer in the text to the difference between panels D and E of this figure. It is surprising that the fluctuations in calcium levels were not quantified. I guess the hypothesis was that there should be more or larger fluctuations in the mice treated with CNO if the CNO treatment led to increased firing. This needs to be clarified.

      We thank the reviewer for this comment. We understand that this method of comparing absolute values is unconventional. However, these animals were tested concurrently on the same system, and a clear effect on the absolute baseline was observed. We will include a caveat of this in our discussion. Panel D of this figure shows the raw, uncorrected photometry traces, whereas panel E shows the isosbestic corrected traces for the same recording. In panel E, the traces follow time in ascending order. We will also include frequency and amplitude data for these recordings.   
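      For readers less familiar with the isosbestic correction mentioned above, a common approach is sketched below; it is not necessarily the exact pipeline used here, and the function name and synthetic values are hypothetical. It assumes two co-registered channels, a calcium-dependent GCaMP signal and the calcium-independent isosbestic signal, fits the isosbestic trace to the signal by least squares, and expresses the residual as dF/F relative to the fitted baseline. Because dF/F is normalized to that fitted baseline, it removes the absolute fluorescence level, which is why comparisons of absolute baselines (as in panel D) require the caveat raised by the reviewer.

```python
import numpy as np


def isosbestic_dff(signal: np.ndarray, isosbestic: np.ndarray) -> np.ndarray:
    """Return dF/F of a GCaMP trace after isosbestic correction.

    The isosbestic (calcium-independent) trace is linearly scaled onto the
    calcium-dependent trace by least squares; the scaled trace serves as the
    baseline F, capturing bleaching and motion artifacts shared by both channels.
    """
    signal = np.asarray(signal, dtype=float)
    isosbestic = np.asarray(isosbestic, dtype=float)
    slope, intercept = np.polyfit(isosbestic, signal, 1)  # first-degree fit
    fitted = slope * isosbestic + intercept
    return (signal - fitted) / fitted


# Usage with synthetic data (hypothetical values): a shared bleaching decay.
t = np.linspace(0.0, 600.0, 6000)            # seconds
iso = 1.0 + 0.2 * np.exp(-t / 200.0)         # isosbestic channel
gcamp = 1.5 * iso + 0.3                      # same artifact, no calcium events
gcamp[3000:3050] += 0.2                      # one brief calcium transient
dff = isosbestic_dff(gcamp, iso)
```

      Frequency and amplitude of calcium transients would then typically be extracted from the corrected dF/F trace rather than from the raw traces.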

      Although the spatial transcriptomic results are intriguing and certainly a great way to start thinking about how the CNO treatment could lead to the loss of dopamine neurons, the presented results, which focus on some broad classes of differentially expressed genes and on some specific examples, do not really suggest any clear mechanism of neurodegeneration. It would perhaps be useful for the authors to use the data obtained to validate that a state of chronic depolarization was indeed induced by the chronic CNO treatment. Were genes classically linked to increased activity, such as c-fos or Bdnf, elevated in the SNc or VTA dopamine neurons? In the striatum, the authors report that the levels of DARPP-32, a gene whose expression is linked to dopamine levels, are unchanged. Does this mean that there were no major changes in dopamine levels in the striatum of these mice?

      We will review the expression of activity-related genes in our dataset, although we must keep in mind that these genes may behave differently in the context of chronic activation as opposed to acutely increased activity. We will also include experiments assessing striatal dopamine levels by HPLC in the revision.

      The usefulness of comparing the transcriptome of human PD SNc or VTA sections to that of the present mouse model should be better explained. In the human tissues, the transcriptome reflects the state of the tissue many years after extensive loss of dopamine neurons. It is expected that there will be few if any SNc neurons left in such sections. In comparison, the mice after 7 days of CNO treatment do not appear to have lost any dopamine neurons. As such, how can the two extremely different conditions be reasonably compared?

      Our mouse model and human PD progress over distinct timescales, as is the case with essentially all mouse models of neurodegenerative diseases. Nonetheless, in our view there is still great value in comparing gene expression changes in mouse models with those in human disease. It seems very likely that the same pathologic processes that drive degeneration early in the disease continue to drive degeneration later in the disease. Note that we have tried to address the discrepancy in timescales in part by comparing to early PD samples, in which SNc DA neuron loss is more limited. Please note the numbers of DA neurons within the areas we have selected for sampling (Author response image 1). Therefore, we can indeed use spatial transcriptomics to compare dopamine neurons from mice with initial degeneration and from patients in whom degeneration is ongoing.

      Author response image 1.

      Violin plot of DA neuron proportions sampled within the vulnerable SNV (deconvolved using the RCTD method in unmasked tissue sections of the SNV) in control and early PD subjects.

      Comments on the discussion:

      In the discussion, the authors state that their calcium photometry results support a central role of calcium in activity-induced neurodegeneration. This conclusion, although plausible because of the very broad pre-existing literature linking calcium elevation (such as in excitotoxicity) to neuronal loss, should be toned down a bit as no causal relationship was established in the experiments that were carried out in the present study.

      Our model utilizes hM3Dq-DREADDs that function by increasing intracellular calcium to increase neuronal excitability, and our results show increased Ca2+ by fiber photometry and changes to Ca2+-related genes, strongly suggesting a causal relation and crucial role of calcium in the mechanism of degeneration. However, we agree that we have not experimentally proven this point, as we acknowledged in the text. Additionally, we have planned revision experiments involving chronic isradipine treatment to further test the role of calcium in the mechanism of degeneration in this model.

      In the discussion, the authors discuss some of the parallel changes in gene expression detected in the mouse model and in the human tissues. Because few if any dopamine neurons are expected to remain in the SNc of the human tissues used, this sort of comparison has important conceptual limitations and these need to be clearly addressed.

      As discussed, we can sample SN DA neurons in early PD (Author response image 1), and in our view there is great value in such comparisons. We agree that discussion of appropriate caveats is warranted, and this will be clearly addressed in the revision.

      A major limitation of the present discussion is that it does not discuss the possibility that the observed phenotypes are caused by the induction of a chronic state of depolarization block by the chronic CNO treatment. I encourage the authors to consider and discuss this hypothesis.

      As discussed above, our analyses of DA neuron firing in slices and open field testing to date do not support a prominent contribution of depolarization block with chronic CNO treatment. However, we cannot rule out this hypothesis; therefore, we will include additional electrophysiology experiments and add discussion of this important consideration.

      Also, the authors need to discuss the fact that previous work was only able to detect an increase in the firing rate of dopamine neurons after more than 95% loss of dopamine neurons. As such, the authors need to clearly discuss the relevance of the present model to PD. Are changes in firing rate a driver of neuronal loss in PD, as the authors try to make the case here, or are such changes only a secondary consequence of extensive neuronal loss (for example because a major loss of dopamine would lead to reduced D2 autoreceptor activation in the remaining neurons, and to reduced autoreceptor-mediated negative feedback on firing). This needs to be discussed.

      As discussed above, while increases in dopamine neuron activity may be compensatory after loss of neurons, the precise percentage of loss required to induce such compensatory changes is not defined in mice and varies between paradigms, and the threshold is not known in humans. We also reiterate that a compensatory increase in activity could still promote the degeneration of critical surviving DA neurons, whose loss underlies the substantial decline in motor function that typically occurs over the course of PD. Moreover, multiple lines of evidence suggest that changes in activity can initiate and drive dopamine neuron degeneration (Rademacher & Nakamura, Exp Neurol 2024). For example, overexpression of synuclein can increase firing in cultured dopamine neurons (Dagra et al., NPJ Parkinsons Dis 2021, PMID: 34408150), while mice expressing mutant Parkin have higher mean firing rates (Regoni et al., Cell Death Dis 2020, PMID: 33173027). Similarly, an increased firing rate has been reported in the MitoPark mouse model of PD at a time preceding DA neuron degeneration (Good et al., FASEB J 2011, PMID: 21233488). We also acknowledge that alterations to dopamine neuron activity are likely complex in PD, and that dopamine neuron health and function can be impacted not just by simple increases in activity, but also by changes in activity patterns and regularity. We will amend our discussion to include the important caveat that changes in activity may occur as compensation, as well as further evidence of changes in activity preceding dopamine neuron death.

      There is a very large, multi-decade literature on calcium elevation and its effects on neuronal loss in many different types of neurons. The authors should discuss their findings in this context and refer to some of this previous work. In a nutshell, the observations of the present manuscript could be summarized by stating that the chronic membrane depolarization induced by the CNO treatment is likely to induce a chronic elevation of intracellular calcium and this is then likely to activate some of the well-known calcium-dependent cell death mechanisms. Whether such cell death is linked in any way to PD is not really demonstrated by the present results. The authors are encouraged to perform a thorough revision of the discussion to address all of these issues, discuss the major limitations of the present model, and refer to the broad pre-existing literature linking membrane depolarization, calcium, and neuronal loss in many neuronal cell types.

      While our model demonstrates classic excitotoxic cell death pathways, we would like to emphasize both the chronic nature of our manipulation and the progressive changes observed, with increasing degeneration seen at 1, 2, and 4 weeks of hyperactivity in an axon-first manner. This is a unique aspect of our study, in contrast to much of the previous literature, which has focused on shorter timescales. Thus, while we will revise the discussion to more comprehensively acknowledge previous studies of calcium-dependent neuronal cell death, we believe we have made several new contributions that are not predicted by the existing literature. We have shown that this chronic manipulation is specifically toxic to nigral dopamine neurons, and the finding that VTA dopamine neurons remain resilient even at 4 weeks is interesting and disease-relevant. We therefore do not want to use findings from other neuron types to draw assumptions about DA neurons, which are a unique and very diverse population. We acknowledge that, as with all preclinical models of PD, we cannot draw definitive conclusions about PD from these data. However, we reiterate our strong belief that drawing connections to human disease is important, as dopamine neuron activity is very likely altered in PD, and a clearer understanding of how dopamine neuron survival is impacted by activity will provide insight into the mechanisms of PD.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The authors investigate pleiotropy in the genetic loci previously associated with a range of neuropsychiatric disorders: Alzheimer's disease, amyotrophic lateral sclerosis (ALS), frontotemporal dementia, Parkinson's disease, and schizophrenia. The local statistical fine-mapping and variant colocalisation approaches they use have the potential to uncover not only shared loci but also shared causal variants between these disorders. There is existing literature describing the pleiotropy between ALS and these other disorders, but here the authors apply state-of-the-art local genetic correlation approaches to further refine any relationships.

      Complex disease and GWAS is not my area of expertise but the authors managed to present their methods and results in a clear, easy to follow manner. Their results statistically support several correlations between the disorders and, for ALS and AD, a shared variant in the vicinity of the lead SNP from the original ALS GWAS. Such findings could have important implications for our understanding of the mechanisms of such disorders and eventually the possibility of managing and treating them. 

      The authors have built a useful pipeline that plugs together existing gold-standard software to perform this analysis and have made it openly available, which is commendable. However, there is little discussion of what software is available to perform global and local correlation analysis and, if multiple tools are available, why they consider the ones they selected to be the gold standard.

      There is some mention of previous findings of genetic pleiotropy between ALS and these other disorders in the introduction, and discussion of their improved ALS-AD evidence relative to previous work. However, detailed comparisons of their other correlations to what was described before for the same pairs of disorders (if any) is missing. Adding this would strengthen the impact of this paper. 

      Finally, being new to this approach, I found the abstract a little confusing. Initially, the shared causal variant between ALS and AD is mentioned, but immediately in the following sentence they describe how their study "suggested that disease-implicated variants in these loci often differ between traits". After reading the whole paper I understood that the ALS-AD shared variant was the exception, but it may be best to restructure this part of the abstract. Additionally, in the abstract the authors state that different variants "suggests the role of distinct mechanisms across diseases despite shared loci". Is it not possible that different variants in the same regulatory region or protein-coding parts of a gene could be having the same effect and mechanism? Or does the methodology to establish that different variants are involved automatically mean that the variants are too distant for this to be possible?

      We thank Reviewer #1 for their considered review of this manuscript and for highlighting points that would benefit from further exploration. Itemised responses are provided below.

      (1) The reviewer noted that we did not adequately explain our choice of software for global and local genetic correlation analysis, and why we consider the techniques chosen as gold standard. We agree that the paper would benefit from clarification around this aspect of the study.

      Briefly, we selected LAVA for the local genetic correlation analysis because it offers several advantages over competing software and was developed by a reputable team previously known for developing MAGMA, which is well established in the statistical genetics field. In the manuscript (page 8), we added the following clarification: “LAVA was the most appropriate local genetic correlation approach for this study for several reasons. First, unlike SUPERGNOVA and rho-HESS, LAVA makes specific accommodations for analysis of binary traits. Second, other tools focus on bivariate correlation between traits whilst LAVA offers this alongside multivariate tests such as multiple regression and partial correlation, enabling rigorous testing of pleiotropic effects. Lastly, LAVA is shown to provide results which are less biased than those from other tools.”

      LDSC was selected for the global genetic correlation analysis because the software is well-established and likely the most widely adopted global genetic correlation tool. Reflecting its prevalence, the software is also compatible with LAVA, which adjusts for sample overlap based on the bivariate intercept estimate returned by LDSC. Since global genetic correlations were not the primary focus of this study, having been tested across several previous investigations (see response 2), we did not prioritise comparison of correlation estimates from LDSC against other available software. In the manuscript (pages 7-8) we now include the following statement: “[LDSC] was also applied to derive ‘global’ (i.e., genome-wide) genetic correlation estimates between trait pairs and estimate sample overlap from the bivariate intercept. The latter of these outputs was taken forward as an input for the local genetic correlation analysis using LAVA (see 2.2.2.2). Since global genetic correlation analysis across the traits studied here is not novel and associations reported in past studies are congruent across different tools, the compatibility between LDSC and LAVA motivated our use of LDSC for this analysis”.

      (2) The second comment was that the paper would be strengthened by contextualising our study with detail around what is previously known about associations between the studied traits. Accordingly, we have added clarifying text at the end of the introduction, stating: “although previous studies have performed global genetic correlation analyses between various combinations of these traits {references}, this is the first to compare them at a genome-wide scale using a local genetic correlation approach”. In the discussion, we link back to these studies, stating that “Through genetic correlation analysis, we replicated genome-wide correlations previously described between the studied traits {references}”.

      (3) The reviewer highlighted that the abstract as originally written may mislead or confuse the reader and we agree that clarity could be improved with some restructuring. This has now been revised and should read more logically.

      (4) They also enquired about our reasons for suggesting that the implication of distinct variants for each trait from a colocalisation analysis suggests a distinct causal mechanism. We thank them for this question as it encouraged us to reconsider how best to present the results of this analysis. To answer their question:

      It is certainly true that nearby but distinct variants can confer the same effect. In a scenario where multiple distinct variants result in the same effect and thus increase susceptibility towards two or more related phenotypes, you would expect to find evidence of association to each relevant variant in GWAS across these related traits (even if the magnitude of the associations differs). Where biological mechanisms are shared, post-GWAS fine-mapping analysis would be expected to yield credible sets overlapping across the traits, and likewise, colocalisation analysis should converge on a set of credible SNPs that are candidates for the shared effect. Where multiple distinct variants confer the same effect, you would expect to see separate fine-mapping credible sets for these distinct variants that colocalise pairwise between the jointly affected traits. Generally, therefore, evidence supporting the two distinct variants hypothesis would suggest the role of two distinct mechanisms, except when certain credible sets identified through fine-mapping converge on a colocalised effect.

      There is a further caveat, which we also explored in response to Reviewer #2: if a region includes long-spanning LD (and hence a larger number of variants is considered in the analysis), then the colocalisation analysis is more likely to favour the two distinct variants hypothesis, since the probability of the variants implicated in both traits being shared decreases. It is likely that support for the two independent variants hypothesis is correct in most of the comparisons from this study that favour this conclusion. This is because, generally, the fine-mapping credible sets do not overlap across trait pairs (Figure S4), and consequently the colocalisation analysis does not find any support for the shared variant hypothesis. An exception is the analysis of PD and schizophrenia at the MAPT locus on chromosome 17. We have accordingly added the following clarification to the manuscript (page 18): “However, the colocalisation analysis will increasingly favour the two independent variants hypothesis as the number of analysed variants increases. Hence, the wide-spanning LD of this region may have obstructed identification of variants and mechanisms shared between the traits.”
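      For readers unfamiliar with credible sets, the sketch below shows single-causal-variant fine-mapping from GWAS summary statistics using Wakefield approximate Bayes factors (ABFs). It assumes one causal variant per region, a uniform prior across SNPs, and a prior effect-size standard deviation of 0.2 on the log-odds scale (a common default for case-control traits); these are illustrative assumptions, not necessarily the settings or tool used in the study, and the function names are hypothetical. Two traits whose 95% credible sets do not overlap show the pattern described above as favouring distinct causal variants.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Wakefield log approximate Bayes factor (association vs. no association) for one SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)  # shrinkage ratio W / (W + V)
    return 0.5 * (math.log(1 - r) + r * z * z)


def credible_set(betas, ses, coverage=0.95, prior_sd=0.2):
    """Single-causal-variant fine-mapping: posterior inclusion probabilities and credible set.

    Assumes exactly one causal SNP in the region and a uniform prior across SNPs,
    so each SNP's posterior probability is its ABF divided by the sum of all ABFs.
    """
    labf = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(labf)
    weights = [math.exp(x - top) for x in labf]  # subtract the max for numerical stability
    total = sum(weights)
    pips = [w / total for w in weights]
    # Add SNPs in decreasing order of posterior probability until coverage is reached.
    ranked = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in ranked:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return pips, chosen
```

      Overlap between two traits' credible sets can then be checked with, for example, set(cs_a) & set(cs_b).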

      Reviewer #2 (Public Review): 

      Summary: 

      Spargo and colleagues present an analysis of the shared genetic architectures of schizophrenia and several late-onset neurological disorders. In contrast to many polygenic traits for which global genetic correlation estimates are substantial, global genetic correlation estimates for neurological conditions are relatively small, likely for several reasons. One is that assortative mating, which will spuriously inflate genetic correlation estimates, is likely to be less salient for late-onset conditions. Another, which the authors explore in the current manuscript, is that some loci affecting two or more conditions (i.e., pleiotropic loci) may have effects in opposite directions, or shared loci are sparse, such that the global genetic correlation signal washes out.

      The authors apply a local genetic correlation approach that assesses the presence and direction of pleiotropy in much smaller spatial windows across the genome. Then, within regions evidencing local genetic correlations for a given trait pair, they apply fine-mapping and colocalization methods to attempt to differentiate between two scenarios: that the two traits share the same causal variant in the region or that distinct loci within the region influence the traits. Interestingly, the authors only discover one instance of the former: an SNP in the HLA region appearing to confer risk for both AD and ALS. This is in contrast to six regions with distinct causal loci, and twenty regions with no clear shared loci. 

      Finally, the authors have published their analysis pipeline such that other researchers might easily apply the same techniques to other collections of traits. 

      Strengths: 

      - All such analysis pipelines involve many decision points where there is often no clear correct option. Nonetheless, the authors clearly present their reasoning behind each such decision.

      - The authors have published their analytic pipeline such that future researchers might easily replicate and extend their findings.

      Weaknesses:

      - The majority of regions display no clear candidate causal variants for the traits, whether shared or distinct. Further, despite the potential of local genetic correlation analysis to identify regions with effects in opposing directions, all of the regions in which causal variants were identified for both traits showed positive correlations. The reasons for this aren't clear, and the authors would do well to explore this in greater detail.

      - The authors very briefly discuss how their findings differ from previous analyses because of their strict inclusion for "high-quality" variants. This might be the case, but the authors do not attempt to demonstrate this via simulation or otherwise, making it difficult to evaluate their explanation. 

      We thank Reviewer #2 for their appraisal of this manuscript and kind comments regarding its strengths. We now aim to address the identified weaknesses.

      (1) The reviewer comments that we did not adequately investigate why loci with causal variants identified in both traits all had positive local genetic correlations. We agree that it would be helpful to better understand the underlying reasons. To address this issue, we have added a new supplementary figure comparing the positive and negative local genetic correlation results (see Figure S2). In the main text, we have added the following clarification: “Although both positive and negative local genetic correlations passed the FDR-adjusted significance threshold, we observed only positive local genetic correlations in loci where fine-mapping credible sets were identified for both traits in the pair. This reflects that the correlation coefficients and variant associations from the analysed GWAS studies were generally stronger in the positively correlated loci (see Figure S2).”
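      The FDR adjustment mentioned above is not specified further in this response. Assuming it is the Benjamini-Hochberg procedure (an assumption, not confirmed by the text), a minimal sketch of its step-up rule is shown below; the function name is our own.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR q.

    The largest rank k with p_(k) <= (k / m) * q is found, and every hypothesis
    with rank <= k is rejected, even if some of them individually miss their own
    rank-specific threshold (this is what makes the procedure "step-up").
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Illustrative p-values only; none of these comes from the study.
    print(benjamini_hochberg([0.02, 0.03, 0.04, 0.045]))
```

      In the illustrative call, no p-value passes its own threshold until the largest one (0.045 ≤ 0.05), at which point all four hypotheses are rejected.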

      (2) The reviewer rightly suggests that the manuscript would benefit from an improved explanation of the somewhat inconsistent results of the colocalisation analysis of ALS and AD at the locus around the rs9275477 SNP between this work and a previous study. We have now investigated this further and believe that the discrepancy results partly from an inherent empirical characteristic of the colocalisation analysis. We have explained this in the manuscript (page 22) as follows: “The previous study analysed a 200 kb window of over 2,000 SNPs around the lead genome-wide significant SNP from the ALS GWAS, rs9275477, and found ~0.50 posterior probability for each of the shared and two independent variant(s) hypotheses. The current analysis used 475 SNPs occurring within a semi-independent LD block of ~50 kb in this locus. Since the posterior probability of the two independent variants hypothesis (H3) increases exponentially with the number of variants in the region whilst the shared variant hypothesis (H4) scales linearly, it is expected that our analysis would give stronger support for the latter. Given that the previous study defined regions for analysis based on an arbitrary window of ±100 kb around each lead genome-wide significant SNP from the ALS GWAS and we defined each analysis region based on patterns of LD in European ancestry populations, it is reasonable to favour the current finding.”
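      The dependence on the number of variants can be illustrated with a minimal re-implementation of the coloc ABF combination step. Assuming coloc's default per-SNP priors (p1 = p2 = 1e-4, p12 = 1e-5), which may differ from those used in either study, the total prior weight on H3 sums over every ordered pair of distinct SNPs and so grows as p1·p2·n(n − 1), whereas H4 grows as p12·n. The H3:H4 prior ratio is therefore (n − 1)/1000: about 0.47 for the 475 SNPs analysed here and about 2.0 for a region of 2,000 SNPs. The function names are our own, and per-SNP log ABFs are taken as given inputs.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def logdiff(a, b):
    """Numerically stable log(exp(a) - exp(b)); requires a > b."""
    return a + math.log(-math.expm1(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs for two traits.

    Single causal variant per trait: H1/H2 = association with one trait only,
    H3 = two distinct causal SNPs, H4 = one shared causal SNP.
    Requires at least two SNPs so that H3 is defined.
    """
    l1 = logsumexp(labf1)
    l2 = logsumexp(labf2)
    l12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,  # H0: no association with either trait
        math.log(p1) + l1,
        math.log(p2) + l2,
        # H3: sum over all pairs i != j = (sum ABF1)(sum ABF2) - sum(ABF1 * ABF2)
        math.log(p1) + math.log(p2) + logdiff(l1 + l2, l12),
        math.log(p12) + l12,
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


def prior_mass(n, p1=1e-4, p2=1e-4, p12=1e-5):
    """Total prior weight on H3 and H4 for a region of n SNPs."""
    return p1 * p2 * n * (n - 1), p12 * n


if __name__ == "__main__":
    for n in (475, 2000):
        h3, h4 = prior_mass(n)
        print(f"{n} SNPs: prior H3/H4 ratio = {h3 / h4:.3f}")
```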

    1. Author response:

      The following is the authors’ response to the original reviews.

      Responses to recommendations

      Reviewer #1 (Recommendations For The Authors):

      Describe more precisely how gene expression graphs are built (tissues, read counts). For example, how were read counts normalized? Were they from DESeq2 data, which only works by comparing two samples? If so, all samples should be independently compared to a reference, and the normalized expression value of the reference will change from sample to sample, thus introducing a purely technical artifact.

      We have added additional information about the normalisation method to the Material and Methods section (lines 597-598: “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.”) and to the figure legends (lines 247, 286, 372, 404: “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.”) to address this recommendation.

      DESeq2 provides a reference-independent normalisation through a median-of-ratios method (a good explanation can be found here: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html). The normalised expression values are independent of any reference sample and therefore will not change from sample to sample as suggested in this comment. In contrast, pairwise comparisons are made when testing for significantly differentially expressed genes between two treatments using a Wald test, which is performed against a reference and generates log2 fold changes and p-values; however, this is distinct from the normalisation described above.
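      As an illustration of the median-of-ratios idea, the sketch below re-implements the size-factor calculation in plain Python. It is a simplified sketch, not DESeq2's code, and the function names are our own. It assumes a genes × samples table of raw counts, builds a pseudo-reference from each gene's geometric mean across all samples, skips genes with a zero count in any sample, and takes each sample's size factor as the median ratio of its counts to that pseudo-reference. No single sample acts as the reference, although the size factors do depend on which samples are included in the dataset.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; `counts` is a list of per-gene lists (one count per sample)."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts:
        if any(c == 0 for c in gene_counts):
            continue  # geometric mean would be zero, so the gene is skipped
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples  # pseudo-reference
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median on the log scale, then back-transform.
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts):
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(gene, sf)] for gene in counts]


if __name__ == "__main__":
    # Toy counts: sample 2 and 3 are sequenced 2x and 3x as deeply as sample 1.
    toy = [[10, 20, 30], [100, 200, 300], [5, 10, 15], [0, 4, 6]]
    print(size_factors(toy))
    print(normalise(toy)[0])
```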

      Provide bioinformatics workflows and, if possible, the set of parameters used, the computing resources, etc. Were any assembly finishing steps carried out (by long-range PCR?), and were experimental validations performed (especially for allele-specific transcripts, by conventional RT-PCR based on diagnostic mutations)?

      We have added additional information on the bioinformatics workflows where required, including the parameters used (lines 530, 536, 549-551, and 574-583). No finishing steps other than Hi-C scaffolding were performed. No allele-specific analysis was done as part of this manuscript.

      To further improve transparency, we have also uploaded all the scripts used for this study to https://github.com/R-Huerlimann/Malabar_grouper_genome and the gene models and functional annotation to https://figshare.com/projects/Malabar_grouper_Epinephelus_malabaricus_genome_annotation/199909. This information has been added to the manuscript in lines 600-601 and 609-611.

      Reviewer #3 (Recommendations For The Authors):

      General author response:

      All the recommendations of this reviewer are very relevant and would certainly provide a lot of information, but they constitute a full project in themselves, as they would require establishing this grouper species as an experimental model in our lab. Currently, we only have access to the larval and juvenile stages via a collaboration with the Okinawa Prefectural Sea Farming Center, which is an hour's drive from our lab, and this access is limited to the grouper spawning season. To do everything that is suggested, we would need regular and easy access to the fish. This would require establishing this model in our marine station, which is not possible due to space and time constraints. These groupers grow to a very large size (1-2 m in length and up to 150 kg in weight) and only mature into males after more than 6 years.

      First and foremost, I would advise the authors to extend their TH and cortisol levels measurements to the entire developmental time considered in their analysis.

      For the reasons stated above we could not perform these experiments. We must emphasize that the data regarding TH are available for a closely related species (e.g., Epinephelus coioides, de Jesus et al. 1998) and there is no reason to think that the situation will be drastically different in E. malabaricus. In addition, given that we have now studied several coral reef fish species in the same context (clownfish, surgeonfish, damselfish, gobies) we observed that the transcriptomic data are more robust, more sensitive, and more precise than hormone measurements. 

      Consider carrying out in situ hybridisation of TSH with putative CRH receptors to determine if thyrotrophin could be competent to respond to HPA axis signals.

      We agree that studying the interplay between corticoids and thyroid hormones at the neuroendocrine level would be desirable, and we fully agree with the experiment suggested by the reviewer, but this is impossible in our current situation. We are not working with an established animal model like zebrafish or Xenopus, but with a large, long-lived marine fish that reproduces in spawning aggregations and whose husbandry is notoriously difficult.

      Consider conducting cortisol treatment experiments to functionally determine if indeed cortisol is involved in grouper metamorphosis.

      We tried to do TH and cortisol treatments specifically on the early larval stages corresponding to the early TH peak to see how this would impact the development of the fin spines, but our trials were unsuccessful. The larvae at that stage are extremely fragile and even putting them into small volumes of treatment drugs induced massive mortalities. Again, this would mean establishing this grouper species as a model organism and would require a massive effort to improve larval rearing as discussed above. We feel that our data stands on its own in the meantime and adds valuable information to the existing literature by studying a rarely investigated species.

      Responses to comments

      Reviewer #1 (Public Review):

      Weaknesses:

      The manuscript needs proper editing and is not complete. Some wordings lack precision and make it difficult to follow (e.g., line 98, "we assembled a chromosome-scale genome of ..." should instead read "we assembled a chromosome-scale genome sequence of ..."). Also, panel E of Figure 2 is missing.

      We made the suggested change of adding “sequence” in lines 32 and 121. Concerning additional changes, we have carefully edited our manuscript and looked for any incomplete sections. Unfortunately, it is difficult to see what other issues are being raised here without any further information. 

      As for panel E of figure 2, it is not missing. The panel is located to the right, just below “Target Cells”.

      The shortcomings of the manuscript are not limited to the writing style; important technical and technological information is missing or not clear enough, thereby preventing a proper evaluation of the resolution of the genomic resources provided:

      Several RNA-seq libraries from different tissues have been built to help annotate the genome and identify transcribed regions. This is fine. But throughout the manuscript, gene expression changes are summarized in a single panel where it is not at all clear which tissue the data come from (whole embryo or a specific tissue?), or whether it is a cumulative expression level computed across several tissues (and how it was computed), etc. This is essential information needed for data interpretation.

      No fertilised eggs or embryos have been sequenced. The individual tissues derived from juvenile fish were used only for genome annotation, using Iso-Seq. Whole larval fish were used for the developmental analysis using RNA-seq, as well as for the genome annotation. We have added information in the figures and text stating that the results shown are from whole larvae, and added more detail to the Material and Methods section about which type of sample was analysed in which way.

      Specifically, we have added “Lastly, expression levels shown in figures 2-5 are normalised gene counts produced by DESeq2.” to lines 597-598 in the Material and Methods section, “Gene expression data was generated from whole larvae.” to line 191, and “Gene expression data was generated from whole fish. Expression levels were derived from DESeq2 normalised gene counts.” to the figure legends (lines 247, 286, 372 and 404). Additionally, we have added clarifications in lines 489, 497, 530 and 536.

      The bioinformatic processing, especially of the assembly and annotation, is very poorly described. This is also a sensitive topic, as illustrated by the numerous "assemblathon" and "annotathon" initiatives to evaluate tools and workflows. Importantly, providing configuration files and an in-depth description of workflows and parameter settings is highly recommended. These can be made available through data repository services, and such documents can even be assigned DOIs. This would give others more information to evaluate the resolution of this work. There is no doubt that it is well done, but, especially in the field of genome assembly and annotation, high resolution is VERY cost- and time-intensive. Not surprisingly, most projects are conditioned by trade-offs between cost, time and labour. The authors should provide others with the information needed to evaluate this.

      We have added information on the parameters used in the genome assembly, annotation and transcriptome analysis in lines 549-551, 577, 579, 580 and 582. Additionally, we have uploaded all scripts to GitHub, as outlined in the Code and Data Availability section (lines 599-614).

      The genome assembly did not use a specific workflow manager (e.g., Nextflow) but was done with a simple command and standard parameters in IPA. Scaffolding was carried out by Phase Genomics using their standardised proprietary workflow; a detailed description of this workflow, provided by Phase Genomics, can be found in the supplementary material.

      Quantifications of T3 and T4 levels look fairly low and not very convincing. The work would clearly benefit from a discussion of why the signal is so low and what the current technological limitations of these quantifications are. This would really help (general) readers.

      The T3/T4 levels are consistent with other published work in fish. In the present manuscript, the grouper peak levels are 1.2 ng/g (1,200 pg/g) for T4 and 0.06 ng/g (60 pg/g) for T3. This T4 level is higher than, and the T3 level comparable to, those found in convict tang (Holzer et al. 2017; Figure 2), i.e., 30 pg/g of T4 and 100 pg/g of T3. Of course, there are also examples with higher levels, such as clownfish (Roux et al. 2023; Figure 1), with 10 ng/g (10,000 pg/g) of T4 and 2 ng/g (2,000 pg/g) of T3.
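      Because the cited studies report peaks in different units, a small worked conversion onto a common pg/g scale may help readers compare them. This is only a sketch: the values are transcribed from the paragraph above, and the dictionary and function names are illustrative, not part of any published analysis.

```python
# Peak whole-body thyroid hormone levels quoted above, harmonised to pg/g.
# Values are transcribed from the text; 1 ng/g = 1,000 pg/g.
NG_PER_G_TO_PG_PER_G = 1000.0

peak_pg_per_g = {
    "grouper (this study)": {"T4": 1.2 * NG_PER_G_TO_PG_PER_G, "T3": 0.06 * NG_PER_G_TO_PG_PER_G},
    "convict tang (Holzer et al. 2017)": {"T4": 30.0, "T3": 100.0},
    "clownfish (Roux et al. 2023)": {"T4": 10.0 * NG_PER_G_TO_PG_PER_G, "T3": 2.0 * NG_PER_G_TO_PG_PER_G},
}


def fold_vs_grouper(species, hormone):
    """Ratio of a species' peak level to the grouper peak for the same hormone."""
    return peak_pg_per_g[species][hormone] / peak_pg_per_g["grouper (this study)"][hormone]


for species, levels in peak_pg_per_g.items():
    print(f"{species}: T4 = {levels['T4']:,.0f} pg/g, T3 = {levels['T3']:,.0f} pg/g")
```

      On this common scale, the convict tang T4 peak is about 1/40 of the grouper peak while its T3 peak is about 1.7 times higher, and the clownfish peaks are roughly 8 (T4) and 33 (T3) times higher. Cross-study ratios of this kind are exactly what the next paragraph cautions against over-interpreting.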

      The differences could be due to differences in tissue structure (and therefore in hormone extraction efficiency), in hormone measurement protocols, in fish physiology, or in fish size (e.g., weighing tiny grouper larvae is more difficult and less precise than weighing convict tang larvae). What is important is not the absolute level but the relative level, which shows the change across the larval stages of a species measured with identical extraction and measurement protocols. Our data are therefore internally consistent and coherent with the grouper literature.

      Holzer, Guillaume, et al. "Fish larval recruitment to reefs is a thyroid hormone-mediated metamorphosis sensitive to the pesticide chlorpyrifos." Elife 6 (2017): e27595.

      Roux, Natacha, et al. "The multi-level regulation of clownfish metamorphosis by thyroid hormones." Cell Reports 42.7 (2023).

      Differential analysis highlights up to ~15,000 differentially expressed genes (DEGs) out of ~26k predicted genes, i.e., more than half of all genes. ANOVA-based differential analysis relies on the simple fact that only a minority of genes are DEGs. Having >50% DEGs is well beyond the validity of the method. This should be addressed, or at least discussed.

      The large number of differentially expressed genes reflects the fact that this is a larval developmental transcriptome, spanning one-day-old larvae to fully metamorphosed juveniles at around day 60.

      While DESeq2 does assume that most genes are not differentially expressed, this assumption affects normalisation but not hypothesis testing (Wald test, likelihood-ratio test or ANOVA), and normalisation in DESeq2 is fairly robust to violations of it. According to the author of DESeq2, Michael Love, DESeq2 uses the median-of-ratios method for normalisation, and as long as the numbers of up- and down-regulated genes are relatively balanced, DESeq2 will be able to handle the data. As part of our general quality control for this project, we inspected the MA plots, which do not show any overrepresented up- or down-regulation patterns. See also Michael Love's comment on comparing different tissues, which also applies here when comparing vastly different larval stages (https://support.bioconductor.org/p/63630/):

      “For experiments where all genes increase in expression across conditions, the median ratio method will not be able to capture this difference, but this is typically not the case for a tissue comparison, as there are many "housekeeping" genes with relatively similar expression pattern across tissues.”
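      The argument above rests on how median-of-ratios normalisation behaves when many genes change. The following is a minimal NumPy sketch of that idea; it is not DESeq2's own R implementation (`estimateSizeFactors`), and the simulated counts, fold changes and function names are hypothetical, not the grouper data. Both simulated scenarios have 60% of genes differentially expressed; only the balance between up- and down-regulation differs.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Per-sample size factors via the median-of-ratios idea.

    counts: array of shape (n_genes, n_samples) with raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes with a nonzero count in every sample enter the pseudo-reference,
    # because the log of zero is undefined.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples = the pseudo-reference sample.
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # Size factor of a sample = median over genes of its ratio to the reference.
    return np.exp(np.median(log_counts - log_reference, axis=0))


def estimated_depth_ratio(fold_changes, true_depth_ratio=2.0, baseline=100.0):
    """Simulate two samples and return the size-factor ratio sample B / sample A.

    Sample B is sequenced `true_depth_ratio` times deeper than sample A, and each
    gene is additionally scaled by its (hypothetical) biological fold change.
    """
    fold_changes = np.asarray(fold_changes, dtype=float)
    sample_a = np.full(fold_changes.shape, baseline)
    sample_b = sample_a * true_depth_ratio * fold_changes
    size_factors = median_of_ratios_size_factors(np.column_stack([sample_a, sample_b]))
    return size_factors[1] / size_factors[0]


n_genes = 1000

# 60% of genes differentially expressed, split evenly between up and down.
balanced = np.ones(n_genes)
balanced[:300] = 4.0
balanced[300:600] = 0.25

# 60% of genes differentially expressed, all in the same direction.
one_sided = np.ones(n_genes)
one_sided[:600] = 4.0

print("balanced:", estimated_depth_ratio(balanced))
print("one-sided:", estimated_depth_ratio(one_sided))
```

      In the balanced case the estimated depth ratio is about 2, the true value; in the one-sided case it is about 8, because the shared up-regulation is absorbed into the size factor. This mirrors the point made in the response: a large fraction of DEGs is not by itself a problem for this normalisation, whereas a strong imbalance between up- and down-regulated genes would be, which is what the MA plots were used to check.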

      Reviewer #3 (Public Review):

      Weaknesses:

      However, the authors make substantial claims that are not supported by experimental or functional data. In fact, this is a descriptive study that does not provide any functional evidence to support the claims made.

      We agree with the reviewer that our paper lacks functional experiments, but despite this, the transcriptomic data clearly show the activation of TH and corticoid pathways during two distinct periods: an early activation between D1 and D10, and a second one between D32 and the juvenile stage. These data are interesting as they call for further examination of 1) the existence of an early larval developmental step also involving TH and corticosteroids and 2) the possible interaction of corticoids and TH during metamorphosis. This question is certainly not yet settled in teleost fishes and is of great interest.

      Point 1) is of particular interest and importance, since this early activation (to our knowledge not reported in any teleost fish studied so far) raises many new questions and will certainly be scrutinised by other groups in the years to come, thereby ensuring a good citation impact for this study. We hope that the reviewer, while disagreeing with some of our statements, will recognise that our study will be stimulating in that respect, and that this is what scientific studies should do.

      We acknowledge the descriptive nature of the data and the lack of functional experiments in the Discussion (lines 443 to 445): “This may suggest that in some aspect, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians, but functional experiments need to be conducted to confirm this hypothesis.” As stated above, such functional experiments would require establishing the grouper as an experimental model in our husbandry facilities, which is currently not possible due to the large size of the adult fish.

      The idea that cortisol is involved in metamorphosis in teleosts has never been demonstrated, and the only example cited by the authors (REF 20) clearly states that cortisol alone does not induce flatfish metamorphosis. In that work, the authors clearly state that in vivo cortisol treatment had no synergistic effect with TH in inducing metamorphosis. Moreover, in the Senegalese sole (Solea senegalensis), the number of preoptic CRH neurons decreases during metamorphosis, further arguing that, at least in flatfish, cortisol is not involved in metamorphosis (PMID: 25575457).

      We will do our best to improve the clarity of the revised manuscript to avoid any misunderstanding about our claims. However, we would like to point out the semantic shift in the reviewer's first sentence: “being involved” is not the same as “cortisol alone does not induce”. In ref. 20, the authors explicitly wrote that “Cortisol further enhanced the effects of both T4 and T3, but was ineffective in the absence of thyroid hormones”, and in our view this indeed corresponds to “being involved in metamorphosis”.

      We are not claiming, as the reviewer suggests, that cortisol alone is involved in metamorphosis, but simply that cortisol may be involved together with TH. We stand by this claim, as we indeed observed an activation of corticoid pathway genes around D32, which in our view is sufficient to propose such an involvement. We do agree that functional experiments will be needed to properly demonstrate the involvement of corticoids in grouper metamorphosis, but this was not possible in the current study, as it would require establishing a full grouper life cycle under laboratory conditions, which is beyond the scope of this manuscript.

      We also mentioned in the Discussion that the role of corticoids in fish larval development is still debated, and we agree that this remains a contentious issue. We have clarified the Discussion on this point (lines 375-376, lines 439-464).

      We wrote that “There is contrasting evidence of communication between these two pathways during teleost fish larval development, with some data suggesting a synergistic and others an antagonistic relationship. In terms of synergy, an increase in cortisol levels concomitant with an increase in TH levels has been observed in flatfish [26], golden sea bream [64] and silver sea bream [65]. Cortisol was also shown to enhance, in vitro, the action of TH on fin ray resorption (a phenomenon occurring during flatfish metamorphosis) in flounder [27]. It has also been shown that cortisol regulates local T3 bioavailability in the juvenile sole via regulation of deiodinase 2 in an organ-specific manner [66]. On the antagonistic side, it has been shown that experimentally induced hyperthyroidism in common carp decreases cortisol levels [67], whereas cortisol exposure decreases TH levels in European eel [68]. Given this scattered evidence, the existence of active crosstalk during teleost larval development and metamorphosis has never been formally demonstrated. The results we obtained in grouper clearly indicate that the HPI axis is activated during both early development and metamorphosis and that cortisol synthesis is activated during early development. This may suggest that, in some respects, cortisol synthesis could work in concert with TH, as has been shown in several different contexts in amphibians [25], but functional experiments need to be conducted to confirm this hypothesis.” In the revised manuscript, we have also added the interesting case of the Senegalese sole mentioned by the reviewer.

      In the last revision, we had also added that our results “brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy” meaning that we clearly acknowledge that we are only revealing a hypothesis that remains to be tested. We later follow up with a discussion about the most novel observation and focus of our study, the increase in THs and cortisol during early development, which was unexpected and very intriguing. Again, these results suggest that there might be a link between the two, as has been shown in amphibians. This is typically the kind of results that should encourage more investigations into other fish species. Indeed, this has been pointed out by other authors and in particular by Bob Denver (probably the foremost expert on this topic) in Crespi and Denver 2012: “Elevation in HPA/I axis activity has been described prior to Metamorphosis in amphibians and fish, birth in mammals (reviewed in Crespi & Denver 2005a; Wada 2008)”. B. Denver also adds that: “Experiments in which GCs were elevated prior to metamorphosis or prior to hatching or birth (e.g. Weiss, Johnston & Moore 2007) or inhibited by treatments with GC synthesis blockers (e.g. metyrapone) or receptor antagonists (e.g. RU486, Glennemeir & Denver 2002) demonstrate that GCs play a causal role in precipitating these life-history transitions (also reviewed in Crespi & Denver 2005a; Wada 2008).” We believe the reviewer will be convinced by these elements coming from a colleague unanimously respected in the field. 

      Furthermore, the authors need to recognise that the transcriptomic analysis is whole-body and that the upregulation of HPA axis genes does not mean they are involved in regulating the HPT axis. The authors do not show that any CRH receptor is expressed in thyrotrophs or in any other HPT axis-relevant cells, or that changes in these genes correlate with changes in TSH expression. An in situ hybridisation experiment showing co-expression of HPA genes and TSH in thyrotrophs could be a good start. However, the best scenario would be to conduct cortisol treatment experiments to see whether this hormone affects grouper metamorphosis.

      We agree that functional experiments are needed to validate our hypothesis. As the early peaks of expression observed for many genes were very intriguing to us, we did carry out thyroid hormone and goitrogen treatments on young grouper larvae to test their effect on morphological changes. Unfortunately, such experiments, already tricky on metamorphosing larvae, are even riskier on such tiny individuals just after hatching, and we encountered high mortality rates. We must add that, because we cannot establish a full grouper life cycle under laboratory conditions, we carried out these experiments within a commercial husbandry system in Japan, which, while excellent, limits the scope of possible experiments. We were thus not able to provide functional validation of our hypothesis. Such experiments would be a full project in themselves, requiring a rearing system suitable for both larval survival and the economic constraints of drug treatments. We were further limited by the spawning times of the grouper in the operational aquaculture farm, which are restricted to a short period each year. So, even though we strongly agree that such experiments are necessary, we think they are beyond the scope of the present paper and are something future research can explore.

      High TSH and Tg levels usually parallel whole-body TH levels during teleost metamorphosis. However, in this study, high Tg expression levels are only achieved at the juvenile stage, whereas high TSH is achieved at D32, and at the juvenile stage, they are already at their lowest levels.

      This is exactly our point. We observe two peaks in TSH expression, one at D3 and one at D32. The peak at D3 coincides with high thyroid hormone levels on the same day, and while we have not measured TH at D32, the existing literature shows a TH peak during that period (e.g., de Jesus et al., 1998). Similarly, there is a small peak of Tg at D3. Our manuscript focused more on the upregulation of these genes at D3, which has not been reported before in the literature and raises the question of the role of TH so early in larval development, outside the metamorphosis period.

      Regarding the respective levels of TSH and Tg, we would first like to point out that their order of appearance before metamorphosis (TSH at D32, Tg after) is consistent with what we would expect. We agree, however, that the strong increase in Tg and TPO expression is later than expected. Therefore, we have added the following sentence in lines 212 to 216: “The respective order of appearance of TSH and Tg (TSH at D32, Tg after) is consistent with what we would expect but a bit later than expected given the morphological transformation. It would be interesting to revisit this in a future series of experiments, with tighter temporal sampling to study how gene expression and morphological transformation align.”

      It is very difficult to conclude anything from the TH and cortisol level measurements. The authors only measured up to D10, whereas they argue that metamorphosis occurs at D32. These measurements would be more helpful if they focused on the correct developmental time. The data are irrelevant to their hypothesis.

      We respectfully disagree with the reviewer, considering that 1) TH levels have already been investigated in groupers, coinciding with pigmentation changes and fin ray resorption (Figure 4 in de Jesus et al. 1998), 2) there is also evidence in numerous fish species that increases in TH levels are concomitant with increased expression of TH-related genes, and 3) we observed in our data an increase in the expression of TH-related genes as well as pigmentation changes and fin ray resorption. Based on our experience with fish metamorphosis and on the literature, we can say confidently that these observations indicate that metamorphosis occurs between D32 and the juvenile stage. This clearly shows that our inference is correct. Additionally, we would like to re-emphasise that, in our experience with several fish species, transcriptomic data are more robust and precise than hormone measurements.

      However, as we were surprised by the activation of TH and corticoid pathway genes very early in larval development (at D3), which is clearly outside the metamorphosis period, we decided to measure TH and cortisol levels during this period to determine whether this surprising early activation indeed corresponded to an increase in both TH and cortisol. As such an observation has never been made in other teleost species (to our knowledge), and as we were wondering whether gene activation was accompanied by a hormonal increase, the TH and cortisol measurements we made between D1 and D10 are relevant. To clarify our message further, we have changed some mentions of “metamorphosis” to “larval development” throughout the manuscript and made other improvements to avoid any confusion between the two periods we are studying: early larval development (between D1 and D10) and metamorphosis (between D32 and the juvenile stage).

      Moreover, as stated in the previous review, a classical sign of teleost metamorphosis is the upregulation of TSHb and Tg, which does not occur at D32; therefore, it is very hard for me to accept that this is the metamorphic stage. Given the lack of TH measurements, I cannot agree with the authors. I think this has to be toned down, and the manuscript should make clear that D32 might be a putative metamorphic climax but that several aspects of the biology argue against it. Moreover, at D10, the authors show the highest cortisol level and the lowest T4 and T3 levels. These observations are irreconcilable with cortisol enhancing or participating in TH-driven metamorphosis.

      We thank the reviewer for this comment, but we think that there might be a misunderstanding here. 

      (1) We clearly observed an increase in TSHb (occurring between D18 and the juvenile stage) and an increase in tg from D32, which coincide with the activation of other genes involved in the TH pathway (dio2, dio3, and also a strong increase in TRb). All of this, put in the context of what we know from previous grouper studies, clearly supports our conclusion that TH-regulated metamorphosis starts at around D32 in grouper. We also observed morphological changes such as fin ray resorption and pigmentation changes between D32 and the juvenile stage. Such morphological changes have already been associated with metamorphosis in groupers (De Jesus et al. 1998), as they occur during the increase in TH levels, and they also happen to be under the control of TH in grouper (De Jesus et al. 1998). Based on this study, but also on studies conducted on many other teleost species showing that increased TH levels are always associated with activation of TH pathway genes and with morphological and pigmentation changes, we concluded that metamorphosis of E. malabaricus occurs between D32 and the juvenile stage. We have improved the clarity of the manuscript in several places to make clear that our conclusion is based on our transcriptomic and morphological data plus the available literature.

      (2) We clearly observed another activation of TH-related genes earlier in development (between D1 and D10, with a surge of trhrs, tg and tpo at D3). As this activation was very unexpected, we decided to focus the analysis of TH levels on the D1 to D10 window and, very interestingly, observed a high level of T4 at D3, indicating that THs are instrumental very early in the larval development of the Malabar grouper, which has never been shown before. We stated in lines 224-225 that our “data reinforce the existence of two distinct periods of TH signalling activity, one early on at D3 and one late corresponding to classic metamorphosis at D32”. However, we agree that we could have explained more clearly that this early activation was very intriguing to us and that we wanted to investigate hormone levels around that period. We never claimed anywhere in the manuscript that this early developmental period corresponds to metamorphosis. Something else is occurring, and both TH and cortisol seem to be involved, but further experiments need to be conducted to understand their role and their possible interaction. We have added corresponding statements in the Abstract (lines 39-43) and Discussion (lines 447 to 449).

      (3) Finally, regarding the comment about cortisol enhancing or participating in TH-driven metamorphosis, our data clearly showed an activation of corticoid pathway genes around metamorphosis (between D32 and the juvenile stage), suggesting a potential implication of corticoids in metamorphosis, but we agree with the reviewer that further experiments are needed to test this. We never claimed that cortisol was enhancing or participating in metamorphosis; on the contrary, we are “suggesting a possible interaction between TH and corticoid pathway during metamorphosis”. We also state that our “results brought a first insight into the potential role of corticoids in the metamorphosis of E. malabaricus and call for functional experiments directly testing a possible synergy.” Nonetheless, we agree that some parts of our manuscript could be confusing with regard to cortisol synthesis during metamorphosis, as we did not measure cortisol levels between D32 and the juvenile stage. We have therefore made changes throughout the Introduction and Discussion to make this clearer.

      Given this, the authors should quantify whole-body TH levels throughout the entire developmental window considered to determine where the peak is observed and how it correlates with the other hormonal genes/systems in the analysis.

      We did not measure TH levels at later stages, as they have already been measured during Epinephelus coioides metamorphosis, and the morphological changes observed in this species around the TH peak correspond to what we observed in Epinephelus malabaricus around the peak of expression of TH pathway genes (see De Jesus et al., 1998, General and Comparative Endocrinology, 112:10-16). The main focus of this manuscript is the novel observation of an early activation period at D3, for which we needed TH levels to determine whether they were involved in another early developmental process (not related to metamorphosis). Our hypothesis is that this early activation might be related to the growth of fin rays necessary to enhance floatability during oceanic larval dispersal. As we may have arrived at the explanation of this hypothesis too rapidly without setting up the context well enough, we have made changes to the Introduction and Discussion.

      Even though this is a solid technical paper and the data obtained is excellent, the conclusions drawn by the authors are not supported by their data, and at least hormonal levels should be present in parallel to the transcriptomic data. Furthermore, toning down some affirmations or even considering the different hypotheses available that are different from the ones suggested would be very positive.

      We thank the reviewer for acknowledging the solidity of the method of our paper and the quality of the results. We agree that there were several parts where our message was unclear. We have addressed these points in the revised version of the manuscript to make sure there is no more confusion between the two distinct periods we studied in this paper (early larval development and metamorphosis). We also made sure that our claims about TH/corticoids interaction during both periods remain hypothetical as we cannot yet, despite trials, sustain them with functional experiment.

    1. Author response:

      eLife assessment

      This study offers a useful treatment of how the population of excitatory and inhibitory neurons integrates principles of energy efficiency in their coding strategies. The analysis provides a comprehensive characterisation of the model, highlighting the structured connectivity between excitatory and inhibitory neurons. However, the manuscript provides an incomplete motivation for parameter choices. Furthermore, the work is insufficiently contextualized within the literature, and some of the findings appear overlapping and incremental given previous work.

      We thank the Reviewers and the Reviewing Editor for taking the time to provide extremely valuable suggestions and comments, which will help us to substantially improve our paper. In what follows, we summarize our current plan for improving the paper in light of their suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, leading to the experimentally observed phenomenon of feature competition. They also characterise the impact of various (hyper)parameters, such as adaptation timescale, ratio of excitatory to inhibitory cells, regularisation strength, and background current. These results add useful biological realism to a particular model of efficient coding. However, not all claims seem fully supported by the evidence. Specifically, several biological features, such as the ratio of excitatory to inhibitory neurons, which the authors claim to explain through efficient coding, might be contingent on arbitrary modelling choices. In addition, earlier work has already established the importance of structured connectivity for feature competition. A clearer presentation of modelling choices, limitations, and prior work could improve the manuscript.

      Thanks for these insights and for this summary of our work.

      Major comments:

      (1) Much is made of the 4:1 ratio between excitatory and inhibitory neurons, which the authors claim to explain through efficient coding. I see two issues with this conclusion: (i) The 4:1 ratio is specific to rodents; humans have an approximate 2:1 ratio (see Fang & Xia et al., Science 2022 and references therein); (ii) the optimal ratio in the model depends on a seemingly arbitrary choice of hyperparameters, particularly the weighting of encoding error versus metabolic cost. This second concern applies to several other results, including the strength of inhibitory versus excitatory synapses. While the model can, therefore, be made consistent with biological data, this requires auxiliary assumptions.

      We will describe better the ratio of numbers of E and I neurons found in real data, as suggested. The first submission already contained an analysis of how this ratio of neuron numbers depends on the weighting of the loss of E and I neurons and on the relative weighting of the encoding error vs the metabolic cost in the loss function (see Fig 6E). We will make sure that these results are suitably expanded and better emphasized in revision. We will also include new analysis of dependence of optimal parameters on the relative weighting of encoding error vs metabolic cost in the loss function when studying other parameters (namely: noise intensity, metabolic constant, ratio of mean I-I to E-I connectivity, time constants of single E and I neurons).
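      The reviewer's concern about hyperparameter dependence can be stated compactly. As a generic illustration only (assuming a loss that adds a weighted metabolic cost to the encoding error; this is not necessarily the exact loss used in the manuscript), write

$$
\mathcal{L}(\theta;\beta) = \underbrace{\big\langle \lVert \mathbf{x}(t) - \hat{\mathbf{x}}(t) \rVert^2 \big\rangle_t}_{\text{encoding error}} + \beta\,\underbrace{\mathcal{C}(\theta)}_{\text{metabolic cost}}, \qquad \theta^\ast(\beta) = \arg\min_{\theta} \mathcal{L}(\theta;\beta)
$$

      Here $\theta$ stands for any model parameter under study (for example the ratio of E to I neuron numbers), $\mathbf{x}$ for the target signal, $\hat{\mathbf{x}}$ for the network readout, and $\beta$ for the relative weight of the metabolic cost. Because the optimum $\theta^\ast$ is in general a function of $\beta$, reporting $\theta^\ast$ across a range of $\beta$ values, as proposed in the response, shows how strongly each "optimal" value depends on this modelling choice.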

      (2) A growing body of evidence supports the importance of structured E-I and I-E connectivity for feature selectivity and response to perturbations. For example, this is a major conclusion from the Oldenburg paper (reference 62 in the manuscript), which includes extensive modelling work. Similar conclusions can be found in work from Znamenskiy and colleagues (experiments and spiking network model; bioRxiv 2018, Neuron 2023 (ref. 82)), Sadeh & Clopath (rate network; eLife, 2020), and Mackwood et al. (rate network with plasticity; eLife, 2021). The current manuscript adds to this evidence by showing that (a particular implementation of) efficient coding in spiking networks leads to structured connectivity. The fact that this structured connectivity then explains perturbation responses is, in the light of earlier findings, not new.

      We agree that the main contribution of our manuscript in this respect is to show how efficient coding in spiking networks can lead to structured connectivity similar to that proposed in the above papers. We apologize if this was not clear enough in the previous version. We will make it clearer in the revision. We nevertheless think it useful to report the effects of perturbations within this network, because the structure derived in our network is not identical to those studied in the above papers, and because these results give information about how lateral inhibition works in this network. Thus, we will keep presenting it in the revised version, although we will de-emphasize and simplify its presentation to give more emphasis to the novelty of the derivation of this connectivity rule from the principles of efficient coding.

      (3) The model's limitations are hard to discern, being relegated to the manuscript's last and rather equivocal paragraph. For instance, the lack of recurrent excitation, crucial in neural dynamics and computation, likely influences the results: neuronal time constants must be as large as the target readout (Figure 4), presumably because the network cannot integrate the signal without recurrent excitation. However, this and other results are not presented in tandem with relevant caveats.

      We will improve the Limitations paragraph in Discussion, and also anticipate caveats in tandem with results when needed, as suggested.

      (4) On repeated occasions, results from the model are referred to as predictions claimed to match the data. A prediction is a statement about what will happen in the future - but most of the "predictions" from the model are actually findings that broadly match earlier experimental results, making them "postdictions".

      This distinction is important: compared to postdictions, predictions are a much stronger test because they are falsifiable. This is especially relevant given that, in my impression, key parameters of the model were tweaked to match the data.

      We will better distinguish between predictions and postdictions in the revision.

      Reviewer #2 (Public Review):

      Summary: In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength. It thus argues that some of these observations may come as a direct consequence of efficient coding.

      Strengths:

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models.

      In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important.

      Though several of the observations have been reported and studied before (see below), this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Thanks for these insights and for the kind words of appreciation of the strengths of our work.

      Weaknesses:

      Though the text of the paper may suggest otherwise, many of the modeling choices and observations found in the paper have been introduced in previous work on efficient spiking models, thereby making this work somewhat repetitive and incremental at times. This includes the derivation of a network with separate excitatory and inhibitory populations, the discussion of physical units, the comparison of voltage versus spike-timing correlations, and instantaneous E/I balance, all of which can be found in one of the first efficient spiking network papers (Boerlin et al. 2013), as well as in subsequent papers. Metabolic cost and slow adaptation currents were also presented in a previous study (Gutierrez & Deneve 2019). Though it is perfectly fine and reasonable to build upon these previous studies, the language of the text gives them insufficient credit.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      Furthermore, the paper makes several claims of optimality that are not convincing enough: they are verified only by a limited sweep of single parameters at a time, are unintuitive, and may conflict with previous findings on efficient spiking networks. This includes the following. Coding error (RMSE) has a minimum at intermediate metabolic cost (Figure 5B), despite the fact that, intuitively, zero metabolic cost would mean that the network is solely minimizing coding error, and that previous work has suggested that additional costs bias the output. Coding error also appears to have a minimum at intermediate values of the ratio of E to I neurons (effectively the number of I neurons) and of the number of encoded variables (Figures 6D, 7B). These both have to do with the redundancy in the network (number of neurons for each encoded variable), and previous work suggests that networks can code for arbitrary numbers of variables provided the redundancy is high enough (e.g., Calaim et al. 2022). Lastly, the performance of the E-I variant of the network is shown to be better than that of a single cell type (1CT: Figure 7C, D). Given that the E-I network performs a similar computation to the 1CT model but with more neurons (i.e., instead of an E neuron directly providing lateral inhibition to its neighbor, it goes through an interneuron), this is unintuitive and again not supported by previous work. These may be valid emergent properties of the E-I spiking network derived here, but their presentation and description are not sufficient to determine this.

      We are addressing this issue in two ways. First, we will present results of joint sweeps over pairs of parameters whose joint variation is expected to influence optimality in a way that cannot be understood by varying one parameter at a time. Namely, we plan to jointly vary the noise intensity and the metabolic constant, as well as the ratio of E to I neuron numbers and the ratio of mean I-I to E-I connectivity. Second, we will identify a reasonable, realistic range of variation for each individual parameter, perform a Monte Carlo search for the optimal point within this range, and compare the results so obtained with those gained from varying one or two parameters at a time. We will also add the suggested citation to Calaim et al. 2022 in regard to the points discussed above.
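      As a rough illustration of the kind of Monte Carlo search described in this reply, the sketch below samples all parameters jointly and uniformly within bounded ranges and keeps the sample with the lowest loss. It is not the authors' procedure: the parameter names (`noise`, `metabolic`), their bounds, and the smooth `toy_loss` function are hypothetical stand-ins for what would, in practice, be a full network simulation evaluated at each sampled point.

```python
import random


def monte_carlo_search(objective, bounds, n_samples=1000, seed=0):
    """Uniform random (Monte Carlo) search for the minimum of `objective`.

    bounds maps each parameter name to a (low, high) range; every sample
    draws all parameters jointly and independently from these ranges.
    Returns the best parameter dict found and its objective value.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    best_params, best_value = None, float("inf")
    for _ in range(n_samples):
        params = {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


# Hypothetical stand-in for a simulated loss, minimized at noise=1.0, metabolic=6.0.
def toy_loss(p):
    return (p["noise"] - 1.0) ** 2 + 0.1 * (p["metabolic"] - 6.0) ** 2


bounds = {"noise": (0.0, 3.0), "metabolic": (0.0, 15.0)}  # hypothetical ranges
best, value = monte_carlo_search(toy_loss, bounds, n_samples=5000)
print(best, value)
```

      Because all parameters are drawn together, this kind of search can locate optima that depend on interactions between parameters, which one-at-a-time sweeps may miss; the trade-off is that the number of samples needed grows quickly with the number of parameters searched.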

      We will improve the comparison between the Excitatory-Inhibitory and the 1-Cell-Type model (see reply to the suggestions of Referee 3 for more details).

      Alternatively, the methodology of the model suggests that ad hoc modeling choices may be playing a role. For example, an arbitrary weighting of coding error and metabolic cost of 0.7 to 0.3, respectively, is chosen without mention of how this affects the results. Furthermore, the scaling of synaptic weights appears to be controlled separately for each connection type in the network (Table 1), despite the fact that some of these quantities are likely linked in the optimal network derivation. Finally, the optimal threshold and metabolic constants are an order of magnitude larger than the synaptic weights (Table 1). All of these considerations suggest one of the following two possibilities. One, the model has a substantial number of unconstrained parameters to tune, in which case more parameter sweeps would be necessary to definitively make claims of optimality. Or two, parameters are being decoupled from those constrained by the optimal derivation, and the optima simply correspond to the values that should come out of the derivation.

      In the previously submitted manuscript, we presented both the encoding error and the metabolic cost separately as a function of the parameters, so that readers could see how stable the optimal parameters would be to changes in the relative weighting of encoding error and metabolic cost. We will extend this work by adding the suggested calculations to provide quantitative measures of how the optimal network parameters and configurations depend on this relative weighting.
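      The dependence of the optimum on the relative weighting can be illustrated with a minimal sketch. Assuming the objective is a convex combination of coding error and metabolic cost, beta * error + (1 - beta) * cost, with beta = 0.7 corresponding to the 0.7/0.3 weighting mentioned by the reviewer, the code finds the swept parameter value that minimizes the objective for several choices of beta. The sweep values are invented for illustration and are not taken from the manuscript.

```python
def weighted_loss(error, cost, beta=0.7):
    """Convex combination of coding error and metabolic cost (weights beta, 1 - beta)."""
    return beta * error + (1.0 - beta) * cost


def best_index(errors, costs, beta):
    """Index of the swept parameter value that minimizes the weighted loss."""
    losses = [weighted_loss(e, c, beta) for e, c in zip(errors, costs)]
    return min(range(len(losses)), key=losses.__getitem__)


# Hypothetical sweep: coding error falls and metabolic cost rises with the parameter.
params = [1, 2, 3, 4, 5]
errors = [1.0, 0.6, 0.4, 0.3, 0.25]
costs = [0.1, 0.2, 0.35, 0.55, 0.8]

for beta in (0.5, 0.7, 0.9):
    i = best_index(errors, costs, beta)
    print(f"beta={beta}: optimal parameter = {params[i]}")
```

      With these toy numbers, the minimizing parameter value moves from 3 to 4 to 5 as beta increases from 0.5 to 0.9, which is the kind of shift the proposed calculations would quantify for the actual model.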

      Reviewer #3 (Public Review):

      Summary: In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work?

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task of representing some lower-dimensional input signal, and the I neurons have the task of representing the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming that E and I neurons should spike only if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state and can accurately represent the network inputs.

      They then investigate in depth various aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and showing that the networks can operate in a biologically realistic regime.

      Strengths:

      (1) The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field.

      (2) They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly.

      (3) They put sensible constraints on their networks, while still maintaining the good properties these networks should have.

      Thanks for this summary and for these kind words of appreciation of the strengths of our work.

      Weaknesses:

      (1) The paper somewhat overstates the significance of its theoretical contributions and should make much clearer which aspects of the derivations are novel. Large parts were done in very similar ways in previous papers. Specifically: the split into E and I neurons was also done in Boerlin et al (2008) and in Barrett et al (2016). Defining the networks in terms of realistic units was already done by Boerlin et al (2008). It would also be worth discussing Barrett et al (2016) specifically in more detail, as there they also use split E/I networks and perform biologically relevant experiments.

      We will improve the text to make sure that credit to previous studies is more precisely and more clearly given.

      (2) It is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. While the constraints of Dale's law are sensible (splitting the population in E and I neurons, and removing any non-Dalian connection), they are imposed from biology and not from any coding principles. A discussion of how this could be done would be much appreciated, and in the main text, this should be made clear.

      We indeed removed non-Dalian connections because having only connections respecting Dale’s law is a major constraint for biological plausibility. Our logic was to consider efficient coding within the space of networks that satisfy this (and other) biological plausibility constraints. We did not intend to claim that removing the non-Dalian connections was the result of an analytical optimization. However, to get better insight into how Dale’s law constrains or influences the design of efficient networks, we added a comparison of the coding properties of networks that either do or do not satisfy Dale’s law. We apologize if this was not sufficiently clear in the previous version, and we will clarify this in the revision.

      (3) Related to the previous point, the claim that the network with split E and I neurons has a lower average loss than a 1 cell-type (1-CT) network seems incorrect to me. Only the E population coding error should be compared to the 1-CT network loss, or the sum of the E and I populations (not their average). In my author recommendations, I go more in-depth on this point.

      We will perform the suggested detailed comparisons between the network losses of the 1CT and E-I models and then revise or refine our conclusions as needed, according to the results we obtain.
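      The reviewer's concern about how the E-I loss is summarized can be made concrete with a small, hypothetical example. The function below reports three possible summaries of an E-I network's loss alongside a 1CT loss; the numerical values are invented and only show that the choice of summary can reverse the comparison.

```python
def summarize_losses(loss_e, loss_i, loss_1ct):
    """Candidate ways to summarize an E-I network's loss, next to the 1CT loss."""
    return {
        "E only": loss_e,                       # loss of the E population alone
        "E + I sum": loss_e + loss_i,           # total loss paid by both populations
        "E, I average": (loss_e + loss_i) / 2,  # halves the total, favoring E-I
        "1CT": loss_1ct,
    }


# Hypothetical values, chosen only to show that the summary can flip the comparison.
summary = summarize_losses(loss_e=1.25, loss_i=0.5, loss_1ct=1.0)
for name, value in summary.items():
    if name == "1CT":
        verdict = ""
    else:
        verdict = "E-I better" if value < summary["1CT"] else "1CT better"
    print(f"{name:>13}: {value:.3f} {verdict}")
```

      Here the average of the E and I losses (0.875) falls below the 1CT loss (1.0), whereas both the E-only loss (1.25) and the summed loss (1.75) exceed it.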

      (4) While the paper is supposed to bring the balanced spiking networks they consider in a more experimentally relevant context, for experimental audiences I don't think it is easy to follow how the model works, and I recommend reworking both the main text and methods to improve on that aspect.

      We will try to make the presentation of the model more accessible to a non-computational audience.

      Assessment and context: Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic and incorporating aspects of energy efficiency. For computational neuroscientists, this paper is a good example of how to build models that link well to experimental knowledge and constraints while still being computationally and mathematically tractable. For experimental readers, the model provides a clearer link between efficient coding spiking networks and known experimental constraints, and provides a few predictions.

      Thanks for these kind words. We will make sure that these points emerge more clearly and in a more accessible way from the revised paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Overall, this study provides a meticulous comparison of developmental transcriptomes between two sub-species of the annelid Streblospio benedicti. Different lineages of S. benedicti maintain one of two genetically programmed alternative life histories, the ancestral planktotrophic or derived lecithotrophic forms of development. This contrast is also seen at the inter-species level in many marine invertebrate taxa, such as echinoderms and molluscs. The authors report relatively (surprisingly?) modest differences in transcriptomes overall but also find some genes whose expression is essentially morph-specific (which they term "exclusive").

      Strengths:

      The study is based on a dense and appropriately replicated sampling of early development. The tight clustering of each stage/morph combination in PCA space suggests the specimens were accurately categorized. The similar overall trajectories of the two morphs were surprising to me for two stages: 1) the earliest stage (16-cell), at which we might expect maternal differences due to the several-fold difference in zygote size, and 2) the latest stage (1-week), where there appears to be the most obvious morphological difference. This is why we need to do experiments!

      The examination of F1 hybrids was another major strength of the study. It also produced one of the most surprising results: though intermediate in phenotype, F1 embryos have the most distinct transcriptomes, and reveal a range of fixed, compensatory differences in the parental lines.

      Weaknesses:

      Overall I really enjoyed this paper, but I see a few places where it can be tightened and made more insightful. These relate to better defining the basis for "exclusive" expression (regulation or gene presence/absence?), providing more examples of how specific genes related to trophic mode behave, and placing the study in the context of similar work in other phyla.

      As suggested, we changed the term “exclusive expression” to “morph-specific” expression throughout the paper to clarify which genes are expressed in only one morph. We also added references to similar work in other phyla, such as recent work on lecithotrophic and planktotrophic development in species of Heliocidaris sea urchins, in the fourth paragraph of the Discussion. We added further data about the F1 hybrids in the “Gene expression of Genetic Crosses” section and the new Figure 8B. We find that gene expression in F1 offspring is divided between matching the maternal and paternal gene expression patterns, with slightly more genes matching paternal expression.

      Reviewer #2 (Public Review):

      The manuscript by Harry and Zakas determined the extent to which gene expression differences contribute to developmental divergence by using a model that has two distinct developmental morphs within a single species. Although the authors did collect a valuable dataset and presented trends in differential expression between the two morphs of S. benedicti, we found limitations in the methods, system, and resources that the authors should address.

      We have two major points:

      (1) Background information about the biological system needs to be clarified in the introduction of this manuscript. The authors stated that F1 offspring can have intermediate larval traits compared to the parents (Line 81). However, the authors collected F1 offspring at the same time as the mother in the cross. If offspring have intermediate larval traits, their developmental timeline might be different than both parents and necessitate the collection of offspring at different times to obtain the same stages as the parents. Could the authors (1) explain why they collected offspring at the same time as parents given that other literature and Line 81 state these F1 offspring develop at intermediate rates, and (2) add the F1 offspring to Figure 1 to show morphological and timeline differences in development?

      Additionally, the authors state (Lines 83-85) that they detail the full time course of embryogenesis for both the parents and the F1 crosses. However, we do not see where the authors have reported the full time course of embryogenesis for the F1 offspring. Providing this information would shape the remaining results of the manuscript.

      (2) We have several concerns about the S. benedicti genome and steps regarding the read mapping for RNA-seq:

      The S. benedicti genome used (Zakas et al. 2022) was generated using the PP morph. The largest scaffolds of this assembly correspond to linkage groups, showing the quality of this genome. The authors should point out in the Methods and/or Results sections that the quality of this genome means that PP-specific gene expression can be quantified well. However, the challenges and limitations of mapping LL-specific expression data to the PP genome should be discussed.

      It is possible that the authors did not find exclusive gene expression in the LL morph because they require at least one gene to be turned on in one morph as part of the data-cleaning criteria. Because the authors are comparing all genes to the PP morph, they could be missing true exclusive genes responsible for the biological differences between the two morphs. Did they make the decision to only count genes expressed in one stage of the other morph because the gene models and mapping quality led to too much noise?

      The authors state that the mapping rates between the two morphs are comparable (Supplementary Figure 1). However, there is a lot of variation in mapping the LL individuals (~20% to 43%) compared to the PP individuals. What is the level of differentiation within the two morphs in the species (pi and theta)? The statistical tests for this comparison should be added and the associated p-value should be reported. The statistical test used to compare mapping rates between the two morphs may be inappropriate. The authors used Salmon for their RNA alignment and differential expression analysis, but it is possible that a different method would be more appropriate. For example, Salmon has some limitations as compared to Kallisto as others have noted. The chosen statistical test should be explained, as well as how RNA-seq data are processed and interpreted.

      What about the read mapping rate and details for the F1 LP and PL individuals? How did the offspring map to the P genome? These details should be included in Supplementary Figure 1. Could the authors also provide information about the number of genes expressed at each stage in the F1 LP and PL samples in S Figure 2? How many genes went into the PCA? Many of these details are necessary to evaluate the F1 RNA-seq analyses.

      Generally, the authors need to report the statistics used in data processing more thoroughly. The authors need to report the statistics used to (1) process and evaluate the RNA-seq data and (2) determine the significance between the two morphs (Supplementary Figures 1 and 2).

      (1) We clarified in the methods that F1 embryos are collected at the same stage (not absolute time) as the parental types. So the “16-cell” stage is comparable across planktotrophic, lecithotrophic and F1 offspring regardless of absolute time taken to reach that stage (which differs by ~3 hours- Figure 1).

      Figure 2A details every time point collected for all crosses. As mentioned in the methods, we were unable to collect two timepoints for one set of crosses (LP) due to limited tissue. However, we still cover the full development time from “16 cell” through “swimming larvae” stages, which is the full larval development time.

      (2) We appreciate the reviewer's concerns regarding the mapping to the reference genome. The S. benedicti genome is a largely complete and contiguous chromosome-length genome which we have now highlighted in the manuscript. However, the reference is only for the planktotrophic morph. So it is certainly possible that there could be mapping bias for lecithotrophic reads or F1 reads, as we point out in the discussion. While some bias is certainly possible, it is unlikely to be driving major differences in the results. We performed several tests to demonstrate this:

      (1) We conducted two-sided t-tests of the mapping rates between all sample groups in our dataset (PP, LL, PL, LP) to determine whether there were significant differences in mapping rates among the populations. No significant differences were found. The specific results of these statistical tests are included in the updated manuscript in Supplementary Figure 1 and are as follows:

      Author response table 1.

      (2) In response to the comment about sequence-level divergence affecting mapping rate, we estimated pi (nucleotide diversity within a population) and dxy (genomic divergence between two populations) from the sampled transcriptomic data of our planktotrophic and lecithotrophic populations. We used PIXY (Korunes, K.L. and Samuk, K., 2021) with its standard settings to estimate these values, using variant call files in BCF format produced with bcftools: one for all planktotrophic samples and one for all lecithotrophic samples in our dataset. We found that across regions of the transcriptome, the difference in pi between planktotrophs and lecithotrophs was between 0.11% and 4.2%. Genomic divergence across the transcriptome is also relatively minor: estimates of dxy ranged from 0.0049 to 0.0076. Given that these estimates show relatively modest differences in nucleotide diversity and overall sequence divergence, we maintain that they are unlikely to significantly affect the results described in this study. From what we have seen in the literature, these values are within the range reported by other population studies that map reads to a species reference derived from one population.
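      For readers less familiar with these statistics, the sketch below shows count-based estimators of pi and dxy for biallelic sites. Pi is the number of within-population haplotype pairs that differ divided by the number of such pairs, and dxy is the analogous ratio for pairs with one haplotype from each population; both sums are taken over all sites, including invariant ones, before dividing. This mirrors the general approach described for pixy, but it is a simplified illustration (haplotype counts only, no missing data, hypothetical allele counts) and not pixy's own code.

```python
from math import comb


def pi_window(sites):
    """Nucleotide diversity (pi) over a window of biallelic sites.

    sites: list of (n, k) tuples, where n is the number of sampled haplotypes
    at the site and k the count of the alternate allele (k = 0 for invariant
    sites). Differences and comparisons are summed over all sites before
    dividing, so invariant sites contribute comparisons but no differences.
    """
    diffs = sum(k * (n - k) for n, k in sites)  # within-population pairs that differ
    comps = sum(comb(n, 2) for n, _ in sites)   # all within-population pairs
    return diffs / comps


def dxy_window(sites):
    """Absolute divergence (dxy) between two populations over a window.

    sites: list of ((n1, k1), (n2, k2)) tuples giving haplotype and
    alternate-allele counts in population 1 and population 2.
    """
    diffs = sum(k1 * (n2 - k2) + (n1 - k1) * k2
                for (n1, k1), (n2, k2) in sites)         # between-population pairs that differ
    comps = sum(n1 * n2 for (n1, _), (n2, _) in sites)  # all between-population pairs
    return diffs / comps


# Hypothetical counts for four sites (10 haplotypes per population).
pop1 = [(10, 0), (10, 5), (10, 1), (10, 0)]
pop2 = [(10, 0), (10, 10), (10, 0), (10, 0)]
print(pi_window(pop1))                    # 34 differences / 180 comparisons
print(dxy_window(list(zip(pop1, pop2))))  # 60 differences / 400 comparisons
```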

      We added the mapping rates of all samples in the Supplement (SFig. 1) as requested. We added the number of genes expressed at each stage in the Supplement (SFig. 2) as requested. We have also provided further details and figures (Fig 8B) on read mapping rates and statistics used in data processing, including those for F1 RNA-seq data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The endocannabinoid system (ECS) components are dysregulated within the lesion microenvironment and systemic circulation of endometriosis patients. Using endometriosis mouse models and genetic loss of function approaches, Lingegowda et al. report that canonical ECS receptors, CNR1 and CNR2, are required for disease initiation, progression, and T-cell dysfunction.

      Strengths:

      The study uses genetic approaches to establish in vivo causal relationships between a dysregulated ECS and endometriosis pathogenesis. The experimental design incorporates both bulk and single-cell RNAseq approaches, as well as imaging mass spectrometry to characterize the mouse lesions. The identification of immune-related and T-cell-specific changes in the lesion microenvironment of CNR1 and CNR2 knockout (KO) mice represents a significant advance.

      Weaknesses:

      Although the mouse phenotypic analyses involve a detailed molecular characterization of the lesion microenvironment using genomic approaches, detailed measurements of lesion size/burden and histopathology would provide a better understanding of how CNR1 or CNR2 loss contributes to endometriosis initiation and progression. The cell or tissue-specific effects of the CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although this aspect of the approach is recognized as a major limitation, global CNR1 and CNR2 KO may affect normal female reproductive tract function, ovarian steroid hormone levels, decidualization response, or lead to preexisting alterations in host or donor tissues, which could affect lesion establishment and development in the surgically induced, syngeneic mouse model of endometriosis.

      We appreciate the reviewer's thoughtful and constructive feedback. We agree that additional measurements of lesion size/burden and histopathology would provide valuable insights into the specific contributions of CNR1 and CNR2 to endometriosis progression. However, the focus of this study was on assessing the alterations in the complex immune microenvironment due to the absence of CNR1 and CNR2, given their close involvement in regulating immune cell populations. We plan to incorporate these measurements in future studies to further strengthen the understanding of disease pathogenesis. Regarding the potential effects of global knockout, the reviewer raises a valid concern. To address this, we will explore cell- and/or tissue-specific knockout models in future experiments to better isolate the direct effects of CNR1 and CNR2 on the disease process while minimizing potential confounding factors from systemic alterations.

      Reviewer #2 (Public Review):

      Summary:

      The endocannabinoid system (ECS) regulates many critical functions, including reproductive function. Recent evidence indicates that a dysregulated ECS contributes to endometriosis pathophysiology and the microenvironment. The authors therefore further examined the dysregulated ECS and its mechanisms in endometriosis lesion establishment and progression, using CNR1 and CNR2 knockout mice in mouse models of endometriosis with two different sources of endometrial tissue. The authors presented differential gene expression and altered pathways, especially those related to the adaptive immune response, in CNR1 and CNR2 ko lesions. Interestingly, the T-cell population was dramatically reduced in the peritoneal cavity of mice lacking CNR2, and CD4+ T helper cells lost proliferative activity. Imaging mass cytometry analysis provided spatial profiling of cell populations and potential relationships among immune cells and other cell types. This study provides fundamental knowledge of the endocannabinoid system in endometriosis pathophysiology.

      Strengths:

      The dysregulated ECS and its mechanisms in endometriosis pathogenesis were assessed using CNR1 and CNR2 knockout mice in mouse models of endometriosis with two different sources of endometrial tissue. Not only endometriotic lesions but also peritoneal exudate (and splenic) cells were analyzed to understand the specific local disease environment under a dysregulated ECS.

      The transcriptional profiles and pathways, immune cell profiles, and spatial profiles of cell populations presented support altered immune cell populations with disrupted functions in endometriosis pathogenesis via dysregulation of the ECS.

      In line 386 (Role of CNR2 in T cells): the finding that CD3+ T cells are nearly absent in the peritoneal cavity of CNR2 ko mice is intriguing.

      The interpretation of the results is well-described in the Discussion.

      Weaknesses:

      The study was terminated and characterized 7 days after EM induction surgery, without details on why this time point was selected.

      The authors also mentioned that altered eutopic endometrium contributes to the establishment and progression of endometriosis. This reviewer agrees with lines 324-325. If so, DEGs are likely to be identified between eutopic endometrium (with/without endometriosis lesion induction) and ectopic lesions. It would be nice to see these data (even if using publicly available data sets).

      Figure 7 CDEF. The results of the statistical analyses and analyzed sample numbers should be added. Lines 444-450 cannot be reviewed without them.

      This reviewer agrees with lines 498-500. In contrast, retrograded menstrual debris is not decidualized. The section could be modified to avoid misunderstanding.

      We would like to thank the reviewer for the insightful comments and suggestions, and for acknowledging the importance of the work presented in this manuscript.

      Regarding the 7-day time point, we had provided a rationale in lines 479-481 but agree that it was not sufficient; hence, we have provided additional details on the selection of the 7-day time point in the Methods section (Mouse model of EM). We have also noted the suggestion to compare differentially expressed genes in the eutopic endometrium vs ectopic lesions. Since there are publications comparing eutopic vs ectopic gene expression patterns (PMIDs: 33868805 and 18818281), including a study exploring ECS genes in the endometrium throughout different menstrual cycles (PMID: 35672435), we believe additional analysis using the same datasets may not yield new information. However, we see the value in the reviewer’s comment, and we will examine the gene expression patterns in the uterine vs endometriosis-like lesions in our future studies with tissue- or cell-specific CNR1 and CNR2 knockout models to understand the functional relevance of the ECS in endometriosis initiation.

      Since the IMC study was exploratory for proof of concept, we did not have enough biological replicates for meaningful statistical validation (n = 2-3). We have clarified this information in the methods, results, and figure legends for appropriately representing the limitations of the current setup.

      Finally, we appreciate the feedback on the section discussing retrograded menstrual debris. Even though the menstrual debris may not be decidualized, some endometriotic lesions have the ability to decidualize based on their response to estrogen and progesterone in a cycling manner (PMID: 26450609), similar to the endometrium in the uterine cavity. We have clarified this in the revised MS.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      The mechanism of how alterations in ECS contribute to the observed cellular and molecular changes is unclear. Connecting CNR1 or CNR2 function to a specific cell type or cellular process would provide a more detailed understanding of how dysregulated ECS contributes to endometriosis pathogenesis.

      We agree that integrating the functions of CNR1 or CNR2 to specific cell types or cellular processes would strengthen the mechanistic insights presented in our study. This would help elucidate specific pathways by which dysregulated ECS leads to the alterations in immune cell populations, gene expression profiles, and other key aspects of endometriosis development and progression. This is a rapidly evolving field and at this stage, we do not have published information to reflect on this aspect in the revised manuscript.

      (1) As mentioned in the text, the ECS components being studied are widely expressed and may affect multiple aspects of endometriosis pathogenesis and symptomatology. However, the cell- or tissue-specific effects of CNR1 and CNR2 are not incorporated into the experimental design of the studies. Although these limitations are mentioned in the discussion, it is important to know whether global CNR1 and CNR2 KO affects normal female reproductive tract function, ovarian steroid hormone levels, or the decidualization response, or whether preexisting alterations in host or donor tissues affect lesion development in the surgically induced, syngeneic mouse model of endometriosis. This would also be the case in studies on immune system dysfunction or the lesion microenvironment, as preexisting immune system dysfunction following CNR1 or CNR2 loss could alter the disease trajectory and lead to a misinterpretation of the findings. Some of these potential confounders could be addressed using crossover approaches in the Figure 1A experimental design, but the donor tissues are reported to be matched to the recipients based on genotype.

      The reviewer raises an excellent point: the widespread expression of the ECS components studied in our manuscript may affect multiple aspects of endometriosis pathogenesis and symptomatology. Indeed, the cell- or tissue-specific effects of CNR1 and CNR2 knockout are not fully incorporated into our experimental design, which could introduce confounding factors that affect the interpretation of some of our findings. However, as outlined in our previous comments, in future studies we will incorporate tissue/cell-specific knockouts, as well as crossover approaches, to determine whether the effects of CNR1 and CNR2 loss are lesion driven. We agree that it is important to understand the impact of global CNR1 and CNR2 knockout on normal female reproductive tract function, ovarian steroid hormone levels, the decidualization response, and other potential preexisting alterations in the host or donor tissues that could influence lesion development in the syngeneic mouse model of endometriosis. As outlined in the MS (lines 59-62), previous studies have reported pregnancy-specific effects, including effects on implantation and impaired primary decidual zone formation. We did not find any baseline differences in systemic immune profiles between CNR1 or CNR2 knockout mice and WT mice without EM induction. However, the baseline uterine immune environment of the knockout and WT mice has not been assessed. We agree with the reviewer that preexisting immune system dysfunction following CNR1 or CNR2 loss could alter the disease trajectory, particularly with respect to immune dysfunction and the lesion microenvironment. We have highlighted this in the limitations section.

      (2) The phenotypic characterization of the endometriosis mouse model with or without CNR1 or CNR2 KO is very limited. To better understand how the observed cellular and molecular alterations correlate with endometriosis pathogenesis and severity in CNR1 and CNR2 K/O mice, a detailed characterization of lesion size differences and histopathology should be performed. Importantly, the histopathological characterization of the lesions would complement the imaging mass spectrometry findings.

      We agree that a more detailed characterization of the endometriotic lesions in our CNR1 and CNR2 knockout mouse models is required. As evident from several of our previous publications, we have focused on detailed histopathological characterization of endometriotic lesions in our syngeneic mouse model of endometriosis, including a multiple time-course study (Symons et al., 2020, FASEB). In the present investigation, we focused on cataloging spatial and transcriptomic changes, because no information is currently available on the global influence of CNR1 and CNR2 knockout on the endometriosis lesion microenvironment. Because we prioritized this aspect, we were not able to provide a detailed histological assessment of the lesions. However, the IMC analysis provides a detailed, spatially resolved profile of the cellular composition and interactions within the endometriotic lesions, which we believe offers valuable insights into the mechanisms by which the dysregulated ECS may contribute to endometriosis pathogenesis. This quantitative, high-dimensional approach complements the transcriptional profiling and other analyses we have performed.

      (3) Given the effect sizes and variance observed with the ECS ligand measurements, an N = 4-5 biological samples for mouse phenotypic studies seems too low.

      The reviewer raises a valid point about the low sample size. As elaborated earlier, this was a proof-of-principle study designed to capture biologically significant alterations within the lesion and surrounding peritoneal microenvironment in the absence of CNR1 or CNR2 receptors. This information is crucial for establishing the potential mechanisms by which the dysregulated ECS may contribute to the pathogenesis of endometriosis. Now that we have established the framework and a baseline understanding of immune-inflammatory alterations, we will refine our experimental approaches in future studies and include more samples if necessary.

      Reviewer #2 (Recommendations For The Authors):

      It is hard to read the labeling of figures. Please increase the font size of each figure.

      We have increased the font size of the labels where necessary to improve the readability.

      Supplementary Data 1, Table 1 seems like Supplementary Table 1. Please use the same labeling of the Supplementary tables and figures to avoid confusion.

      We have updated the labeling accordingly and ensured that all supplementary tables and figures are consistently labeled.

      This reviewer suggests depositing RNA-seq and IMC data to NCBI etc. and listing the accession number in the MS.

      Thank you for your recommendation to deposit the RNA-seq and imaging mass cytometry (IMC) data from our study in public repositories such as NCBI. We appreciate your suggestion, as data sharing is an important aspect of scientific transparency and reproducibility. Bulk mRNA sequencing data has been attached as a supplementary file and IMC data has been deposited on Mendeley Data (DOI: 10.17632/2ptns5yhzh.1).

      Please clarify L363.

      We have clarified this in the revised MS. The revised text now reads: “However, we did not find the same differences (T cell-related genes) in the UnD lesions of CNR2 k/o mice. Moreover, UnD lesions of CNR2 k/o mice showed far fewer DEGs (11 compared with 65 in the DD lesions from CNR2 k/o mice), suggesting a decidualization-dependent response (Supplementary Data 3).”

      Figure 7B: It is hard to see/understand the results in L438-440. It might be helpful if % is added to the figure.

      We have added more tick marks to the y-axis of Figure 7B to make it easier for the reader to interpret the percentages of the different cell types.

      Figure 7 legend: 2nd D should be G.

      We have revised the legend accordingly.

      Supplementary Figure 6: It seems immune cells are clustered in CN1, which is different from Figure 7. To easily understand Suppl Fig 6AB, please add some details in the legend.

      We have revised the legend as suggested.

      The revised legend now reads: “A, B Representative images of 8 distinct cell types from CN analysis of DD and UnD lesions from WT, CNR1 k/o, and CNR2 k/o mice, respectively. C Heatmap representation of CN analysis shows distinct clustering patterns observed in the UnD lesions among the different genotypes. The clustering reveals distinct spatial patterns of immune cell populations within the UnD lesions, which appear to differ from the observations in Figure 7G. This suggests potential spatial heterogeneity in the immune landscape of EM-like lesions under conditions of decidualization.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study reports on a previously unrecognized function of ATG6 in plant immunity. The work is valuable because it proposes a direct interaction between ATG6 and a well-studied salicylic acid receptor protein, NPR1, which may interest researchers investigating plant immunity regulation. While the data presented are compelling, more information regarding the specificity of ATG6's role would improve the overall impact of the study, especially with an eye towards consistency with prior work.

      We sincerely thank the editor and reviewers for their constructive and helpful suggestions and comments, which have greatly improved the quality and thoroughness of our manuscript. We have carefully studied these comments and made the appropriate changes wherever possible. Additionally, some minor errors were corrected during the revision process. New text is shown in blue in the revised manuscript. Our responses are provided below each respective comment.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1.

      Strengths:

      The experiments are carefully designed and the data are convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      Thank you very much for recognizing our research.

      Weaknesses:

      - The authors could do a few additional experiments to test the role of ATG6 in plant immunity.

      I recommend that the authors test the interaction between ATGs and other NPR1 homologs (such as NPR2).

      Thank you for this valuable suggestion. The Arabidopsis NPR family comprises six members: NPR1, NPR2, NPR3, NPR4, NPR5/BLADE-ON-PETIOLE 1 (BOP1), and NPR6/BOP2. NPR3 and NPR4 function in tandem as negative regulators of SA signaling and plant immune responses (Ding et al., 2018). Similar to NPR1, NPR2 acts as a positive regulator of SA signaling (Castello et al., 2018). NPR5/BOP1 and NPR6/BOP2 primarily participate in the regulation of plant growth and development (McKim et al., 2008). Because this study specifically investigates the relationship between ATG6 and NPRs in plant resistance to pathogenic bacteria, we experimentally confirmed the interaction of ATG6 with NPR1, NPR3, and NPR4 (Fig. 1 and Fig. S1 in the revised manuscript). It would be intriguing to explore the interactions between ATG6 and other NPRs in the regulation of plant growth and development in future research.

      - The concentration of SA used in the experiment (0.5-1 mM) seems pretty high. Does a lower concentration of SA induce ATG6 accumulation in the nucleus?

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). Consequently, to investigate the function of NPR1, many research groups employ relatively high concentrations of SA (e.g., 0.5 mM, 1 mM, or even 5 mM) (Spoel et al., 2009; Fu et al., 2012; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a). In our study, we observed an interaction between ATG6 and NPR1. To enhance the detection of the NPR1 protein, we standardized the SA concentration used in our experiments (0.5 mM for Arabidopsis and 1 mM for tobacco). We then analyzed the nuclear accumulation of ATG6 or NPR1 at these relatively high SA concentrations, consistent with those used in previous studies (Spoel et al., 2009; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a).

      - Does the silencing of ATG6 affect the cell death (or HR) triggered by AvrRPS4?

      Thank you for pointing this out. In this study, we examined Pst DC3000/avrRps4-induced cell death in Col, amiRNAATG6 #1, amiRNAATG6 #2, npr1, NPR1-GFP, ATG6-mCherry, and ATG6-mCherry × NPR1-GFP plants. Trypan blue staining showed that Pst DC3000/avrRps4-induced cell death was significantly higher in npr1, amiRNAATG6 #1, and amiRNAATG6 #2 than in Col (Fig. S15 in the revised manuscript). Conversely, Pst DC3000/avrRps4-induced cell death was significantly lower in ATG6-mCherry, NPR1-GFP, and ATG6-mCherry × NPR1-GFP than in Col. Notably, Pst DC3000/avrRps4-induced cell death in ATG6-mCherry × NPR1-GFP was significantly lower than in ATG6-mCherry and NPR1-GFP (Fig. S15 in the revised manuscript). These results suggest that ATG6 and NPR1 cooperatively inhibit Pst DC3000/avrRps4-induced cell death. The relevant description can be found in lines 394-404 of the revised manuscript.

      - SA and NPR1 are also required for immunity and are activated by other NLRs (such as RPS2 and RPM1). Is ATG6 also involved in immunity activated by these NLRs?

      Thank you for your valuable comments. The most notable event in the NLR-mediated ETI immune response is the induction of hypersensitive response-programmed cell death (HR-PCD) (Jones and Dangl, 2006; Yuan et al., 2021). SA plays a dual role in the ETI response. On one hand, the accumulation of SA during the R gene-mediated ETI defense response is directly linked to the onset of HR-PCD (Nawrath and Metraux, 1999). SA and NPR1 can enhance the ETI response by regulating the expression of downstream target genes (Falk et al., 1999; Feys et al., 2001; Ding et al., 2018; Liu et al., 2020). On the other hand, the activation of SA signaling can have a negative regulatory effect on HR-PCD during the ETI response. High levels of SA have been shown to significantly inhibit HR-PCD triggered by the avrRpt2 effector (Rate and Greenberg, 2001; Devadas and Raina, 2002; Jurkowski et al., 2004). Rate et al. discovered that the inhibition of HR-PCD by SA relies on NPR1 (Rate and Greenberg, 2001).

      Arabidopsis AtATG6 or its homologs in other species (such as NbBECLIN1, TaATG6s, etc.) have been identified as positive regulators in plant immunity, playing a crucial role in inhibiting cell death and preventing invasion by pathogenic microorganisms (Liu et al., 2005; Patel and Dinesh-Kumar, 2008; Yue et al., 2015). Patel et al. demonstrated that, akin to autophagy-deficient mutants previously documented, AtATG6 antisense (AtATG6-AS) plants treated with Pst DC3000/avrRpm1 exhibited diffuse cell death, indicating the necessity of ATG6 in restricting cell death (Patel and Dinesh-Kumar, 2008). In tobacco, deficiencies in BECLIN 1 result in the onset of diffuse HR-PCD, underscoring the essential role of BECLIN 1 in limiting HR-PCD (Liu et al., 2005). Despite the genetic evidence supporting the critical function of ATG6 in plant immunity, the precise molecular mechanisms through which ATG6 impedes the invasion of pathogenic microorganisms remain elusive.

      In our study, we found that ATG6 interacts with NPR1 to hinder pathogen invasion and inhibit the initiation of cell death. In animals, members of the NLR family have been observed to interact with the autophagy-related protein LC3 to inhibit pathogen survival (Zhang et al., 2019). Similar mechanisms may exist in plants. However, whether NLRs directly activate ATG6 through interaction, and how the NPR1-ATG6 interaction relates to NLR-mediated plant immunity, remain to be investigated.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      However, the overall conclusions of the study are not well supported experimentally. The significance of the findings is low because of their mostly correlational nature, and lack of consistency with earlier reports on the same protein.

      Thank you for your valuable and constructive suggestions. In this article, we unveil a novel relationship in which ATG6 positively regulates NPR1 in plant immunity (Fig. 8 in the revised manuscript). ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and the formation of SINC-like condensates. This may be of interest to researchers studying the regulation of plant immunity. Although our current study has some limitations, we believe these findings are significant, as they may redirect scientific attention toward uncovering novel functions of autophagy genes.

      Based on the integrity and quality of the data as well as the depth of analysis, it is not yet clear if ATG6 is a specific regulator of NPR1 or if it is affecting NPR1's stability indirectly, through inducing an elevation of SA levels in plants. As such, the current study demonstrates a correlation between overexpression of ATG6, SA accumulation, and NPR1 stability, however, whether and how these components work together is not yet demonstrated.

      Thank you for your valuable feedback. Although, as the reviewer notes, our current data have some limitations, scientific research is an ongoing process, and we are confident that future studies will build on these results. At a minimum, our current results reveal a previously undiscovered function of ATG6 in plant immunity. We propose a direct interaction between ATG6 and a well-studied salicylic acid receptor protein, NPR1, and unveil a novel relationship in which ATG6 positively regulates NPR1 in plant immunity (Fig. 8 in the revised manuscript). ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and the formation of SINC-like condensates. This may be of interest to researchers studying the regulation of plant immunity.

      Based on the provided biochemical data, it is not yet clear if the ATG6 functions specifically through NPR1 or through its paralogs NPR3 and NPR4, which are negative regulators of immunity. It is quite possible that interaction with NPR1 (or any NPR) is not the major regulatory step in the activity of ATG6 in plant immunity. The effect of ATG6 on NPR1 could well be indirect, through a change in the SA level and redox environment of the cell during the immune response. Both SA level and redox state of the cell were reported to induce accumulation of NPR1 in the nucleus and increase in stability.

      Thank you for your valuable feedback. In this study, we validated the interaction between ATG6 and NPR1 through various approaches and identified the key regions mediating their interaction. Our findings indicate that ATG6 interacts with NPR1 to synergistically enhance plant resistance by regulating NPR1 protein levels, stability, nuclear accumulation, and the formation of SINC-like condensates. These results demonstrate the involvement of ATG6 in the regulation of NPR1. Furthermore, we also found that ATG6 interacts with NPR3/4 (Fig. S1 in the revised manuscript). This is particularly relevant given that NPR3 and NPR4 have been shown to act as adaptors for the ubiquitin E3 ligase Cullin 3 (CUL3) to regulate the degradation of NPR1. Therefore, whether ATG6 regulates NPR1 through its interactions with NPR3/4 is an intriguing question worth exploring in future studies. We appreciate the reviewer's concerns and are committed to addressing them in our future research to further elucidate the complex regulatory mechanisms involving ATG6, NPR1, and other key players in plant immunity.

      Another major issue is the poor quality of the subcellular analyses. In contradiction to previous studies, ATG6 in this study is not localized to autophagosome puncta, which suggests that the soluble localization pattern presented here does not reflect the true localization of ATG6. Even if the authors propose a novel, non-canonical nuclear localization for ATG6, they still should have detected the canonical autophagy-like localization of this protein.

      Thank you for your valuable feedback. Using NLS Mapper (https://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi), we identified two predicted bipartite NLSs in ATG6, with the sequences "MRKEEIPDKSRTIPIDPNLPKWVCQNCHHS" and "DPNLPKWVCQNCHHS LTIVGVDSYAGKFFNDP". To further examine the nuclear localization of ATG6, we transiently expressed ATG6-GFP in nls-mCherry tobacco leaves via Agrobacterium tumefaciens-mediated transformation and examined the localization of ATG6-GFP, including canonical autophagy-like patterns. ATG6-GFP fluorescence was detected in both the cytoplasm and nuclei (Figure 2b), and nuclear-localized ATG6-GFP overlapped with the nuclear marker nls-mCherry (indicated by white arrows). Additionally, we observed punctate ATG6-GFP signals indicative of canonical autophagy-like localization (indicated by red circles). Based on these results, we are more confident that ATG6 is genuinely localized to the nucleus. The revised manuscript includes clearer images to support these observations.

      Recommendations for the Authors:

      Reviewer #2 (Recommendations For The Authors):

      The duration and concentration of SA treatments are quite variable between experiments which makes comparisons difficult.

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). Consequently, to investigate the function of NPR1, many scientists and research groups typically employ higher concentrations of SA (e.g., 0.5 mM, 1 mM, or even 5 mM) to elucidate its role (Spoel et al., 2009; Fu et al., 2012; Lee et al., 2015; Saleh et al., 2015; Skelly et al., 2019; Zavaliev et al., 2020; Chen et al., 2021a). In our study, we observed an interaction between ATG6 and NPR1. To enhance the detection of the NPR1 protein, we standardized the SA concentration used in our experiments. In this study, for the treatment of Arabidopsis, we followed the protocols outlined in Saleh et al. and Spoel et al., utilizing 0.5 mM SA (Spoel et al., 2009; Saleh et al., 2015). For tobacco treatment, we adopted the methodology described in the study by Zavaliev et al., administering 1 mM SA (Zavaliev et al., 2020).

      The methods section does not explain some of the essential experimental conditions and reagents used in the study.

      Thank you for pointing this out. Because of word limits, we have placed the detailed experimental methods and reagents in Supplemental Data 1, which provides a comprehensive overview of the experimental procedures and conditions used in our study.

      Lines 62-63: the C-terminal domain of all NPRs has a name (already defined as SA-binding domain (SBD)). Also, it would be worth referring to the structure of NPR1 (Kumar et al 2022, Nat) as the source of information about its domains.

      Thank you for pointing this out, we have changed this description in the revised manuscript (lines 62-63).

      Lines 66-69: NPR1 doesn't form monomers. A recent study showed that the basic functional unit of NPR1 is a dimer (Kumar et al 2022, Nat).

      Thank you for pointing this out. In the revised manuscript (line 67), “monomers” has been changed to “dimer”.

      Lines 89-95 and elsewhere: the term "invasion" has a very specific meaning and it doesn't necessarily refer to disease. A pathogen can invade the plant but cause no disease (e.g. ETI). Most plant genetic immune mechanisms act after pathogen invasion, not before it. Those cited works reported the disease resistance, not the invasion resistance.

      Thank you for pointing this out. We have corrected this description in the revised manuscript (line 91).

      Lines 113-119: the truncation at the aa328 includes half of the ANK domain (repeats 1 and 2), not just BTB. The C-terminal truncation variant contains the other half (repeats 3 and 4) of the ANK domain, not the entire ANK domain. It also contains the SBD, not just the NLS. So, this kind of analysis cannot determine the role of the ANK domain in the interaction, nor can it conclusively determine if the interaction is through the SBD. The interaction should be tested with the SBD domain only in order to make this conclusion.

      Thank you for pointing this out, we have removed the inappropriate description and made the appropriate changes in the revised manuscript (lines 114 and 115).

      In Figure S1, the equally strong interaction of atg6 is found for NPR3/NPR4. Does that mean that atg6 functions also through these other NPRs? What's the significance of these data compared to NPR1-ATG6 interaction? This is especially important, because both NPR3 and NPR4 are predominantly nuclear proteins, and they are unlikely to significantly overlap with autophagy components in the cytoplasm.

      NPR1 and its paralogues NPR3 and NPR4 frequently interact with other proteins to regulate plant immune responses (Backer et al., 2019; Chen et al., 2019). To identify ATGs that interact with NPRs, we performed yeast two-hybrid (Y2H) screens using NPRs as bait. Interestingly, ATG6 interacted with NPR1, NPR3, and NPR4, and different concentrations of SA did not significantly affect these interactions (Fig. S1a). NPR1 is an important positive regulator of the plant immune response (Chen et al., 2021b). In Arabidopsis and N. benthamiana, ATG6 or its homologues were reported to act as positive regulators of plant disease resistance to P. syringae pv. tomato (Pst) DC3000 and Pst DC3000/avrRpm1 (Patel and Dinesh-Kumar, 2008) and to tobacco mosaic virus (TMV) (Liu et al., 2005). We therefore focused this study on the biological significance of the interaction between ATG6 and NPR1. Whether the interaction between ATG6 and NPR3/4 also affects plant immunity remains to be explored in future studies.

      In Figure 1c and elsewhere: why not use the anti-mCherry antibody to detect atg6-mcherry? Are we seeing the correct protein band of atg6-mcherry? Also, it is not clear what antibodies they used throughout the study: the sources and specificities of antibodies are not provided.

      Thank you for pointing this out. We initially generated an ATG6 antibody (anti-ATG6, 1:200, peptide, C-KEKKKIEEEERK, Abmart) to detect the endogenous ATG6 protein, and we tested the specificity and sensitivity of this antibody (results are shown in Fig. S17). Additionally, to identify the ATG6-mCherry bands, we detected ATG6-mCherry in ATG6-mCherry Arabidopsis using the ATG6 antibody, with Col as a control (results are shown in Fig. S4). These results show that our ATG6 antibody effectively and specifically detects both ATG6 and ATG6-mCherry. Therefore, in this study, we used the ATG6 antibody to analyze both ATG6-mCherry and endogenous ATG6. Detailed antibody information is presented in Supplementary Data 1, Table S4.

      In Figures 1d, 2a, and 2b, the subcellular localization pattern of atg6 contradicts what was published before (Fujiki et al 2007, Plant Phys; Liu et al 2018, FPlS; Xu et al 2017, Autophagy; Li et al 2018, Nat. Comm.). As an autophagy protein, atg6 was shown to localize to cytoplasmic puncta (autophagosomes), like atg8. No nuclear localization was found in those studies. The lack of puncta and the strong nuclear accumulation are signs that the localization of atg6 reported here has to be interpreted with caution. With the data provided, I am not convinced yet that we are looking at the correct ATG6 subcellular localization. Even if the authors propose a novel, non-canonical localization for atg6, they still should have detected the canonical autophagy-like localization of this protein.

      Thank you for your valuable feedback. To further examine the nuclear localization of ATG6, we transiently expressed ATG6-GFP in nls-mCherry tobacco leaves via Agrobacterium tumefaciens-mediated transformation and examined the localization of ATG6-GFP, including canonical autophagy-like patterns. ATG6-GFP fluorescence was detected in both the cytoplasm and nuclei (Figure 2b), and nuclear-localized ATG6-GFP overlapped with the nuclear marker nls-mCherry (indicated by white arrows). Additionally, we observed punctate ATG6-GFP signals indicative of canonical autophagy-like localization (indicated by red circles). Based on these results, we are more confident that ATG6 is genuinely localized to the nucleus. The revised manuscript includes clearer images to support these observations.

      It would make more sense to include the BiFC data (fig. S2) in the main figure, instead of the co-localization (fig. 1d) which cannot serve as evidence for interaction.

      Thank you for the feedback. We have accepted your suggestion and, in Fig. 1, replaced the co-localization image with a BiFC (bimolecular fluorescence complementation) image to better illustrate the interaction.

      In Figure S2, the bifc signals have to be quantified to qualify as evidence for interaction. also, a subcellular marker has to be used (e.g. nuclear mcherry). From the current poor-quality images, one cannot determine where in the cell the presumed interaction takes place, nucleus or cytoplasm, or both. Also, no puncta are seen in these images.

      Thank you for pointing this out. Although the images we provided lacked clarity, our BiFC results indicate an interaction between ATG6 and NPR1 in both the cytoplasm and nucleus. As the reviewer pointed out, punctate signals were not observed in our images. This is consistent with previous studies (Figure 2) reporting BiFC results between the autophagy-associated ATG8 proteins and their interacting partners. For instance, Fig. 1G and Fig. 2F (Marshall et al. 2019, Cell), Fig. 4B (Macharia et al. 2019, BMC Plant Biology), and Fig. 3 (Zhou et al. 2018, Autophagy) show no punctate signals, in line with our findings.

      In Figure S3a, the nuclear localization is shown for stomata. It is known that stomata are especially strong expressors of the transgenes, and localization there could be an artefact of overaccumulation of the fusion protein. Also, why do they present the localization of atg6-gfp, if the analysis and the cross were made with atg6-mcherry?

      Thank you for pointing this out. In our previous experiments, we observed nuclear localization of ATG6 in Arabidopsis thaliana plants overexpressing ATG6-GFP (Fig. S3a). To clearly visualize the nucleus, we used the DNA-binding dye DAPI, which readily stained the nuclei of stomatal guard cells and allowed us to easily identify the nuclear regions. Additionally, in Fig. 2a and Fig. S3b, we detected the ATG6-mCherry fluorescence signal within the nucleus, further supporting the nuclear localization of ATG6. Moreover, we performed nuclear-cytoplasmic fractionation: under SA treatment, ATG6-mCherry and ATG6-GFP were detected in both the cytoplasmic and nuclear fractions in N. benthamiana (Fig. 2c and d). Similarly, ATG6 was detected in the nuclear fraction of UBQ10::ATG6-GFP and UBQ10::ATG6-mCherry overexpressing plants (Fig. 2e and f).

      In Figure S3b, the images are low resolution and of poor quality. Why is atg6-mcherry expressed in a single cell if these are transgenic plants? The nuclear co-localization with npr1-gfp has to be shown more clearly with high-resolution images and also be quantified, because the expression of atg6-mcherry is not as uniform as that of npr1-gfp.

      Thank you for pointing this out. The ATG6-mCherry fluorescence signal in Figure S3b was not restricted to a single cell; it was also present in other cells, albeit at lower intensity. This difference in intensity may be attributable to irregularities in leaf structure at the time of imaging. To support our conclusion, we further examined the fluorescence signals in cells of the root elongation zone in ATG6-mCherry × NPR1-GFP plants, as shown in the figure below. ATG6-mCherry fluorescence was uniformly distributed and detected in both the cytoplasm and nucleus. We have replaced the original unclear image with a high-quality image.

      Lines 138-143: In Fig. S3d, it would make more sense to show the WB on the hybrid NPR1-GFP/ATG6-mCherry plants with both anti-GFP and anti-mCherry antibodies to detect free mCherry/GFP. Since the level of free FP is analyzed, why did they not test the free mCherry levels in Figure S4a? This would be more important than testing free GFP in ATG6-GFP plants, because the imaging of ATG6-mCherry was done in the hybrid plants (Fig. S3b).

      Thank you for pointing this out. We initially had an ATG6 antibody synthesized (anti-ATG6, 1:200, peptide C-KEKKKIEEEERK, Abmart) to detect endogenous ATG6 protein, and we tested its specificity and potency (Fig. S17). To determine the position of the ATG6-mCherry bands, we also probed ATG6-mCherry Arabidopsis with the ATG6 antibody, using Col as a control (Fig. S4). These results show that our ATG6 antibody specifically and clearly recognizes both ATG6 and ATG6-mCherry. Therefore, in this study, we used the ATG6 antibody to analyze both ATG6-mCherry and endogenous ATG6. Detailed antibody information is presented in Supplementary Data 1, table S4. In earlier experiments, we purchased an mCherry antibody (mCherry-Tag Monoclonal Antibody (6B3), BD-PM2113, China) to immunodetect ATG6-mCherry. However, this antibody performed poorly, and given our budget constraints and the availability of our own ATG6 antibody, we chose not to purchase another antibody from a different supplier for the remaining Western blot experiments.

      In Figure 2c, no ATG6-mCherry is detected at time 0 in either the cytoplasm or the nucleus, yet the microscope images in panel a show strong accumulation in both compartments.

      Thank you for pointing this out. Previous studies have shown that ATG6 can also be degraded via the 26S proteasome pathway (Qi et al., 2017). We speculate that the lack of signal may reflect rapid turnover of ATG6 at time 0.

      Lines 156-160: this statement is unsupported by the data. In Fig. S5, the bands for native ATG6 in the nuclear fraction are extremely weak, and they do not change across time points in the opposite direction to the cytoplasmic fraction, which would indicate that the nuclear pool is complementary to the cytoplasmic pool of the protein. The result more likely suggests that most ATG6 is in the cytoplasm and that the weak bands detected in the nucleus are either background signal or contamination from the cytoplasmic pool. At such low protein levels, or with such poor immunodetection, background signal is inevitable due to overexposure. Even though the actin marker is not detected in the nuclear fraction, this does not necessarily mean that the nuclear fraction is free of cytoplasmic contamination; actin is highly abundant and is detected at a lower exposure.

      Thank you for pointing this out. In Fig. S5, we examined the subcellular localization of endogenous ATG6; although the image quality is somewhat low, both cytosolic and nuclear localization of ATG6 can be observed. We also verified the cytosolic and nuclear localization of ATG6 in Arabidopsis using confocal fluorescence microscopy and nuclear-cytoplasmic fractionation, with actin and histone H3 as cytoplasmic and nuclear markers, respectively (Fig. 2e and f). Furthermore, we observed cytosolic and nuclear localization of ATG6 when ATG6-GFP or ATG6-mCherry was transiently expressed in tobacco leaves (Fig. 2a-d). These results are consistent with the subcellular location of ATG6 predicted by the Arabidopsis subcellular database SUBA (https://suba.live/) (Fig. S3c). We acknowledge the limited image quality for endogenous ATG6, but we believe that the combination of experimental approaches, including fluorescent protein fusions, provides robust evidence for the cytosolic and nuclear localization of ATG6 in plant cells. Moving forward, we will continue to investigate the significance of ATG6's subcellular distribution and its potential dual roles in the nucleus and the cytosol, particularly in the context of its interaction with the key immune regulator NPR1. We appreciate the reviewer's constructive comments, which have helped us strengthen the presentation and interpretation of our findings.

      In Figure 3a, the images are of too low resolution to see the co-localization. The focal planes of the top and bottom panels are quite different: the top is focused on stomata, the bottom on pavement cells, so the number of NPR1-GFP nuclei differs dramatically between these two focal planes. Also, ATG6-mCherry in these plants appears to be predominantly cytoplasmic, not nuclear as the authors claim. Higher-resolution, higher-quality images are required to determine this.

      Thank you for pointing this out. To ensure the clarity and accuracy of our confocal images, we have provided a clearer image as supplementary evidence. The bright-field images clearly show that both sets of images are in the same focal plane. Furthermore, the nuclear localization of ATG6-mCherry is clearly visible in the third panel of the fourth column, where ATG6-mCherry co-localizes with NPR1-GFP in the nucleus (white arrow).

      In Figure 3b, it is not indicated what exactly was measured and under what condition, mock or SA. If these are numbers of nuclei, then the size of the sampled area should be indicated, not just "section", and both mock and SA should be included in the measurements. Also, how many independent images were sampled? What does the error bar represent? What does "normal" mean? Shouldn't this be a mock treatment?

      Thank you for pointing this out. "Normal" here refers to the mock treatment, and we have revised the description for clarity. Figure 3b shows the number of nuclei with NPR1-GFP localization in ATG6-mCherry × NPR1-GFP and NPR1-GFP Arabidopsis plants after SA treatment. Data were obtained from three independent experiments, each comprising five images, for a total of 15 images analyzed. Detailed descriptions have been added to the revised manuscript (lines 568-570, 800-804).

      Lines 167-168: the proposed increase of NPR1-GFP in the nucleus could simply be due to higher SA accumulation in the hybrid plants, not to a direct interaction with ATG6.

      Thank you for pointing this out. Our results confirmed that ATG6 overexpression significantly increased the nuclear accumulation of NPR1 (Fig. 3). Notably, the ratio of nuclear NPR1 to total NPR1 in ATG6-mCherry × NPR1-GFP did not differ significantly from that in NPR1-GFP, and a similar pattern was observed in N. benthamiana (Fig. 3c-f). These results suggest that the increased nuclear accumulation of NPR1 caused by ATG6 may result from higher levels of more stable NPR1, rather than from enhanced nuclear translocation of NPR1 facilitated by ATG6. Furthermore, under SA treatment, NPR1 protein levels were significantly higher in the ATG6-mCherry × NPR1-GFP line than in the NPR1-GFP line (Fig. 5a). Notably, even in the absence of differences in SA levels between the two lines, ATG6 delayed the degradation of NPR1 under normal conditions (Fig. 6). These findings suggest that ATG6 uses both SA-dependent and SA-independent mechanisms to maintain the stability of the key immune regulator NPR1. We therefore suggest that the increased nuclear accumulation of NPR1 reflects the combined effects of SA and ATG6.

      Lines 202-204: "Increased nuclear accumulation" implies increased translocation. However, they found that the ratio of NPR1-GFP does not change (Figure 3), so the reason for higher nuclear accumulation is not translocation, but abundance.

      Thank you for pointing this out. Our results confirmed that ATG6 overexpression significantly increased the nuclear accumulation of NPR1 (Fig. 3), and that ATG6 also increases NPR1 protein levels and improves NPR1 stability (Fig. 5 and 6). We therefore considered that the increased nuclear accumulation of NPR1 in ATG6-mCherry × NPR1-GFP plants might result from higher levels of more stable NPR1 rather than from enhanced nuclear translocation facilitated by ATG6. To test this possibility, we determined the ratio of nuclear NPR1-GFP to total NPR1-GFP. This ratio did not differ significantly between ATG6-mCherry × NPR1-GFP and NPR1-GFP plants, and a similar pattern was observed in N. benthamiana (Fig. 3c-f), supporting increased abundance rather than enhanced translocation. We then analyzed whether ATG6 affects NPR1 protein levels and stability, and found that ATG6 increases NPR1 protein levels under SA treatment and maintains NPR1 protein stability (Fig. 5 and 6). Together, these results suggest that the increased nuclear accumulation of NPR1 caused by ATG6 results from higher levels of more stable NPR1. The corresponding description is in the revised manuscript (lines 338-352).
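      The abundance-versus-translocation argument can be written out as a short worked relation. The symbols are introduced here for illustration only and are not taken from the manuscript: N is nuclear NPR1-GFP, T is total NPR1-GFP, and r = N/T is the nuclear ratio quantified in Fig. 3c-f, with subscripts for the ATG6-mCherry × NPR1-GFP line (ATG6) and the NPR1-GFP line (ctrl).

```latex
N = r\,T
\qquad\Longrightarrow\qquad
\frac{N_{\mathrm{ATG6}}}{N_{\mathrm{ctrl}}}
  = \frac{r_{\mathrm{ATG6}}}{r_{\mathrm{ctrl}}}\cdot\frac{T_{\mathrm{ATG6}}}{T_{\mathrm{ctrl}}}
  \approx \frac{T_{\mathrm{ATG6}}}{T_{\mathrm{ctrl}}}
  \quad\text{if } r_{\mathrm{ATG6}} \approx r_{\mathrm{ctrl}}
```

      Under this simple sketch, an unchanged nuclear ratio means that any increase in nuclear NPR1 tracks the increase in total NPR1, which is why the effect is attributed to abundance rather than to translocation.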

      Lines 204-205: the co-localization in Figure 1d cannot be interpreted as interaction.

      Thank you for the feedback. We have replaced the co-localization image in Fig. 1d with a BiFC (bimolecular fluorescence complementation) image to better illustrate the interaction.

      What age of plants were used for the analysis in Figures 4 and S7? The age of the plant might significantly affect the free SA levels under control conditions.

      Thank you for the feedback. In Figures 4 and S7, 3-week-old plants were used to determine salicylic acid (SA) levels and target gene expression. Details are given in the legends of Figures 4 and S7 (lines 818-819).

      In Figure 5a they treat with SA, but the analysis in Figure S10 is done with the pathogen, so how can these data be correlated?

      Thank you for pointing this out. Previous studies have demonstrated that pathogen infection rapidly increases salicylic acid (SA) content in plants, and the elevated SA then activates plant immune responses. Therefore, both pathogen infection and direct SA treatment can activate SA-dependent immune responses. Because the NPR1 protein is known to be unstable, in Figure 5a we used a 0.5 mM SA treatment to assess changes in NPR1 protein levels, as the effect of SA treatment is more immediate and pronounced.

      Lines 241-242: In Figure 5b, it is not clear why NPR1-GFP and ATG6-mCherry are not detected at time 0. The protein levels in the transient assay are sufficiently high for detection by WB.

      Thank you for pointing this out. The NPR1 protein is known to be unstable and prone to degradation through the 26S proteasome pathway (Spoel et al., 2009; Saleh et al., 2015). In addition, previous studies have shown that ATG6 can also be degraded via the 26S proteasome pathway (Qi et al., 2017). We speculate that the lack of signal may reflect rapid turnover of NPR1 and ATG6 at time 0.

      In Figures 5c-d, the image quality is very poor, and the signals are not clearly shown. What structure exactly was measured in these images? There are so many fluorescent bodies that it is not clear what we are looking at. Also, why is the mCherry channel not shown? It would be important to see whether the bodies in SA-treated plants co-localize with ATG6-mCherry autophagosomes (if these exist at all).

      Thank you for pointing this out. Consistent with previous reports (Zavaliev et al., 2020), SA promoted the translocation of NPR1 into the nucleus, but a significant amount of NPR1 remained in the cytoplasm (Fig. 3c and e). Previous studies have shown that SA increases NPR1 protein levels and facilitates the formation of SA-induced NPR1 condensates (SINCs) in the cytoplasm, which are known to promote cell survival (Zavaliev et al., 2020). We therefore examined the fluorescence signal of SINC-like condensates in the cytoplasm of tobacco leaves. After 1 mM SA treatment, more SINC-like condensates were observed in N. benthamiana co-transformed with ATG6-mCherry + NPR1-GFP than with mCherry + NPR1-GFP (Fig. 5c-d and Supplemental Movies 1-2); the supplemental movies show this more clearly. Additionally, the SINC-like condensate signals partially co-localized with some ATG6-mCherry autophagosome signals.

      Lines 245-247: so, is it ATG6 or SA that increases NPR1 levels? If this is due to SA, then the whole study lacks novelty, because we already know from previous work that SA increases the stability of NPR1.

      Thank you for pointing this out. Indeed, previous studies have shown that salicylic acid (SA) increases NPR1 levels and protein stability (Spoel et al., 2009; Saleh et al., 2015). In our experiments, under SA treatment, NPR1 protein levels were significantly higher in the ATG6-mCherry × NPR1-GFP line than in the NPR1-GFP line (Fig. 5a). Free SA levels were also significantly elevated in the ATG6-mCherry × NPR1-GFP line under pathogen challenge (Pst DC3000/avrRps4), but not under normal conditions (Fig. 4a). Furthermore, even in the absence of differences in SA levels between the two lines, ATG6 delayed the degradation of NPR1 under normal conditions (Fig. 6). These observations represent a new finding of our study. Together, they suggest that ATG6 uses both SA-dependent and SA-independent mechanisms to maintain the stability of the key immune regulator NPR1.

      Lines 313-316: NPR1 and ATG6 can function independently of each other, so the term "jointly" is misleading. Based on the overall data provided in this manuscript, it cannot be concluded that the two proteins work in one complex to control plant immunity.

      Thank you for pointing this out. In the revised manuscript, “jointly” has been changed to “cooperatively”.

      Lines 369-374: this speculation goes beyond the main hypothesis that ATG6 functions through NPR1. If ATG6 can activate transcription alone, then what is the significance of its activation of NPR1? How can one distinguish between the two?

      Thank you for pointing this out. Transcriptional activation by transcription factors typically requires at least two conserved domains: a transcriptional activation domain and a DNA-binding domain. ATG6 lacks both of these canonical domains, so it is unlikely to act as a direct transcriptional activator on its own. Previous studies have shown that acidic activation domains (AADs) in transcriptional activators (such as Gal4, Gcn4 and VP16) play important roles in activating downstream target genes, and that acidic and hydrophobic residues are the key structural elements of an AAD (Pennica et al., 1984; Cress and Triezenberg, 1991; Van Hoy et al., 1993). Chen et al. found that EDS1 contains two AADs and confirmed that EDS1 is a transcriptional activator with AADs (Chen et al., 2021a). Similarly, we found that ATG6 overexpression significantly enhanced the expression of PR1 and PR5 (Fig. 4b-c and S9), and that ATG6 also contains an AAD with acidic and hydrophobic amino acids (amino acids 148-295) (Fig. S14). We speculate that ATG6 might act as a transcriptional coactivator that activates PR gene expression synergistically with NPR1.

      Lines 389-400: cell death caused by AvrRps4 in the Col-0 ecotype is extremely weak, as there is no complete receptor complex for this effector. So one has to use a very high dose to induce cell death in Col-0, certainly higher than the one used for bacterial growth assays. The authors used the same dose in both assays, so it is likely that what we see as "cell death" is not an effector-triggered response, but rather a disease symptom caused by the virulent pathogen.

      Thank you for pointing this out. Indeed, as the reviewer notes, most cell death assays use higher concentrations of Pst DC3000/avrRps4 or Pst DC3000/avrRpt2, but they typically treat Arabidopsis for a relatively short period, usually less than 1 day (Hofius et al., 2009; Zavaliev et al., 2020). In this study, although we infiltrated a relatively low dose of Pst DC3000/avrRps4 (0.001), we assessed cell death after a relatively long infection period (3 days). Because Pst DC3000/avrRps4 multiplies substantially in host tissue, we assumed that the bacteria propagated over 3 days of incubation would be sufficient to induce intense cell death. Consequently, we chose this concentration of Pst DC3000/avrRps4 for the experiment.

      Lines 407-416: why do you expect a "delay of degradation" with autophagy inhibitors? Shouldn't it be the opposite? In Figure S14, if we compare the bands at 120 min and 120 min + ConA + WM, the effect of the autophagy inhibitors is actually quite strong (0.47 vs 0.22), with about 50% more degradation of NPR1 in their presence. So the conclusion that the degradation of NPR1 is autophagy-independent is wrong according to this result.

      Thank you for pointing this out. We have revised the inaccurate description, as outlined in the revised manuscript (lines 413-425).
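      As a rough check of the reviewer's arithmetic, assume that 0.47 and 0.22 are NPR1 band intensities at 120 min normalized to the 0 min band (an assumption; the normalization is not stated in the comment). The fraction degraded and the fraction remaining then compare as follows:

```latex
\frac{1 - 0.22}{1 - 0.47} = \frac{0.78}{0.53} \approx 1.47,
\qquad
\frac{0.22}{0.47} \approx 0.47
```

      Under this assumption, roughly 47% more NPR1 is degraded, and about half as much remains, in the presence of ConA + WM, in line with the reviewer's estimate of about 50%.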

      References

      Backer R, Naidoo S, van den Berg N. 2019. The NONEXPRESSOR OF PATHOGENESIS-RELATED GENES 1 (NPR1) and Related Family: Mechanistic Insights in Plant Disease Resistance. Front Plant Sci 10, 102.

      Castello MJ, Medina-Puche L, Lamilla J, et al. 2018. NPR1 paralogs of Arabidopsis and their role in salicylic acid perception. PLoS One 13, e0209835.

      Chen H, Li M, Qi G, et al. 2021a. Two interacting transcriptional coactivators cooperatively control plant immune responses. Sci Adv 7, eabl7173.

      Chen J, Mohan R, Zhang Y, et al. 2019. NPR1 Promotes Its Own and Target Gene Expression in Plant Defense by Recruiting CDK8. Plant Physiol 181, 289-304.

      Chen J, Zhang J, Kong M, et al. 2021b. More stories to tell: NONEXPRESSOR OF PATHOGENESIS-RELATED GENES1, a salicylic acid receptor. Plant Cell Environ.

      Cress WD, Triezenberg SJ. 1991. Critical structural elements of the VP16 transcriptional activation domain. Science 251, 87-90.

      Devadas SK, Raina R. 2002. Preexisting systemic acquired resistance suppresses hypersensitive response-associated cell death in Arabidopsis hrl1 mutant. Plant Physiol 128, 1234-1244.

      Ding Y, Sun T, Ao K, et al. 2018. Opposite Roles of Salicylic Acid Receptors NPR1 and NPR3/NPR4 in Transcriptional Regulation of Plant Immunity. Cell 173, 1454-1467 e1415.

      Falk A, Feys BJ, Frost LN, et al. 1999. EDS1, an essential component of R gene-mediated disease resistance in Arabidopsis has homology to eukaryotic lipases. Proc Natl Acad Sci U S A 96, 3292-3297.

      Feys BJ, Moisan LJ, Newman MA, et al. 2001. Direct interaction between the Arabidopsis disease resistance signaling proteins, EDS1 and PAD4. EMBO J 20, 5400-5411.

      Fu ZQ, Yan S, Saleh A, et al. 2012. NPR3 and NPR4 are receptors for the immune signal salicylic acid in plants. Nature 486, 228-232.

      Hofius D, Schultz-Larsen T, Joensen J, et al. 2009. Autophagic components contribute to hypersensitive cell death in Arabidopsis. Cell 137, 773-783.

      Jones JD, Dangl JL. 2006. The plant immune system. Nature 444, 323-329.

      Jurkowski GI, Smith RK, Jr., Yu IC, et al. 2004. Arabidopsis DND2, a second cyclic nucleotide-gated ion channel gene for which mutation causes the "defense, no death" phenotype. Mol Plant Microbe Interact 17, 511-520.

      Lee HJ, Park YJ, Seo PJ, et al. 2015. Systemic Immunity Requires SnRK2.8-Mediated Nuclear Import of NPR1 in Arabidopsis. Plant Cell 27, 3425-3438.

      Liu Y, Schiff M, Czymmek K, et al. 2005. Autophagy regulates programmed cell death during the plant innate immune response. Cell 121, 567-577.

      Liu Y, Sun T, Sun Y, et al. 2020. Diverse Roles of the Salicylic Acid Receptors NPR1 and NPR3/NPR4 in Plant Immunity. Plant Cell 32, 4002-4016.

      McKim SM, Stenvik GE, Butenko MA, et al. 2008. The BLADE-ON-PETIOLE genes are essential for abscission zone formation in Arabidopsis. Development 135, 1537-1546.

      Nawrath C, Metraux JP. 1999. Salicylic acid induction-deficient mutants of Arabidopsis express PR-2 and PR-5 and accumulate high levels of camalexin after pathogen inoculation. Plant Cell 11, 1393-1404.

      Patel S, Dinesh-Kumar SP. 2008. Arabidopsis ATG6 is required to limit the pathogen-associated cell death response. Autophagy 4, 20-27.

      Pennica D, Goeddel DV, Hayflick JS, et al. 1984. The amino acid sequence of murine p53 determined from a c-DNA clone. Virology 134, 477-482.

      Qi H, Xia FN, Xie LJ, et al. 2017. TRAF Family Proteins Regulate Autophagy Dynamics by Modulating AUTOPHAGY PROTEIN6 Stability in Arabidopsis. Plant Cell 29, 890-911.

      Rate DN, Greenberg JT. 2001. The Arabidopsis aberrant growth and death2 mutant shows resistance to Pseudomonas syringae and reveals a role for NPR1 in suppressing hypersensitive cell death. Plant J 27, 203-211.

      Saleh A, Withers J, Mohan R, et al. 2015. Posttranslational Modifications of the Master Transcriptional Regulator NPR1 Enable Dynamic but Tight Control of Plant Immune Responses. Cell Host Microbe 18, 169-182.

      Skelly MJ, Furniss JJ, Grey H, et al. 2019. Dynamic ubiquitination determines transcriptional activity of the plant immune coactivator NPR1. Elife 8.

      Spoel SH, Mou Z, Tada Y, et al. 2009. Proteasome-mediated turnover of the transcription coactivator NPR1 plays dual roles in regulating plant immunity. Cell 137, 860-872.

      Van Hoy M, Leuther KK, Kodadek T, et al. 1993. The acidic activation domains of the GCN4 and GAL4 proteins are not alpha helical but form beta sheets. Cell 72, 587-594.

      Yuan M, Ngou BPM, Ding P, et al. 2021. PTI-ETI crosstalk: an integrative view of plant immunity. Curr Opin Plant Biol 62, 102030.

      Yue J, Sun H, Zhang W, et al. 2015. Wheat homologs of yeast ATG6 function in autophagy and are implicated in powdery mildew immunity. BMC Plant Biol 15, 95.

      Zavaliev R, Mohan R, Chen T, et al. 2020. Formation of NPR1 Condensates Promotes Cell Survival during the Plant Immune Response. Cell 182, 1093-1108 e1018.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The authors should perhaps discuss further other cases in which LTPs of the same type as ORP9 and ORP10 have been found to dimerise. They should definitely cite and discuss the evidence reported in February this year in CMLS (see https://link.springer.com/article/10.1007/s00018-023-04728-5). In that paper, the authors reported findings very similar to those in Figures 3, 4, S6, S7, and S8. Specifically, they found that ORP9 and ORP10 (but not ORP11) interact through a central helical region and that ORP9 localises ORP10 to ER-Golgi MCSs by providing ORP10 with a binding site for VAPs, where the heterodimer mediates the exchange of PtdIns(4)P for PtdSer. 

      We thank the reviewer for their recommendations. We had overlooked this paper; it is now cited in the revised manuscript. Various other papers reporting LTP dimerization are already cited in our manuscript: ORP9-ORP10 dimerization (Kawasaki et al. 2022), ORP9-ORP11 dimerization (Zhou et al. 2010), and ORP9-ORP10/11 dimerization (Tan and Finkel 2022). The revised manuscript now also discusses the dimerization of CERT and OSBP, citing Gehin et al. 2023, Ridgway et al. 1992 and de la Mora et al. 2021.

      Reviewer #2 (Recommendations For The Authors): 

      Model and Discussion: 

      Give an idea of which aspect of SMS1 function is being affected. Even if no further experiments were carried out, the authors could discuss possibilities. One might speculate what the PS is being used for. For example, is it a co-factor for integral membrane proteins, such as flippases? Is it a co-factor for peripheral membrane proteins, such as yet more LTPs? The model could include the work of Peretti et al. (2008), which linked Nir2 activity exchanging PI:PA (Yadav et al., 2015) to the eventual function of CERT. Could the PS have a role in removing/reducing DAG produced by CERT? 

      We thank the reviewer for their recommendations. The same recommendations were raised in the public review, and we believe our response there addresses them sufficiently. 

      Other, Minor: 

      Make clear that there is no sterol readout (Fig 1C) 

      We would like to point out that Figure 1C has a sterol readout as CE refers to cholesterol esters.

      "PH domains of ORP9 and ORP11 localized only partially to the Golgi, unlike the PH domains of OSBP and CERT" (line 154). Say here where the non-Golgi pool of ORP9 and ORP11 PH domains is, presumably in the cytoplasm.  

      We thank the reviewer for their suggestion and have rephrased the sentence accordingly. 

      Fig 7H-J: use histograms rather than lines, as these are separate, unlinked categories.

      We thank the reviewer for their suggestion. However, we think the original figure represents our findings in the best possible way. Our analysis of individual lipid species is also included in Supplementary Figure 10.

      Reviewer #3 (Recommendations For The Authors): 

      (1) At the end of the intro, in summarizing their findings, the authors state (p3. lines 48-49) "These findings highlight how phospholipid and sphingolipid gradients along the secretory pathway are linked at ER-Golgi membrane contact sites." This should instead read "These findings highlight THAT phospholipid and sphingolipid gradients along the secretory pathway are linked at ER-Golgi membrane contact sites." 

      We thank the reviewer for their suggestion and have changed the sentence accordingly.

      (2) As noted in the public section, to show that ORP9/11 do indeed exchange lipids, an in vitro experiment demonstrating that ORP11 can transfer PI4P is essential. Ideally, it would be best to examine PS AND PI4P transfer by ORP9 AND 11 separately AND then by the ORP9/11 heterodimer. This could lend insights as to the function of the heterodimer. The He et al et Yu paper should provide guidelines for this. Why have the heterodimers? 

      We believe we addressed this point by showing the lipid transfer ability of the ORP9-ORP11 dimer. These findings are now part of the revised manuscript.

      (3) It would be interesting to discuss the roles of ORP9/ORP11 versus ORP9/ORP10... they seem so analogous, although this is at the discretion of the authors. 

      We thank the reviewer for their suggestion. Since the difference between ORP9-ORP10 and ORP9-ORP11 dimers was also raised by other reviewers, we decided to include this discussion in the manuscript. A section based on our answer to Reviewer #2 in the Public Review is now part of the Discussion.

      (4) The authors used a melanoma cell line in their screens (p3, line 59). Could they explain why they used this cell line versus others? 

      We chose MelJuSo cells for several reasons. Mainly, MelJuSo cells are diploid, which makes generating knockouts in a screening setup easier than in polyploid cancer cell lines (e.g. HeLa). Furthermore, our CRISPR/Cas9 screening protocols are optimized for this cell line.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work provides significant insight into freshwater cable bacteria (CB) and is an important contribution to the emerging CB literature. In this manuscript, Yang et al. describe current-voltage measurements on CB collected from two freshwater sources in Southern California. The studies use electrostatic and conductive atomic force microscopies, as well as four-probe measurements. These measurements are consistent with back-of-the-envelope calculations on conductivities needed to sustain CB function. The data show that freshwater CB have a similar structure and function to the more studied marine cable bacteria.

      Strengths:

      Excellent measurements on a new class of cable bacteria.

      Weaknesses:

      The paper would benefit from additional analysis of the data.

      Reviewer #1 (Recommendations for The Authors):

      This work provides significant insight into freshwater cable bacteria (CB) and is an important contribution to the emerging CB literature. In this manuscript, Yang et al. describe current-voltage measurements on CB collected from two freshwater sources in Southern California. The studies use electrostatic and conductive atomic force microscopies, as well as four-probe measurements. These measurements are consistent with back-of-the-envelope calculations on conductivities needed to sustain CB function. The data show that freshwater CB have a similar structure and function to the more studied marine cable bacteria. Minor comments follow.

      We are grateful to the reviewer for the encouraging feedback and for appreciating the central message of the preprint. Below we address the reviewer’s constructive comments.

      Additional information could be provided regarding the degraded cells where an 'empty cage' remains, as well as the polyphosphate granules, which were previously observed in marine CB (refs. 11 and 18). 

      We have edited the manuscript to note that the appearance of empty cages and polyphosphate granules in freshwater cable bacteria is indeed consistent with these features as previously reported in marine CB. The size of polyphosphate granules in freshwater CB is comparable to or slightly smaller than that in marine CB (Sulu-Gambari et al., 2015). The empty cages were previously described as ‘ghost filaments’ that had lost all cell membrane and cytoplasmic material (Cornelissen et al., 2018). 

      Manuscript edits: a sentence regarding polyphosphate granules has been added to the manuscript (lines 307-308): “The size of polyphosphate granules in freshwater CB (70 nm – 400 nm) is comparable or slightly smaller than in marine CB (35)”.

      A sentence regarding the empty cages has been added to the manuscript (lines 303-305): “These empty cages were previously described as ‘ghost filaments’ which had lost all cell membrane and cytoplasm material (20).”

      The authors also state that the 'phase difference between the elevated ridges and interridge regions is proportional to the tip voltage squared,' and refer to Fig. 4D. This figure has only three data points with large error bars. The authors may wish to explain this finding and justify their analysis in greater detail.

      We thank the reviewer for pointing out that we presented this result without adequately describing its origin or significance. In general, the probe phase response in electrostatic force microscopy (EFM) can originate not only from the electrostatic interaction with the sample (i.e. the electrical properties of interest) but also from shorter-range van der Waals forces (which mainly reflect probe-sample distance, i.e. topography). To ensure that EFM reports electrical interactions, we performed these measurements using a two-pass technique: the second pass retraces the topography measured during the first pass, but at a fixed height above the surface where the interactions are long-range (electrostatic) rather than short-range (vdW) or due to topographic cross-talk. The purpose of the voltage-dependence measurement (Fig. 4D) is simply to assess whether this procedure is successful, since at a fixed height the electrostatic force is proportional to the square of the voltage, F = ½ (∂C/∂z) V², and so is the force gradient that sets the phase shift. Although the error bars of this measurement are large, owing to the intrinsic noise in the dynamic (high-frequency) EFM phase response, its purpose is only to confirm that the signal arises from electrical interaction with the sample before proceeding to the actual conductance measurements (Figs. 5-8).

      Manuscript edits: we previously simply cited a reference where the reader can delve deeper into the origin of the squared-voltage signal. To put this into better context, we now include additional information (lines 461-475) noting the origin and purpose of the result, as described above.

      It is interesting that the freshwater CB appear to be more resilient to air compared to marine CB (or at least some freshwater filaments, as the authors note that the level of resilience is filament-dependent). The authors indicate that salt affects oxygen solubility and there is a larger oxygen content in freshwater. Do the authors have thoughts on whether or not the differences between marine and freshwater CB could fit, or not fit, with the hypothesis that conductivity in air is lowered due to oxidation of the Ni/S species (ref. 25 in manuscript)? Could the freshwater CB have greater protection against oxidation?

      We thank the reviewer for highlighting this point. Indeed, our manuscript mentions the current hypothesis that the conductivity of cable bacteria may be diminished upon oxidation of the Ni/S groups (lines 101-105 and 498-504). It remains unclear how this mechanism could lead to variability between marine and freshwater cables. Interestingly, however, a recent comparative bioRxiv preprint (Digel et al. 2023) noted significant differences in the morphology, number, and cross-sectional area of nanofibers between a freshwater and a marine CB strain. These differences may lead to different resilience against oxidative degradation upon exposure to air. Specifically, even though the marine CB strain had a larger cross-sectional area per nanofiber, it had significantly fewer nanofibers, leading to a 40% smaller total area than its freshwater counterpart. We have edited the manuscript to highlight these possible differences (at least in size) between freshwater and marine cables.

      Manuscript edits (lines 506 – 514) “For example, a recent comparative study (21) hints at significant differences in the morphology, number, and size of nanofibers when comparing a marine CB strain to a freshwater CB strain. Specifically, while the marine CB was characterized by a 50% larger cross-sectional area per nanofiber, the total nanofibers’ area was 40% smaller than the freshwater strain due to a smaller number of nanofibers per CB filament. Given the proposed central role of nanofibers in mediating electron transport along CB, it is possible that such differences may also lead to different degrees of tolerance against oxidative degradation upon exposure to air.”
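
      Taken together, the two figures quoted from the comparative study imply a ratio of nanofiber counts. Writing a for the mean cross-sectional area of one nanofiber, N for the number of nanofibers per filament, and A = N·a for the total nanofiber area (symbols introduced here only for illustration), and taking the 50% and 40% figures at face value:

```latex
\frac{N_{\mathrm{marine}}}{N_{\mathrm{fresh}}}
  = \frac{A_{\mathrm{marine}} / A_{\mathrm{fresh}}}{a_{\mathrm{marine}} / a_{\mathrm{fresh}}}
  = \frac{0.6}{1.5}
  = 0.4
```

      Under the simplifying assumption that the reported values are representative averages, the marine strain would carry roughly 40% as many nanofibers per filament as the freshwater strain.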

      Figure 6D shows current-voltage measurements from three representative cables; there is a large variation, most notably between Cable 1 and Cables 2 and 3. Is this variation typical for different cables? Can the authors comment on the range of values observed and how many cables fit into different ranges? Any thoughts on the reasons behind the range?

      Figure 6 B and C (red and blue) are representative of most of the cable conductances measured using the point IV CAFM technique, with the Figure 6 A (green) IV curve being an example of the upper limit, which was less frequently observed. In total, we measured ten cables using the point IV CAFM technique. These variations may stem from actual differences in the conductivity of separate CB filaments, from the measurement environment, or from limitations of the conductive AFM measurement technique. These limitations include a large contact resistance due to the interaction of the small probe with the sample, which may lead to large variability depending on the contact point. For this reason, we rely on 4-probe measurements (Fig. 8) rather than conductive AFM for quantitative conductivity analyses. It is important to note, however, that the conductive AFM measurements (Fig. 6 and Fig. 7) provide complementary information, including the demonstration of both transverse and longitudinal transport in Fig. 6 (lines 389-393) and the visualization of the current-carrying nanofibers in Fig. 7.

      Manuscript edits: we have edited the manuscript (lines 413-418) to make it clear that the quantitative estimate of conductivity was made only using 4-probe measurements, owing to the limitations of CAFM and two-probe techniques.

      Can the authors comment on how the number of fibers per CB in their samples compares with the number of fibers in marine CB? Marine CB are known to have pinwheel junctions where the fibers come together before branching out again. This pinwheel design could play a role in the function of the CB or in its survival (see Adv. Biosys. 2020, 4, 2000006). Were pinwheel structures observed in freshwater CB? If so, how do they compare?

      In previous studies, estimates of the number of fibers in marine CB varied significantly, from 15 or 17 (Pfeffer et al., 2012) to 58-61 (Cornelissen et al., 2018). In our freshwater CB, we estimated the number of fibers at ~35 per CB (line 423), which is comparable to the count of 34 per freshwater CB recently reported by Digel et al., bioRxiv 2023. We cannot specifically comment on the pinwheel structure, as we did not perform the transverse thin-section TEM imaging necessary to observe the cell-cell junctions in this particular study.

      On lines 95-96, the authors discuss the fact that marine cable bacteria have a wide variance in their measured conductivities. While one may ask if the larger marine conductivities (near 80 S/cm) are representative, a conductivity of 0.1 S/cm is 2 orders of magnitude lower than this value, which the field generally refers to as a high conductivity. The authors should mention whether or not any of their specimens display the high conductivities seen in select marine cable bacteria specimens.

      It is indeed important to note that the ~80 S/cm figure refers to the upper end previously observed (ref. 22) for marine CB conductivity. In our manuscript (lines 525-526), we highlight that the previously observed range (including in that same study) is 10⁻²–10¹ S/cm, and we were careful to qualify the previously reported upper end with ‘reaching as high as’ (line 97). Note that this places our measurement of 0.1 S/cm within the previously reported range. We have not observed freshwater CB conductivity near the upper end of the previously reported range, and generally propose that these types of measurements are better analyzed in the context of the biological function rather than ‘high vs. low’. Towards that end, the manuscript (lines 527-537) makes the argument that the 10⁻¹ S/cm figure may be sufficient to support the electrical currents mediated by CB in sediments. We have edited the manuscript to highlight that we did not observe single CB nanofiber conductivity near the upper limit previously observed in marine CB (lines 522-525).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Mohamed Y. El-Naggar and co-workers present a detailed electronic characterization of cable bacteria from Southern California freshwater sediments. The cable bacteria could be reliably enriched in laboratory incubations, and subsequent TEM characterization and 16S rRNA gene phylogeny demonstrated their belonging to the genus Candidatus Electronema. Atomic force microscopy and two-point probe resistance measurements were then used to map out the characteristics of the conductive nature, followed by microelectrode four-probe measurements to quantify the conductivity.

      Interestingly, the authors observe that some freshwater cable bacteria filaments displayed a higher degree of robustness upon oxygen exposure than what was previously reported for marine cable bacteria. Finally, a single nanofiber conductivity on the order of 0.1 S/cm is calculated, which matches the expected electron current densities linking electrogenic sulphur oxidation to oxygen reduction in sediment. This is consistent with hopping transport.

      Strengths and weaknesses:

      A comprehensive study is applied to characterize the conductive properties of the sampled freshwater cable bacteria. Electrostatic force microscopy and conductive atomic force microscopy provide direct evidence of the location of conductive structures. Four-probe microelectrode devices are used to quantify the filament resistance, which presents a significant advantage over commonly used two-probe measurements that include contributions from contact resistances. While the methodology is convincing, I find that some of the conclusions seem to be drawn on very limited sample sizes, which display widely different behavior. In particular:

      The authors observe that the conductivity of freshwater filaments may be less sensitive to oxygen exposure than previously observed for marine filaments. This is indeed the case for an interdigitated array microelectrode experiment (presented in Figure 5) and for a conductive atomic force microscopy experiment (described in line 391), but the opposite is observed in another experiment (Figure S1). It is therefore difficult to assess the validity of the conclusion until sufficient experimental replications are presented.

      We indeed acknowledge, both in the abstract (lines 23-26) and in section 2.2 (lines 374-377), the variable nature of the sensitivity and the filament-dependent response to air exposure. Our discussion (lines 498-506) considers the possible reasons for this variability:

      ‘While these observations showed a high degree of variability and therefore require a more detailed investigation, it is interesting to consider the possibility that the oxidative decline (or other damaging processes), thought to be a consequence of oxidation of Ni cofactors involved in electron transport (25), may not affect all sections of the cm-long CB filaments simultaneously; under these conditions, IDA measurements, which probe multiple micrometer-scale electrode-crossing CB regions (e.g. 372 crossings in Figure 5 inset), may offer an advantage over techniques addressing entire CBs or specific CB regions. It is also interesting to consider an alternative possibility that the conductive properties of freshwater CB may be intrinsically more oxygen-resistant than marine CB’.

      To summarize, the manuscript points to the likelihood that the IDA technique used here may offer an advantage for detecting currents under damaging conditions, since it interrogates multiple sections simultaneously. Furthermore, in a recent preprint from Digel et al. (2023), the conductivity of the only freshwater strain investigated was among the highest relative to the marine CB strains in that study. Greater oxygen resistance of freshwater CB is therefore one possibility to be investigated based on these observations and results, and we present it as such in the discussion.

      The calculation of a single nanofiber conductivity is based on experiment and calculation with significant uncertainty. E.g. for the number of nanofibers in a single filament that varies depending on the filament size (Frontiers in microbiology, 2018, 9: 3044.), and the measured CB resistance, which does not scale well with inner probe separation (Figure 5). A more rigorous consideration of these uncertainties is required.

      The reviewer raises an important point. For these calculations, we made sure to determine the representative number of fibers per cable and the thickness of the nanofibers (~50 nm) from our own samples. We indeed assessed the possible variability across our different cable filaments and found that the fiber number varied from 30 to 44 (with 35 used as a representative figure in the paper). Regarding the scaling of resistance with inner probe separation, our 4P results estimated CB resistances of 47 MΩ and 240 MΩ for the 20 µm and 200 µm lengths, respectively, rather than the tenfold difference expected if the cable had a uniform conductivity along the entire filament. This result suggests nonuniform conductivity in different sections of the CB filament. Since accounting for nonuniform conduction (and variability in fiber morphology and density) is clearly difficult, we were careful to limit our conclusion to an order-of-magnitude estimate (e.g. lines 522-525). Given the previously reported range of cable bacteria conductivity (10⁻²–10¹ S/cm), our estimate falls within this range. We have further edited the manuscript to note that our reported single nanofiber conductivity cannot be constrained beyond the order of 0.1 S/cm, owing to the uncertainty in our estimates of nanofiber diameter and number per cable, as well as the possibility of nonuniform conductivity along the CB length (lines 522-525).
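
      The order-of-magnitude estimate can be sketched as σ = L / (R · N · A), where L is the inner-probe separation, R the measured four-probe resistance, N the number of nanofibers per filament and A the cross-sectional area of one nanofiber. This sketch assumes that the nanofibers conduct in parallel with uniform conductivity, that each has a circular cross-section of 50 nm diameter, and that N = 35; the manuscript's exact geometry may differ, and the function name is illustrative.

```python
import math

def nanofiber_conductivity_s_per_cm(length_m, resistance_ohm, n_fibers, diameter_m):
    """Single-nanofiber conductivity, assuming n_fibers identical fibers in parallel,
    each with uniform conductivity and a circular cross-section (assumptions)."""
    area_m2 = math.pi * (diameter_m / 2) ** 2          # area of one nanofiber
    sigma_s_per_m = length_m / (resistance_ohm * n_fibers * area_m2)
    return sigma_s_per_m / 100.0                       # convert S/m to S/cm

# Resistances quoted in the response: 47 MOhm over 20 um and 240 MOhm over 200 um
for length_um, r_mohm in [(20, 47), (200, 240)]:
    sigma = nanofiber_conductivity_s_per_cm(length_um * 1e-6, r_mohm * 1e6, 35, 50e-9)
    print(f"{length_um} um: {sigma:.2f} S/cm")

# A uniform filament would give R(200 um) / R(20 um) = 10
print(round(240 / 47, 1))
```

      Under these assumptions the two lengths give roughly 0.06 and 0.12 S/cm, both of order 0.1 S/cm, while the resistance ratio of about 5.1 (instead of 10) reflects the nonuniformity discussed above.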

      Reviewer #2 (Recommendations for The Authors):

      Figure 4A: Please add scale- and color bar.

      Done - a new Fig. 4 is included with color bars for topography and phase. The inset of Fig. 4A denotes a 200 nm scale bar (and that scale is now mentioned in the figure caption).

      Figure 5: A time series graph might be more instructive.

      Done - we indeed appreciate this suggestion and find that it improved the clarity of Figure 5. An inset has been included in Figure 5 plotting the resistance R change over time under different conditions. This inset demonstrates that the resistance of the cable on the IDA was slowly decreasing in the N2/H2 anaerobic chamber, only to start increasing upon exposure to ambient air.

      After putting the cable back into the chamber, the resistance again decreased over time.

    1. Author Response:

      We thank the reviewers for their insightful feedback. In our revised version of the manuscript, we will address all points raised.

      Regarding the preprocessing (Reviewer 1), we agree that the StandardRat pipeline is optimal for newly acquired datasets. However, since this study involves reanalyzing an already published dataset (Ionescu et al., JNM, 2023), which was preprocessed, analyzed, and published before the StandardRat paper, we aimed to maintain the same preprocessing. This approach allows for consistent interpretation of the readout regarding functional and molecular connectivity in the context of our previously published findings. Nonetheless, we agree that providing full access to the data will enable other researchers to reproduce our results using the StandardRat preprocessing pipeline and perform additional analyses on this rich dataset. Therefore, we will provide full access to the data via an open repository, as the reviewer suggested.

      Regarding anesthesia, we acknowledge that this is a limitation of our study, as more recent studies have indicated superior protocols. However, we and others have shown that, while not ideal, isoflurane at the used dose maintains stable physiology and does not cause burst suppression in rats. We will amend our discussion to reflect these points.

      Regarding the other points, we will amend the manuscript to provide more detail on the experimental design, including the tracer application as suggested by Reviewer 2, and clarify parts of the analysis that are unclear in the current version. Additionally, we agree with Reviewer 2 that our current terminology may cause confusion, and we will amend it accordingly. We will also discuss other points raised by the reviewers, such as the reduced sample size of the pharmacological cohort, as limitations in our discussion.

      Thank you for your understanding and the opportunity to improve our manuscript.

    1. Author response:

      Reviewer #1 (Public Review):  

      Weaknesses:  

      The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, the work of Kumar et al. is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. In addition, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.

      An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs. 

      We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches, and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”. TSA-seq is robust because it depends only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq. This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq, and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID, in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq, and the validation by light microscopy of the predictions of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq, in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).

      Reviewer 1 raised concerns regarding this FISH validation given that the HFF TSA-seq and DamID data was compared to IMR90 MERFISH measurements.  The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines.  We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy for our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to nuclear lamina, nucleoli, and nuclear speckles (Author response image 1).  Comparing HFF SON-TSA-seq data with published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006), the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the IMR90 SON TSA-seq versus MERFISH scatterplot.  We acknowledge the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate the conclusions drawn from such comparisons are solid and will only become that much stronger with future comparisons within the same cell line.

      Author response image 1.

      Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).

      In our revision, we will add justification of the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through comparison of available data sets. 

      Reviewer #2 (Public Review):  

      Weaknesses:  

      The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.

      We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies. Our Center focused on mapping the genome relative to different nuclear locales and on correlating this intranuclear positioning of the genome with function, specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven rather than hypothesis-driven scientific approach. Our fundamental question was whether we could gain new insights into nuclear genome organization by integrating genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and correlating this positioning with functional assays such as RNA-seq and Repli-seq.

      Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure.  We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology.  We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships. 

      Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included. 

      As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.

      Specific Comments in response to Reviewer 1:

      (1)  We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods.  In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and the orthogonal genomic method of lamin B1 DamID, which is reproduced using our new TSA-seq 2.0 protocol in this manuscript.  Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed a general agreement between the identification of NADs by nucleolar TSA-seq and the orthogonal genomic method of NAD-seq.  (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) Additionally, we also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq.  Additionally, the A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID.  More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.

      (2)  In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH.  Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.

      Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm.  First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids).  Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads.  Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.
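
      The calibration steps above can be sketched as follows, assuming a single-exponential model E(d) = A·exp(-d/λ) for the TSA-seq enrichment E of a locus at mean speckle distance d, with the decay length λ fixed from microscopy and only the amplitude A fitted to FISH-calibrated loci. The model form, function names, decay length and all numbers below are illustrative assumptions, not the published fitting procedure or data.

```python
import math

def fit_amplitude(distances_nm, enrichments, decay_nm):
    """Fit A in E = A * exp(-d / decay) by least squares on the log scale,
    holding the decay length fixed (measured independently by microscopy)."""
    logs = [math.log(e) + d / decay_nm for d, e in zip(distances_nm, enrichments)]
    return math.exp(sum(logs) / len(logs))

def predict_distance(enrichment, amplitude, decay_nm):
    """Invert the exponential to estimate mean distance from a TSA-seq enrichment."""
    return decay_nm * math.log(amplitude / enrichment)

# Hypothetical calibration loci: "FISH" distances (nm) and TSA-seq enrichments
decay_nm = 400.0                      # assumed decay length, for illustration only
calib_d = [100.0, 300.0, 600.0]
calib_e = [4.0 * math.exp(-d / decay_nm) for d in calib_d]

amp = fit_amplitude(calib_d, calib_e, decay_nm)
# Predict the distance of a held-out locus and compare with its "FISH" value (450 nm)
held_out_e = 4.0 * math.exp(-450.0 / decay_nm)
print(round(predict_distance(held_out_e, amp, decay_nm)))
```

      If the spreading is instead reported as a half-distance d½ (the distance at which the signal drops by one half), the decay length in this model is λ = d½ / ln 2.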

      (3)  The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach.  Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP peroxidase.  This is fundamentally different from the significant dependence on antibodies and choice of marker proteins for molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag which depend on molecular proximity for labeling and/or pulldown of DNA.

      This robustness leads to specific predictions. First, it predicts that similar TSA-seq signals will be produced using antibodies against different marker proteins of the same nuclear compartment. This is because the length scale of the exponential decay of TSA spreading (here characterized by the distance at which the signal drops by one half) is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions. Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals, while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals. Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins produced similar TSA-seq signals, with the highest correlation coefficients found for the pair of antibodies against two GC nucleolar marker proteins and for the pair against two FC/DFC nucleolar marker proteins.

      Author response image 2.

      Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH.  The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116).  Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP.  Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.

      Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli.  We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.

      (4)  After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data depends on the degree to which nuclear genome organization is similar between IMR90 and HFF fibroblasts. However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (see Author response image 1). For SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 for IMR90 TSA-seq versus 0.86 for HFF TSA-seq). Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 2). We believe these correlations can be considered a lower bound on the actual correlations between FISH distances and TSA-seq that we would have observed had we performed both assays on the same cell line.

      (5)  Currently, we still require tens of millions of cells to perform each TSA-seq assay. This requires significant expansion of cells and a resulting increase in the passage number of the IMR90 cells before we can perform TSA-seq. During this expansion we observe a noticeable slowing of IMR90 cell growth, as expected for secondary cell lines approaching the Hayflick limit. We still do not know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (i.e., the percentage of cycling versus quiescent cells) and cell age. Thus, even if we performed TSA-seq on IMR90 cells, we would be comparing MERFISH from lower passages, with a higher percentage of actively proliferating cells, with TSA-seq from higher passages, with a higher percentage of quiescent cells.

      We are currently working on a new TSA-seq protocol that will work with thousands of cells. We believe it is a better investment of time and resources to wait until this new protocol is optimized before we repeat TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data.

      Specific Comments in response to Reviewer 2:

      (1)  As we acknowledge in our Response summary, we were limited in the degree to which we could actually follow-up our findings with experiments designed to test specific hypotheses generated by our data.  However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing.  This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina.  Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells.  Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      In response to Reviewer 2, we are modifying the text to make this clearer. We now explicitly describe that we were testing the hypothesis that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression, and that to test this hypothesis we used a DKO of LMNA and LBR to change distances relative to the nuclear lamina and measured the effect on gene expression.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In addition to our responses to reviewer suggestions below, a minor bug in the calculation of CAIS was brought to our attention by a reader of our preprint. We have corrected this bug and rerun analyses, whose results became slightly stronger as noise was removed. While we were doing that, someone pointed out to us that our equations were almost the same as Kullback-Leibler divergence, which explains why our metric performed so well. We have made the numerically trivial (see before vs. after figure below) mathematical change to use Kullback-Leibler divergence instead, and now have a better story, with a solid basis in information theory, as to why CAIS works.
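      As an illustration of the general idea only (not the authors' code), the sketch below scores how far observed synonymous codon usage departs from what genome-wide GC content alone would predict, using Kullback-Leibler divergence. All of the following are our assumptions: expected codon frequencies are products of per-base probabilities ($g/2$ for G or C, $(1-g)/2$ for A or T) renormalized within each amino acid; the divergence is taken as $D_{KL}(\text{observed} \,\|\, \text{expected})$; and per-amino-acid divergences are averaged with amino acid frequencies as weights. The function names are hypothetical.

```python
import math


def expected_codon_freqs(codons, gc):
    """GC-only expectation for a set of synonymous codons.

    Each G or C contributes gc / 2 and each A or T contributes (1 - gc) / 2;
    the per-codon products are renormalized to sum to 1.
    """
    weights = {}
    for codon in codons:
        w = 1.0
        for base in codon:
            w *= gc / 2 if base in "GC" else (1 - gc) / 2
        weights[codon] = w
    total = sum(weights.values())
    return {codon: w / total for codon, w in weights.items()}


def kl_divergence(observed, expected):
    """D_KL(observed || expected) in nats; unobserved codons contribute 0."""
    return sum(p * math.log(p / expected[c]) for c, p in observed.items() if p > 0)


def species_score(codon_counts_by_aa, aa_freqs, gc):
    """Hypothetical species-level score: per-amino-acid KL divergence from the
    GC-only expectation, averaged with amino acid frequencies as weights.

    codon_counts_by_aa must list every synonymous codon of each amino acid,
    including codons with zero counts, so the expectation covers all of them.
    """
    total_weight = sum(aa_freqs[aa] for aa in codon_counts_by_aa)
    score = 0.0
    for aa, counts in codon_counts_by_aa.items():
        n = sum(counts.values())
        observed = {codon: k / n for codon, k in counts.items()}
        expected = expected_codon_freqs(counts, gc)
        score += aa_freqs[aa] / total_weight * kl_divergence(observed, expected)
    return score


if __name__ == "__main__":
    # Toy alanine counts: the first matches the GC-only expectation at gc = 0.4,
    # the second over-uses GCC.
    matched = {"GCT": 30, "GCC": 20, "GCA": 30, "GCG": 20}
    skewed = {"GCT": 5, "GCC": 70, "GCA": 5, "GCG": 20}
    for counts in (matched, skewed):
        print(species_score({"A": counts}, {"A": 1.0}, gc=0.4))
```

      The first toy score is zero up to floating-point rounding and the second is positive. Because whole codons are scored, six-fold amino acids whose synonymous codons differ at the first position (e.g. Leu, Ser, Arg) go through the same function.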

      Author response image 1.

      Unfortunately, we discovered a second bug that caused our PIC correction code to fail to perform the needed correction for phylogenetic confounding. The previously reported correlation between CAIS (or ENC) and body mass no longer survives PIC correction. We have therefore removed this analysis from the manuscript. Our story now rests more on the theoretical basis of CAIS and ENC, and less on post facto validation, than it previously did. We now also present CAIS and ENC on a more equal footing: ENC results are slightly stronger, while CAIS has the complementary advantage of correcting for amino acid frequencies.

      The work involved in these changes, as well as some of the responses to reviews below, justifies changing the second author into a co-first author, and adding an additional coauthor (Hanon McShea) who discovered the second bug.

      Reviewer #1 (Public Review): 

      In this manuscript, the authors propose a new codon adaptation metric, Codon Adaptation Index of Species (CAIS), which they present as an easily obtainable proxy for effective population size. To permit between-species comparisons, they control for both amino acid frequencies and genomic GC content, which distinguishes their approach from existing ones. Having confirmed that CAIS negatively correlates with vertebrate body mass, as would be expected if small-bodied species with larger effective populations experience more efficient selection on codon usage, they then examine the relationship between CAIS and intrinsic structural disorder in proteins. 

      The idea of a robust species-level measure of codon adaptation is interesting. If CAIS is indeed a reliable proxy for the effectiveness of selection, it could be useful to analyze species without reliable life history- or mutation rate data (which will apply to many of the genomes becoming available in the near future). 

      A key question is whether CAIS, in fact, measures adaptation at the codon level. Unfortunately, CAIS is only validated indirectly by confirming a negative correlation with body mass. As a result, the observations about structural disorder are difficult to evaluate. 

      As discussed in the preamble above, we have replaced the body mass validation with a stronger theoretical basis in information theory.

      A potential problem is that differences in GC between species are not independent of life history. Effective population size can drive compositional differences due to the effects of GC-biased gene conversion (gBGC). As noted by Galtier et al. (2018), genomic GC correlates negatively with body mass in mammals and birds. It would therefore be important to examine how gBGC might affect CAIS, and to what extent it could explain the relationship between CAIS and body mass. 

      Suppose that gBGC drives an increase in GC that is most pronounced at 3rd codon positions in high-recombination regions in small-bodied species. In this case, could observed codon usage depart more strongly from expectations calculated from overall genomic GC in small vertebrates compared to large ones? The authors also report that correcting for local intergenic GC was unsuccessful, based on the lack of a significant negative relationship with body mass (Figure 3D). In principle, this could also be consistent with local GC providing a relatively more appropriate baseline in regions with high recombination rates. Considering these scenarios would clarify what exactly CAIS is capturing. 

      Figure 3 (previously Supplementary Figures S5A and S5B) shows that CAIS is negligibly correlated with %GC (not robust to multiple comparisons correction), and ENC not at all. We believe this is evidence against the possibility brought up by the reviewer, i.e. that Ne might affect gBGC (and hence global %GC). This relationship, if present, could act as a confounding effect, but it is not present within our species dataset. 

      Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that non-selective forces, including gBGC as well as conventional mutation biases, vary among species, and that these forces, rather than selection, determine each species’ genome-wide %GC. By correcting for genome-wide %GC, CAIS and ENC correct for both mutation bias and gBGC, in order to isolate the effects of selection.

      This argument, based on an average genomic region, is vulnerable to gene-rich genomic regions having differentially higher recombination rates and hence stronger GC-biased gene conversion. However, we do not see the expected positive correlation between |local GC - global GC| and CAIS (see new Figure 5), again suggesting that gene conversion strength is not a confounding factor acting on CAIS.
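      A minimal sketch of the check described above, assuming per-species values of local GC, global GC, and a codon adaptation score are already in hand. It uses a Spearman rank correlation, which is our assumption (the response does not name the test), and it computes no p-value and applies no phylogenetic correction. All names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gc_departure_correlation(local_gc, global_gc, scores):
    """Correlate |local GC - global GC| with a per-species codon adaptation score.
    If gene conversion strength confounded the score, rho should be positive."""
    departures = [abs(loc - glob) for loc, glob in zip(local_gc, global_gc)]
    return spearman(departures, scores)
```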

      Given claims about "exquisitely adapted species", the case for using CAIS as a measure of codon adaptation would also be stronger if a relationship with gene expression could be demonstrated. RSCU is expected to be higher in highly expressed genes. Is there any evidence that the equivalent GC-controlled measure behaves similarly? 

      Correlations with gene expression are outside the scope of the current work, which is focused on producing and exploiting a single value of codon adaptation per species. It is indeed possible that our general approach of using Kullback-Leibler divergence to correct for genomic %GC could be useful in future work investigating differences among genes.  

      The manuscript is overall easy to follow, though some additional context may be helpful for the general reader. A more detailed discussion of how this work compares to the approach taken by Galtier et al. (2018), which accounted for GC content and gBGC when examining codon preferences, would be appropriate, for example. In addition, it would have been useful to mention past work that has attempted to explicitly quantify selection on codon usage. 

      One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences as a function of species. Our approach might therefore be robust to scenarios where different genes have different codon preferences (see Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne.

      Reviewer #2 (Public Review): 

      ## Summary 

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection $sN_e$ when the mutation bias changes across species.  

      ## Strengths 

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this). 

      We now cite Cope et al. as an example of how amino acid composition can act as a confounding factor.

      (2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping to give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected. 

      Unfortunately, our previous PIC correction code was buggy, and in fact the relationship with body size does not survive PIC correction (although it is strong prior to PIC correction). We have therefore removed it from the paper. However, the more novel result on protein disorder remains strong.

      (3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. 

      ## Weaknesses 

      (1) The main weakness of this work is that it lacks simulated data to confirm that it works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, comparisons to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s $S$ statistic, which is a function of the tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if results are similar to $S$, CAIS has a noted advantage in that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences. 

      The main limitation of dos Reis’s test in our view is that, like the better versions of CAI, it requires comparable orthologs across species. See also the discussion below re the benefits of proteome-wide approach. We now also note the advantage of not needing tRNA gene copy numbers and abundances. 

      Simulated datasets would be great, but we think they are a nice addition rather than a must-have, in particular because we are skeptical that our understanding of all relevant processes is good enough for simulations to add much to our more heuristic argument along the lines of Figure 2. E.g. the complications of Gingold et al. 2014 cited above are pertinent, but incorporating them would make simulations quite involved. Instead, we now have a stronger theoretical justification for CAIS grounded in information theory. We have significantly expanded discussion of Figure 2 to give a clearer idea of the conceptual underpinnings of CAIS and ENC.

      The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. higher $N_e$ results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?" 

      Let's take a closer look at the formulation for RSCUS. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.

      I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by $E_i$), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon $i$ at selection-mutation-drift equilibrium in gene $g$ for an amino acid with $N_a$ synonymous codons is

      $$E_{i,g} = \frac{e^{-\Delta M_i - \Delta\eta_i \phi_g}}{\sum_{j=1}^{N_a} e^{-\Delta M_j - \Delta\eta_j \phi_g}} \qquad (1)$$

      where $\Delta M$ is the mutation bias, $\Delta\eta$ is the strength of selection scaled by the strength of drift, and $\phi_g$ is the expression level of gene $g$. In this case, $\Delta M$ and $\Delta\eta$ reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which $\Delta M = \Delta\eta = 0$. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do, and I think the authors do, too), $E_{i,g}$ could be considered the expected observed frequency of codon $i$ in gene $g$, $E[O_{i,g}]$.

      Let’s re-write $E_i$ in the form of Gilchrist et al., such that it is a function of mutation bias $\Delta M$. For simplicity we will consider just the two-codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms $g_r$ and $1 - g_r$ can be written as

      where µx→y is the mutation rate from nucleotides x to y. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias .This can be expressed in terms of the equilibrium GC content by recognizing that

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process. 

      If we do this, then 

      Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for $\Delta\eta$ in equation (1).

      We can then calculate the expected RSCUS using equation (1) (using notation $E[O_i]$) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon ($\Delta M_{NNG} = \Delta\eta_{NNG} = 0$).
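      For readers reconstructing this step, the two-codon calculation can be written out as a worked sketch. It assumes codons NNA and NNG with NNG as reference, mutation limited to A↔G at the third position, GC-only expectations $E_{NNG} = g$ and $E_{NNA} = 1 - g$, a gene of average expression $\bar\phi$, and $\mathrm{RSCU}^S_i = E[O_i]/E_i$. These choices are our assumptions and may differ in notation from the reviewer's original equations.

      $$
      \begin{aligned}
      g &= \frac{\mu_{A\to G}}{\mu_{A\to G} + \mu_{G\to A}}
        \quad\Longrightarrow\quad
        \frac{1-g}{g} = \frac{\mu_{G\to A}}{\mu_{A\to G}} = e^{-\Delta M},
        \qquad \Delta M \equiv \Delta M_{NNA}, \\
      E[O_{NNG}] &= \frac{1}{1 + e^{-\Delta M - \Delta\eta\,\bar\phi}},
        \qquad \Delta\eta \equiv \Delta\eta_{NNA}, \\
      E[\mathrm{RSCU}^S_{NNG}] &= \frac{E[O_{NNG}]}{g}
        = \frac{1 + e^{-\Delta M}}{1 + e^{-\Delta M - \Delta\eta\,\bar\phi}}
        = \frac{1}{g + (1-g)\,e^{-\Delta\eta\,\bar\phi}}.
      \end{aligned}
      $$

      Under these assumptions the expression equals 1 when $\Delta\eta = 0$, rises toward $1/g$ as $\Delta\eta\,\bar\phi$ grows, and for any fixed $\Delta\eta\,\bar\phi > 0$ still depends on $g$.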

      This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection $\Delta\eta$ increases, which is desired. Note that $\Delta\eta$ in Gilchrist et al. is formulated in terms of selection *against* a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If $\Delta\eta = 0$ (i.e. selection does not favor either codon), then $E[RSCUS] = 1$. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if $sN_e$ (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay. 
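      The dependence on mutation bias can be checked numerically. This is a sketch under the same assumptions as the two-codon discussion above: codons NNA/NNG with NNG as reference, GC-only expectations $E_{NNG} = g$ and $E_{NNA} = 1 - g$, $\mathrm{RSCU}^S$ taken as expected observed over GC-expected frequency, and $s = \Delta\eta\,\bar\phi$ for a gene of average expression. It is our reconstruction, not the reviewer's code, and the function names are hypothetical.

```python
import math


def expected_rscu_reference(gc, s):
    """Expected RSCU^S of the reference codon NNG for a two-codon amino acid.

    gc: equilibrium GC content g (GC-only expectations E_NNG = g, E_NNA = 1 - g).
    s:  scaled selection against NNA in a gene of average expression.
    Equals (1 + e^{-dM}) / (1 + e^{-dM - s}) with e^{-dM} = (1 - g) / g,
    which simplifies to 1 / (g + (1 - g) e^{-s}).
    """
    return 1.0 / (gc + (1.0 - gc) * math.exp(-s))


def expected_rscu_unsimplified(gc, s):
    """Same quantity written via the mutation-bias term e^{-dM} = (1 - g) / g."""
    mut = (1.0 - gc) / gc
    return (1.0 + mut) / (1.0 + mut * math.exp(-s))


if __name__ == "__main__":
    # Rows: GC content (mutation bias). Columns: selection strength s.
    for gc in (0.35, 0.41, 0.50, 0.60):
        values = [round(expected_rscu_reference(gc, s), 3) for s in (0.0, 0.5, 1.0, 2.0)]
        print(gc, values)
```

      Within each printed row the value rises with $s$, while within each column it changes with $g$, which is the reviewer's caution that $\mathrm{RSCU}^S$ is not independent of mutation bias when GC content differs among species.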

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. 

      We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. While we keep our more heuristic presentation, our revised manuscript now more clearly acknowledges that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. We believe our approach worked despite this because the phenomenon is driven by what is shown in Figure 2, i.e., Ne makes a difference by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene. We have made multiple changes to the text to make this point clearer.

      Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method. 

      Genome-wide %GC values are hard-coded because they were taken from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The more complicated code used to calculate the intergenic %GC, and the code used to calculate amino acid frequencies is located at https://github.com/MaselLab/CodonAdaptation-Index-of-Species. Luckily, someone else just wrote a simpler end to end pipeline for us, on the basis of our preprint. We now note this in the Acknowledgements, and link to it: https://github.com/gavinmdouglas/handy_pop_gen/blob/main/CAIS.py.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.

      We thank the reviewers and the editor for their effort and expertise in reviewing our manuscript. We have made changes based on the reviews and believe this has greatly strengthened our manuscript. We appreciate their insightful comments and suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system, however falls short in some technical aspects of the bioinformatic analyses of the generated sequence data.

      (1) Four sequencing libraries were generated and then merged for analysis, however, the authors fail to document any parameters that would indicate that the clustering does not suffer from any batch effects.

      We thank the reviewer for this comment which has given us the opportunity to elaborate on this interesting point. Consequently, we have added evidence to show that the data do not suffer from batch effects between samples (e.g. between sorted samples 1 and 4, and unsorted samples 2 and 3). We now show that there are contributions to all clusters from sorted and unsorted samples and highlight the benefits to using both conditions in a cell atlas with unknown cell types.

      Accordingly, we have now added the following paragraph to line 153:

      There were contributions from sorted and unsorted samples in almost all clusters (except ciliary plates). We found that some cell/tissue types had similar recovery from both methods (e.g. Stem A, Muscle 2, and Tegument), others were preferentially recovered by sorting (e.g. Neuron 1, Neuron 4, and Stem E), and some were depleted by sorting (e.g. Parenchyma 1, Protonephridia, and Ciliary plates) (Supplementary Figure 1, Supplementary Table 4). This variation in recovery therefore enabled us to maximise the discovery and inclusion of different cell types in the atlas.

      We have now added a Supplementary Figure 1 showing the contribution of sorted and unsorted cells to the Seurat clusters. We have also included a Supplementary Table 4 detailing the cell number contribution for both conditions and the percentages in order to easily compare differential recovery between cell types.
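      A minimal pandas sketch of how a sorted/unsorted breakdown like Supplementary Table 4 could be tabulated. It assumes a per-cell table with hypothetical columns 'cluster' and 'sample', and the sample-to-condition mapping described above (samples 1 and 4 sorted, 2 and 3 unsorted). Percentages are computed within each condition so clusters can be compared even though the two conditions contribute different total cell numbers; this normalization is our choice and may differ from the one used in the table.

```python
import pandas as pd

# Samples 1 and 4 were sorted; samples 2 and 3 were unsorted.
CONDITION = {1: "sorted", 4: "sorted", 2: "unsorted", 3: "unsorted"}


def condition_contributions(cells: pd.DataFrame) -> pd.DataFrame:
    """Per-cluster cell counts and within-condition percentages.

    cells: one row per cell, with hypothetical columns 'cluster' and 'sample'.
    """
    condition = cells["sample"].map(CONDITION).rename("condition")
    counts = pd.crosstab(cells["cluster"], condition)
    # Share of each condition's cells that fall in each cluster (columns sum to 100).
    percent = counts.div(counts.sum(axis=0), axis=1) * 100
    out = counts.join(percent.add_suffix(" (%)"))
    # Above 1: cluster enriched by sorting; below 1: depleted by sorting.
    out["sorted/unsorted"] = percent["sorted"] / percent["unsorted"]
    return out
```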

      These are added to the manuscript.

      (2) Additionally, the authors switch between analysis platforms without a clear motivation or explanation of what the fundamental differences between these platforms are. While in theory, any biologically robust observation should be recoverable from any permutation of analysis parameters, it has been recently documented that the two popular analysis platforms (Seurat - R and scanPy python) indeed do things slightly differently and can give different results (https://www.biorxiv.org/content/10.1101/2024.04.04.588111v1). For this reason, I don't think that one can claim that Seurat fails to find clusters resolved by SAM without running a similar pipeline on the cluster alone as was done with SAM/scanPy here. The manuscript itself needs to be checked carefully for misleading statements in this regard.

      We thank the reviewer for this comment and agree that it’s important to increase the clarity on this matter. We have added additional detail to explain that the results of subclustering Neuron 1 using Seurat and SAM/ScanPy were broadly similar, but that we presented the results from the SAM/ScanPy analysis due to the strengths of SAM in detecting small differences in gene expression (Tarashansky et al., 2019 PMID: 31524596). We have included here the UMAP showing subclustering of Neuron 1 in Seurat for comparison.

      Author response image 1.

      UMAP showing subclustering of Neuron 1 cluster in Seurat (SCT normalisation, PC = 19, resolution = 0.3).

      We’ve added this additional text to the ‘Neuron abundance and diversity’ section on line 220:

      We explored whether Neuron 1 could be further subdivided into transcriptionally distinct cells by subclustering (Supplementary Figure 2; Supplementary Table 6) using the self-assembling manifold (SAM) algorithm (Tarashansky et al., 2019) with ScanPy (Wolf et al., 2018), given its reported strength in discerning subtle variation in gene expression (Tarashansky et al., 2019), although a similar topology was subsequently found using Seurat.

      (3) Similarly, the manuscript contains many statements regarding clusters being 'connected to', or forming a 'bridge' on the UMAP projection. One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (see Chari and Pachter 2023). To support these types of interpretations, the authors should provide evidence of gene expression transitions that support connectivity as well as stability estimates of such connections under different parameter conditions. Otherwise, these descriptors hold little value and should be dropped and the transcriptomic states simply defined as clusters with no reference to their positions on the UMAP.

      We thank the reviewer for this thoughtful comment. We agree and have rephrased those statements accordingly e.g. line numbers 218, 439, 543, and 557.

      (4) The underlying support for the clusters as transcriptomically unique identities is not well supported by the dot plots provided. The authors used very permissive parameters to generate marker lists, which hampers the identification of highly specific marker genes. This permissive approach can allow for extensive lists of upregulated genes for input into STRING/GO analyses, this is less useful for evaluating the robustness of the cluster states. Running the Seurat::FindAllMarkers with more stringent parameters would give a more selective set of genes to display and thereby increase the confidence in the reader as to the validity of profiles selected as being transcriptomically unique.

      The Reviewer is correct in noting that we used a permissive approach to enable a better understanding of the biology of each cluster, based on analysing enriched functions. However, we disagree about the suitability of the approach for finding markers. First, the permissive approach produced longer candidate lists, but those with the best AUC scores for each cluster are at the top of the list for each cluster. Second, some of the markers with lower expression also revealed interesting biology (e.g. Notum in the muscles). Furthermore, we used filtering on the marker genes lists to increase the minimum marker gene scores for analyses such as the GO analyses (details in the GO section of the methods). It’s important to stress that our approach also utilised validation by FISH for top marker genes, as well as biologically informative genes that were lower down the marker gene list.
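      To illustrate AUC-based ranking combined with the kind of stricter filtering the reviewer suggests, here is a small NumPy sketch (not Seurat's implementation). It scores each gene by its one-vs-rest ROC AUC and keeps only genes passing AUC and detection-rate thresholds. The thresholds (min_auc, min_pct, min_pct_diff) and function names are hypothetical and only loosely mirror the spirit of Seurat's min.pct and min.diff.pct options.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of a 1-D array, ties receiving the mean rank of their group."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)  # rank of the last member of each tie group
    mean_ranks = ends - (counts - 1) / 2.0
    return mean_ranks[inverse]


def auc_one_vs_rest(expr, in_cluster):
    """ROC AUC for separating in-cluster cells from all others by one gene's
    expression, via the Mann-Whitney U statistic (ties count as 0.5)."""
    ranks = average_ranks(expr)
    n1 = int(in_cluster.sum())
    n0 = in_cluster.size - n1
    u = ranks[in_cluster].sum() - n1 * (n1 + 1) / 2.0
    return u / (n1 * n0)


def stringent_markers(X, genes, labels, cluster, min_auc=0.7, min_pct=0.25, min_pct_diff=0.1):
    """Keep genes that separate `cluster` well (AUC), are detected in enough of
    its cells, and are detected noticeably more often inside than outside it.
    X is a cells x genes expression matrix; results are sorted by AUC."""
    in_cluster = labels == cluster
    hits = []
    for j, gene in enumerate(genes):
        x = X[:, j]
        pct_in = float(np.mean(x[in_cluster] > 0))
        pct_out = float(np.mean(x[~in_cluster] > 0))
        auc = float(auc_one_vs_rest(x, in_cluster))
        if auc >= min_auc and pct_in >= min_pct and pct_in - pct_out >= min_pct_diff:
            hits.append((gene, auc, pct_in, pct_out))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```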

      (5) Figure 5B shows a UMAP representation of cell positions with a statement that the clustering disappears. As a visual representation of this phenomenon, the UMAP is a very good tool, however, to make this statement you need to re-cluster your data after the removal of this gene set and demonstrate that the data no longer clusters into A/B and C/D.

      We’ve added Supplementary Figure 13 to show that after removing WSR and ZSR genes and reclustering, the data no longer clusters in A/B and C/D, even at a higher resolution where clusters appear oversplit.

      Also, as a reader, these data beg the question: which genes are removed here? Is there an over-representation of any specific 'types' of genes that could lead to any hypotheses of the function? Perhaps the STRING/GO analyses of this gene set could be informative.

      We have performed GO-enrichment analyses on W-specific genes, Z-specific genes, and both together compared to the rest of the genome, but we did not find very informative results (see Supplementary Table 13, which we have now added, line 464). This may be due to the large difference in size: there are approx. 900 Z-specific genes (two copies in males, one in females), but only approx. 30 W-specific genes, many of which have homologs in the Z-specific region of the genome. Instead, we suggest that tissue-specific regulation of gene dosage compensation is the more likely explanation, as reported for other species (Valsecchi et al. 2018).

      (6) How do the proportions of cell types characterized via in situ here compare to the relative proportions of clusters obtained? It does not correspond to the percentages of the clusters captured (although this should be quantified in a similar manner in order to make this comparison direct: 10,686/20,478 = ~50% vs. 7%). How do you interpret this discrepancy? While this is mentioned in the discussion, there is insufficient explanation of why you have an overabundance of stem cells compared to their presence in the tissue. While it is true that you could have negative selection of some cell types (for example, as stated, the size of the penetration glands exceeds both the 10x capability (40 µm) and the 30 µm filters used in the protocol), this does not really address why over half of the captured cells represent 'stem cells'. A more realistic interpretation would be biological rather than merely technical. For example, while the composition of the muscle cells and the number of muscle transcriptomes captured are quite congruent at ~20%, the organism is composed of more than 50% neurons, but only 15% of the transcriptomic states are assigned as neuronal. Could it be that a large fraction of the stem cells are actually neural progenitors? Are there other large inconsistencies between the cluster sizes and the fraction of expected cells? Could you look specifically at early transcription factors that are found in the neurons (or other cell types) within the various stem cell populations to help further refine the precursor/cell type relationships?

      Yes, it is really interesting that more than 50% of cells in the animal are neurons whereas more than 50% of cells in scRNAseq data are stem cells. This dataset provides a unique opportunity to compare tissue composition in the whole animal to the corresponding single cell RNAseq dataset.

      The table (in Supplementary Table 17) shows the percentage of cells from each tissue type in the miracidium (identified via in situ hybridisation of tissue-type marker genes) and in the scRNAseq to understand this phenomenon.

      This table shows that the single cell protocol used in this study negatively selected for nerves and tegument, and positively selected for stem and parenchyma. The composition of the muscle and protonephridia cells and the number of muscle and protonephridia transcriptomes captured are quite congruent.
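      The over- and under-representation can be expressed as a simple ratio of a cell type's share of scRNA-seq transcriptomes to its share of cells counted in the animal. The sketch below uses only the approximate fractions quoted in the review and response above (the full per-tissue numbers are in Supplementary Table 17), and the function name is hypothetical.

```python
def representation_ratio(sc_fraction, tissue_fraction):
    """Share of a cell type among scRNA-seq transcriptomes divided by its share
    among cells counted in the animal: above 1 means over-represented in the
    scRNA-seq data, below 1 means under-represented."""
    return sc_fraction / tissue_fraction


# Approximate (scRNA-seq share, tissue share) pairs quoted in the text.
APPROX_FRACTIONS = {
    "stem": (10_686 / 20_478, 0.07),
    "neuron": (0.15, 0.50),  # tissue share is a lower bound ("more than 50%")
    "muscle": (0.20, 0.20),
}

if __name__ == "__main__":
    for cell_type, (sc, tissue) in APPROX_FRACTIONS.items():
        print(f"{cell_type}: {representation_ratio(sc, tissue):.2f}")
```

      With these rough inputs, stem cells come out roughly 7.5-fold over-represented, neurons at most about 0.3 (an upper bound, because the tissue share is a lower bound), and muscle close to 1.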

      This technical finding is also biologically consistent. For instance, the tegument cells span the body wall muscles, with the cell bodies below and a syncytial layer above. It is not known how the tegument fragments during the dissociation process, or which parts of the cells get packaged by the 10X GEMs. Because of the tegumental structure, the cells are likely prone to damage, and we speculate that this is why tegument cells are under-represented in our 10X data. Unusually shaped fragments may not have been captured in 10X GEMs, and of those that were, damaged or distressed tegument cells/fragments may have been excluded post-sequencing by QC filters including cell calling, mitochondrial percentage and low transcript count (e.g. a tegumental fragment with 100 transcripts would not have passed QC). Stem cells are spherical with a large nucleus:cytoplasm ratio, likely making them more robust during dissociation and more likely to be captured in 10X GEMs.

      We don’t think that a large fraction of the stem cells are actually neural progenitors because:

      (1) We used previously reported marker genes of different tissue types to identify the single cell RNAseq clusters, e.g. Ago2-1 for stem cells, which has been used as a stem cell marker in multiple life stages.

      (2) The stem cell transcriptomes express many previously reported stem cell marker genes.

      (3) We found that the stem cells in the single cell data generally had higher numbers of transcripts than the other cell types, which is consistent with the observation by Wang et al. (2013) that the RNA marker POPO-1 could distinguish germinal (stem) cells from other cell types because they are RNA rich.

      (4) We also found higher numbers of ribosome-related transcripts in our stem cell transcriptomes, which is consistent with Pan’s observation that the distinct morphology of stem cells includes densely packed ribosomes in the cytoplasm.
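      Points (3) and (4) compare per-cell transcript totals, and the ribosomal share of those transcripts, across clusters. The sketch below shows one way to summarise both per cluster using medians. The cell records, cluster labels and ribosomal gene set are invented for illustration (in practice the gene set would come from genome annotation), and the use of medians is our assumption, not the authors' stated statistic.

```python
from statistics import median

def cluster_summaries(cells, ribo_genes):
    """Median total counts and median ribosomal fraction per cluster.

    cells: list of (cluster_label, {gene_id: count}) tuples, one per cell
    ribo_genes: set of gene IDs treated as ribosome-related
    """
    totals, ribo_fracs = {}, {}
    for cluster, counts in cells:
        total = sum(counts.values())
        ribo = sum(n for gene, n in counts.items() if gene in ribo_genes)
        totals.setdefault(cluster, []).append(total)
        ribo_fracs.setdefault(cluster, []).append(ribo / total if total else 0.0)
    return {c: (median(totals[c]), median(ribo_fracs[c])) for c in totals}

# Invented toy cells and hypothetical gene IDs, for illustration only
toy_cells = [
    ("Stem", {"gene_x": 30, "ribo_1": 15, "ribo_2": 15}),
    ("Stem", {"gene_x": 20, "ribo_1": 10, "ribo_2": 10}),
    ("Muscle", {"gene_x": 18, "ribo_1": 2}),
    ("Muscle", {"gene_x": 9, "ribo_2": 1}),
]
print(cluster_summaries(toy_cells, {"ribo_1", "ribo_2"}))
```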

      In order to elaborate on this discussion we have generated new visualisations:

      (1) A UMAP showing expression of the stem cell marker ago2-1 (Supplementary Figure 10), to further illustrate the evidence behind our classification of the stem cell clusters.

      (2) A co-expression plot of the stem cell marker ago2-1 with neural marker complexin to confirm that there is little coexpression (the most coexpression being in Neuron 1 and Stem F). We identified that 15.56% of cells in the Stem F cluster show some expression of complexin (neural marker), suggesting that a small fraction of Stem F may be early/precursor neurons, but the gene expression indicates that the majority of cells in Stem F are more likely to be stem cells than any other tissue type. There is little to no complexin expression in the other stem clusters.

      (3) Expression plots of the 5 neurogenins (TFs involved in neuronal differentiation) that we could identify in these data using WormBase ParaSite. Four of the five showed very little expression, and not in specific clusters. The fifth (Smp_072470) showed slightly more expression, though still sparse, mostly across the stem and neural clusters, but not enough to indicate that any of the stem clusters are neural progenitors.
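      Percentages such as the 15.56% of Stem F cells with some complexin expression are the fraction of a cluster's cells in which a gene (or every gene in a set, for co-expression) is detected. A minimal sketch, assuming "detected" means at least one count; the authors' exact threshold is not stated here, and the toy counts below are made up.

```python
def fraction_expressing(expr, labels, cluster, genes, min_count=1):
    """Fraction of cells in `cluster` with >= min_count counts for every gene in `genes`.

    expr: {gene: [count per cell]}, with cells in the same order as `labels`
    """
    idx = [i for i, lab in enumerate(labels) if lab == cluster]
    if not idx:
        return 0.0
    hits = sum(all(expr[g][i] >= min_count for g in genes) for i in idx)
    return hits / len(idx)

# Made-up counts for six cells, for illustration only
labels = ["Stem F"] * 4 + ["Neuron 1"] * 2
expr = {
    "ago2-1": [3, 5, 2, 4, 1, 0],
    "complexin": [0, 1, 0, 0, 6, 4],
}
print(fraction_expressing(expr, labels, "Stem F", ["complexin"]))             # 0.25
print(fraction_expressing(expr, labels, "Neuron 1", ["ago2-1", "complexin"]))  # 0.5
```

      Passing a single gene gives the "some expression" fraction; passing both markers gives the co-expression fraction shown in the co-expression UMAP.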

      Author response image 2.

      Coexpression UMAP showing the expression of stem cell marker Ago2-1 and neural marker complexin.

      Author response image 3.

      UMAPs showing the expression of five putative neurogenins of S. mansoni.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.

      Strengths:

      Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.

      Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.

      Weaknesses:

      The sample sizes upon which the in situ hybridization results and cell counts are based are either not stated (in most cases) or are very small (n=3). This lack of clarity about biological replicates and sample sizes makes it difficult for the reader to assess the robustness of the results and the extremely small sample sizes (when provided) are a missed opportunity to explore the variability of the system, or lack thereof.

      We have now added more details about the methods we used for validating cell type marker genes by in situ hybridisation. We have added the following to the Methods (line 1036): ‘We carried out at least three in situ hybridisation experiments for each marker gene we validated (each experiment was a biological replicate). From each experiment we imaged (by confocal microscopy) at least 10 miracidia (technical replicates) per marker gene experiment.’

      In the figure legends we have added the number of miracidia that were screened, and documented the percentage of the screened larvae that showed the in situ gene expression pattern that is seen in the images in the figures, and that we described in the text.

      We manually segmented the nuclei of cells expressing pan-tissue marker genes; we did this in one miracidium for every tissue except the stem cells, which we segmented in five larvae. Manual segmentation of gene expression in a confocal z-stack is very time consuming. We consider the variability of different cell and tissue types between miracidia (stereotypy) to be beyond the scope of this paper; it can be investigated in future work.

      Although assigning transcripts to a given cell type is usually straightforward via in situ experiments, the authors fail to consider the potential difficulty of assigning the appropriate nuclei to cells with long cytoplasmic extensions, like neurons. In the absence of multiple markers and a better understanding of the nervous system, it seems likely that the authors have overestimated the number of neurons and misassigned other cell types based on their proximity to neural projections.

      This is a valid point, and we acknowledge the difficulties of assigning a nucleus to a cell using mRNA expression only and in the absence of a cell membrane marker. We tried to address this issue by labelling the cell membranes using an antibody against beta catenin after the HCR in situ protocol. This method has been used successfully on sections on slides (Schulte et al., 2024), but we failed to get usable results in our miracidia whole-mounts. The beta catenin localisation marked the membranes of the gland cells but didn’t do the same for the neurons or other cell types (see image below).

      Author response image 4.

      Image showing a maximum intensity projection of a subvolume of a confocal z-stack of a miracidium whole-mount in situ hybridisation (by HCR) for paramyosin, counterstained with a beta catenin antibody (Sigma C2206 at a 1:600 dilution). The cell membrane of a lateral gland is clearly labelled, but those of the neurons of the brain and the paramyosin+ muscle cells are not.

      Our observation that 57% of the cells in a miracidium are neurons is high compared with the adult C. elegans hermaphrodite, in which 302 of 959 cells are neurons (Hobert et al., 2016), although few studies have equivalent data with which to make comparisons. Despite this, and the limitation described above, we believe that we have not overestimated the number of neural cells. During the process of validating the marker genes and closely examining gene expression in hundreds of miracidia, we noted that the nuclei of different tissue types are distinct and recognisable (see figure below). The nuclei of stem, tegument and parenchymal cells are comparatively large and spherical with obvious nucleoli (i). The four nuclei of the apical gland cell are angular and pentagonal and sit adjoining each other (inside red dashed circle, i-iii); those of the two lateral glands are bilaterally symmetrical and surrounded by flask-shaped cytoplasm (arrows, iv). The nuclei of the body wall muscle cells are peripheral and flattened on the outer edge (iii). The notum+ muscle cell nuclei are anterior to the apical gland (manuscript Figure 2E). The only two remaining tissue types are the nerves and protonephridia, and their nuclei are smaller and more compact/condensed. In situ expression of the protonephridia marker suggests that 6 cells make up the protonephridial system (manuscript Figure 4 B&E). Therefore, by process of elimination, the remaining nuclei should belong to neurons. The complexin expression pattern supports this: we counted 209 nuclei that were surrounded by cpx transcript expression. To help the reader interpret this for themselves, we have added confocal z-stacks of miracidia in which tissue-level markers have been multiplexed (Supplementary Videos 18-20). We counted all tissue type cells individually, and the tissue type cell numbers added up to the overall cell count.
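      The counting argument above has two parts: neurons are inferred by subtracting every other tissue's nuclei from the total, and the individually counted tissue types must sum to that total. The sketch below encodes that cross-check. The non-neural counts in the example are placeholders, not the miracidium counts; only the C. elegans figures (302 of 959 cells) come from the text.

```python
def neurons_by_elimination(total_nuclei, non_neural_counts):
    """Nuclei left over once every non-neural tissue type is accounted for."""
    remaining = total_nuclei - sum(non_neural_counts.values())
    if remaining < 0:
        raise ValueError("non-neural counts exceed the total nuclei count")
    return remaining

def counts_are_consistent(total_nuclei, non_neural_counts, direct_neuron_count):
    """True when the directly counted neurons match the elimination estimate."""
    return neurons_by_elimination(total_nuclei, non_neural_counts) == direct_neuron_count

# Reference point from the text: 302 of 959 cells in the adult C. elegans hermaphrodite
print(f"C. elegans neural fraction: {302 / 959:.1%}")

# Placeholder counts for illustration only (NOT the miracidium data)
toy_other = {"stem": 20, "muscle": 30, "gland": 6, "protonephridia": 6}
print(neurons_by_elimination(100, toy_other))
```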

      Author response image 5.

      Image showing the diversity of nucleus morphology between tissue types in the miracidium.

      Biologically, it is not surprising that this larva is dominated by neural cells. It must navigate a complex aquatic environment and identify a suitable mollusc host in less than 12 hours. It is a non-feeding vehicle that must deliver the stem cells to a suitable environment where they can develop into the subsequent life cycle stage. Accordingly, the cell type composition reflects this challenge.

      The conclusion that germline genes are expressed in the miracidia stem cells seems greatly overstated in the absence of any follow-up validation. The expression scales for genes like eled and boule are more than 3 orders of magnitude smaller than those used for any of the robustly expressed genes presented throughout the paper. These scales are undefined, so it isn't entirely clear what they represent, but neither of these genes is detected at levels remotely high (or statistically significant) enough to survive filters for cluster-defining genes.

      Given that germ cells often develop early in embryogenesis and arrest the cell cycle until later in development, and that these transcripts reveal no unspliced forms, it seems plausible that the authors are detecting some maternally supplied transcripts that have yet to be completely degraded.

      We agree that the expression of genes such as eled and boule is low. We made this clear in the figure legends and text, and have now added scale information to the figure legends. We did not explore these genes as cluster-defining genes, partly because of their comparatively low levels of expression, but as genes already reported to be important in germ line specification. We found the expression of these genes to be consistent with our hypothesis that the Kappa stem cells may include germ line segregated cells, but our hypothesis does not rest on these lower-expressed genes.

      It is certainly possible that we have detected some maternally supplied transcripts in the miracidia stem cells. However, experiments to distinguish between zygotic and maternal transcripts using metabolic labelling of zygotic transcripts (e.g. Fishman et al. 2023) would be difficult in this species because of the hard egg capsule and its ectolecithal embryogenesis. This is therefore out of scope for this work, but it would be a very interesting topic to follow up on and develop tools for.

      We have added the following sentences to the Discussion (line 746): ‘Intriguingly, the presence of spliced-only copies of the germline defining genes eled and boule could suggest that they are maternal transcripts that have been restricted to the primordial germ cells during embryogenesis, as is the case in Zebrafish embryos (Fishman et al., 2023). An alternative explanation is that unspliced transcripts exist for these lowly expressed genes but their abundance was below our threshold for detection.’

      Reviewer #1 (Recommendations For The Authors):

      Ln 138: specify the version of Seurat used, and reference the primary papers for this software. Also, from the dot plot shown here, these do not all appear to be supported by unique gene sets. How was the final clustering determined? This information is in the methods section, but a summary here could make it more robust for the readership.

      In addition to the details in the methods section, we have added the version and referenced the version-specific primary paper for Seurat when it is first mentioned. We have also summarised the methods used to select the final clustering when we first present the results to aid in clarity.

      We added to line 140 ‘Using Seurat (version 4.3.0) (Hao et al., 2021), 19 distinct clusters of cells were identified, along with putative marker genes best able to discriminate between the populations (Figure 1C & D and Supplementary Table 2 and 3). We used Seurat’s JackStraw and ElbowPlot, along with molecular cross-validation to select the number of principal components, and Seurat’s clustree to select a resolution where clusters were stable (Hao et al., 2021).’

      Ln 147: isn't seven stem cell clusters a lot? See comment in public review.

      We did not have preconceived expectations of the number of stem cell clusters, and were guided by the data and gene expression. In doing so we also discovered that four of those clusters were likely only two ‘biologically or functionally distinct’ clusters, but these split into four clusters based on the expression of genes on the sex-specific regions of the chromosomes, which was both unexpected and interesting.

      Figure 1D: gene model names are un-informative for the general reader. Can you provide any putative gene identities here to render this plot interpretable? For example in the main text you state that Smp-085540 is paramyosin; please use this annotation in all your visual material (as is used in Figure 2A).

      We have added gene names to the dotplots in all figures with the locus identifier (minus the ‘Smp’ prefix) in brackets after the gene name.

      Ln 191:196 Identification of the two muscle clusters as circular and longitudinal muscles is very well supported. However, it would be interesting to look specifically at the genes that are different here. Did the authors attempt to specifically pull out genes differentially expressed between these two groups, or only examine the output of FindAllMarkers at this point?

      We did indeed look specifically for genes differentially expressed between the muscle clusters, the results of which can be found in Supplementary Table 5 (Line 206). This analysis revealed “Wnt-11-1 (circular) and MyoD (longitudinal) were among the most differentially expressed genes”, which were important findings in our understanding of the muscle cells in the miracidium.

      Ln 207: "connected to stem F" - does this refer specifically to their relative positions on the UMAP in Figure 1C? One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (public review).

      We agree, and have rephrased accordingly.

      Ln 209:211: Here the authors switch from Seurat (R) as an analysis package, to SAM (python) for subset analysis of one large neural cluster. The results indicate that there may be small populations of transcriptomically distinct neural subtypes also within the neural1 cluster, but that the vast majority of these cells do not express unique transcriptomic profiles. Also in the supplementary material for this (SF1) there is a question of whether or not there is any clustering according to batch effects.

      In general, I find the neuronal section a little difficult to follow and it is unclear how many unique profiles are present and which are documented with in situ. I would recommend re-running the analysis on the entire neural subset (n1:5: complexin positive) and generating an inventory of putatively unique neural states with the associated in situ validation altogether in a main figure.

      In response to comments above we have both clarified our reasoning for using SAM analysis, and presented more details on possible batch effects. We have gone through the neural system results in order to make it clearer for the reader to follow.

      Ln 236: here the authors introduce a STRING analysis for the first time. Also, this method requires some introduction for the general audience in terms of its goals and general functionality and output.

      We used STRING analysis on some well defined clusters to provide additional clues about function. At the first mention of STRING (neuron 3 results) we have added the following statement to give more introduction to the reader: “STRING analysis of the top 100 markers of Neuron 3 predicted two protein interaction networks with functional enrichment: ….”

      Ln. 280:281. It is unclear why Steger et al. is referenced here. In what way does a description of neural and glandular cell transcriptomic similarity in a cnidarian inform your data on a member of the Platyhelminthes? (This should also be referenced in the introduction: to which phylogenetic lineage does Schistosoma belong?)

      We have now stated on the first line of the Introduction that Schistosoma belongs to the Platyhelminthes.

      At line 295 we have added: ‘We expected to find a discrete cluster(s) for the penetration glands, and that it would show similarities to the neural clusters (as glandular cells arise from neuroglandular precursor cells in other animals, such as the sea anemone, Nematostella vectensis, Steger et al., 2022).’

      Ln 339: explain the motivation for generating a further plate-based scRNA of the ciliary plates.

      We wished to include the ciliary plates alongside the gland cells in the plate-based RNAseq because they are unique to the miracidium stage, and we wanted to make sure we had captured them in this study.

      Ln 345: Define the tegumental cells for the general reader.

      We have added further description of tegument cells in the Introduction and the tegument results section (e.g. lines 61 and 366).

      Ln 365: "this cluster" is imprecise; which cluster are we looking at here? Also, were flame cells already described morphologically at this stage, or is this the first description of the protonephridial system for this stage of the life cycle?

      We have now clarified which cluster we are talking about in the text. The flame cells have been described using TEM before (Pan, 1980).

      Stem Cells: here, too, you refer to cells as a 'bridge', which refers to their configuration on the UMAP. While this likely represents a biologically distinct differentiation state, naming cells based solely on the UMAP representation should be avoided.

      We have rephrased this.

      Figure 5B: What is neuron 6? This was Neuron 3 in Figure 1.

      Thank you for spotting these mistakes in the labelling, we have corrected them now.

      Ln 421:438 - Here you present a UMAP representation of the cell positions, but state that the clustering disappears. See comment in Public Review.

      Modified accordingly, see response in public review.

      Ln 472 "Cells in stem E, F, and G in silico clusters might be stressed/damaged/dying cells or cells in transcriptionally transitional states." Is there any evidence supporting either of these conclusions?

      We found that 15.56% of the cells in Stem F expressed the neural marker complexin, leading us to consider the possibility that a fraction of these cells may be neural precursors. Stem F also had some cells with a mitochondrial % near the maximum threshold we set, suggesting they could be experiencing some stress. Since we could not identify clear markers for these clusters, their function and a more specific identity, beyond ‘stem’, is not yet known.

      That the two stem cell populations contribute to different parts of the next life cycle stage is interesting. The combined analysis suffers from the same issues as the previous analysis in terms of sample distribution; are the 'grey' sporocyst cells also contributing to the stem A/B (kappa) C/D (delta/phi) clusters? This is not possible to tell from the plot as the miracidia may simply be plotted on the top. A different representation of sample contribution to clusters is warranted.

      We have made an alternative visualisation here to demonstrate that the miracidia cells are not plotted on top of the sporocyst stem cells. Unfortunately this visual is hampered as there is not a straightforward way to split the panels. In the figure below, the left pane shows the miracidia cells, and the right pane shows the sporocyst cells. Below that, we have included the original figure for comparison. It can be clearly seen that there are three miracidia tegument cells in the sporocyst tegument cluster, and one sporocyst cell in the miracidia stem cells (Stem E), but the miracidia A/B and C/D stem cells are not plotted on top of any sporocyst cells.

      Author response image 6.

      Methods: Why is the multiplet rate estimate at >50% for the unsorted sample?

      We have added more detail on this: “The estimated doublet rate was calculated based on 10X loading guidelines and adjusted for our sample concentrations”.

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more careful consideration of what was already known based on previous literature, which would help the authors to better put their results in context. For example, previous work suggested that one of the sporocyst stem cell populations (phi) gives rise to tegument and other temporary larval structures; this appears not to be mentioned here. The model in Figure 7 suggests that two of the stem cell populations are gone at day 15 post-infection; the literature shows that those cells can still be detected at this stage (there are just far fewer of them).

      We have added the definitions of Kappa, Delta and Phi as per Wang et al. (2018) to the stem cell results (p. 13, line 428).

      We have amended Figure 7 to include further elements from the Wang et al (2018) paper that show that mother sporocyst stem cells classified as delta and phi are still detectable on day 15 post-infection in mother sporocysts.

      We intentionally didn’t put too much emphasis on fitting our data to the model of Wang et al (2018), because a) it’s a different life cycle stage and b) the single cell data the model was based on was from 35 stem cells and gathered using a different method, c) more recent data (Diaz, Attenborough et al. 2024) with 119 stem cells from sporocysts did not recover the same populations of stem cells. We therefore linked our data to previous literature where it was relevant but focused on being led by the data we gathered (>10,000 stem cells).

      (2) To add some detail to the public comment about the lack of clarity about sample sizes and biological replicates, and how this leads to questions about the robustness of the results, Figures 4 B and F show the expression pattern for the same parenchyma marker (Smp_318890) in two different samples. The patterns appear quite distinctive. In B, the cell bodies are so clearly labeled that the signal appears oversaturated. In F the cell bodies are barely apparent. Based on the single-cell clustering, it should be possible to distinguish between Parenchyma clusters 1 and 2 based on the levels of this transcript. Careful quantification of signal intensity from multiple samples across multiple experiments might enable the authors to detect such differences.

      The reason the expression patterns look different between panels 4Bii and 4F is that in 4Bii we have manually segmented the nuclei of the parenchymal cells in order to count them, whereas in the images in 4F there is no segmentation. We have now made this clearer in this legend, and also in the legends of Figures 2, 3 and 5. If there was any signal intensity difference between parenchyma 1 and 2 cells based on expression of the marker gene Smp_318890, it was not obvious. We carried out 6 experiments for parenchyma markers, multiplexing the pan-parenchyma marker Smp_318890 with markers for parenchyma 2, but we were unable to distinguish between the two populations.

      (3) The authors find that the "somatic" stem cells in miracidia seem to combine attributes of the previously defined delta and phi stem cells from sporocysts. Because the 3 classes of sporocyst stem cells were defined by expression of nanos-2 and fgfrA, using those probes in in-situ experiments could have helped them resolve whether or not the miracidial cells represent precursors that can adopt either fate or if the heterogeneity is already present in miracidia.

      In silico expression of the marker genes for the 3 classes of sporocyst stem cells did not support those three classes in the miracidia stem cells (see Supplementary Table 10). We further subclustered the delta/phi cells to see if we could recover separate delta and phi populations, but we were unable to do so. We therefore did not pursue in situ experiments with these genes. We instead prioritised cluster-defining genes in the miracidia stem cell populations rather than cluster-defining genes in the sporocyst (defined by Wang et al., 2018), but we still explored the latter in silico. For example, instead of using klf to define Kappa (Wang et al., 2018), we used UPPA to validate the Kappa population, as it showed a similar expression pattern to klf but higher expression levels and was specific to that population. However, like Wang et al. (2018), we did use p53, which is a cluster marker of delta and phi in sporocysts, as it showed clear and high expression in our miracidia delta/phi population. We were guided by our data and our knowledge of the literature. More in-depth single cell RNAseq is needed from the mother and daughter sporocyst stages to understand the heterogeneity and fates of these stem populations.

      (4) Scale bars should be included throughout the figures and the scale should be defined either on the figure or in the legend. Similarly, all the scales used for velocity and expression analysis should be defined.

      We have added scale bars to all figures and legends.

      The statements “Gene expression has been log-normalised and scaled using Seurat(v. 4.3.0)”, “Gene expression has been normalised (CPM) and log-transformed using scvelo(v. 0.2.4)”, or “Library size was normalised and gene expression values were log-normalised using SAM (v1.0.1) and Scanpy (v1.8.2)” have been added to all figures, as appropriate.
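      Because the reviewer asked for the expression scales to be defined, it may help to spell out what these normalisations compute. The sketch below is a plain-Python approximation, not the packages' own code. Seurat's default LogNormalize is ln(1 + count / cell total × 10,000); CPM-based log-normalisation is the same with a scale factor of 10^6; and "scaled" refers to centring each gene across cells to mean 0 and unit standard deviation (as Seurat's ScaleData does; the clipping it applies by default is omitted here, and the sample standard deviation is our assumption).

```python
import math
from statistics import mean, stdev

def log_normalize(counts, scale_factor=10_000):
    """Per-cell ln(1 + count / total * scale_factor); 10,000 mirrors Seurat's default."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]

def log_cpm(counts):
    """Counts per million, then natural-log transformed with a +1 pseudocount."""
    return log_normalize(counts, scale_factor=1_000_000)

def scale_gene(values):
    """Centre one gene across cells to mean 0 and scale to unit (sample) SD."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

cell = [1, 1, 2]  # raw counts for three genes in one cell
print(log_normalize(cell))
print(scale_gene([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

      Log-normalised values are therefore natural-log units of counts per 10,000 (or per million), while scaled values are in standard deviations from each gene's mean.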

      (5) The table entitled In situ hybridization probes (Supplementary Table 15) contains no probe sequences, so any interested reader wishing to use these probes would have to design their own. To ensure the reproducibility of the results presented here, the authors should provide the probe sequences they used.

      In Supplementary Table 15 we have added the Molecular Instruments Lot number of all the probes used. Anyone wanting to repeat the experiment can order the same probes from the company.

      (6) It is unclear how useful the supplemental figures showing the STRING enrichment analyses will be for readers. Unannotated Smp gene identifiers provide no way to help readers digest the information in these hairballs. It would probably be best to replace the Smp names with useful annotations based on their orthologs; if not, these figures could probably be dropped entirely. (Also, the bottom panel of Supplementary Figure 7 has the word "Lorem" embedded on one of the connecting nodes.)

      “Lorem” has been removed.

      Many of the genes in these analyses do not have short descriptions; therefore, we have used Smp gene identifiers in the STRING analysis supplementary figures. These ‘Smp_’ numbers can be used to search WormBase ParaSite, where a description can be found and the history of the gene ID traced. This latter function facilitates searching for these genes in the literature and maintains consistency between versions as gene models are updated.

      Minor edits

      (1) Figures 4A-D aren't cited in the text until after 4E-F are. It seems like moving the section on protonephridial cells (line 364) before the section on tegumental cells (line 345) better reflects the order of the figures.

      Thank you for flagging this, we have updated the in-text citations of Figure 4.

      (2) In-text references to Sarfati et al, 2021 should be to Nanes Sarfati, as listed in the references. Poteaux et al 2023 is cited in the text, but not in the reference list.

      Both of these have been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Point 1.1

      Summary: This paper describes a reanalysis of data collected by Gagne et al. (2020), who investigated how human choice behaviour differs in response to changes in environmental volatility. Several studies to date have demonstrated that individuals appear to increase their learning rate in response to greater volatility and that this adjustment is reduced amongst individuals with anxiety and depression. The present authors challenge this view and instead describe a novel Mixture of Strategies (MOS) model, that attributes individual differences in choice behaviour to different weightings of three distinct decision-making strategies. They demonstrate that the MOS model provides a superior fit to the data and that the previously observed differences between patients and healthy controls may be explained by patients opting for a less cognitively demanding, but suboptimal, strategy. 

      Strengths: 

      The authors compare several models (including the original winning model in Gagne et al., 2020) that could feasibly fit the data. These are clearly described and are evaluated using a range of model diagnostics. The proposed MOS model appears to provide a superior fit across several tests. 

      The MOS model output is easy to interpret and has good face validity. This allows for the generation of clear, testable, hypotheses, and the authors have suggested several lines of potential research based on this. 

      We appreciate the reviewer's efforts in understanding our manuscript; this is a good summary.
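      The summary above describes choices as a weighted blend of three decision strategies. One simple way to express such a mixture, offered only as an illustration, is to average each strategy's probability of choosing an option using per-participant weights that sum to 1. The strategy names and numbers below are placeholders (only the habitual strategy is named in this exchange), and whether MOS mixes probabilities or values is not specified here, so this should not be read as the MOS6 equations.

```python
import math

def mixture_choice_prob(strategy_probs, weights):
    """P(choose option A) as a weight-averaged blend of per-strategy probabilities.

    strategy_probs: {strategy: P(A) under that strategy}
    weights: {strategy: weight}, non-negative and summing to 1
    """
    if any(w < 0 for w in weights.values()) or not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(weights[s] * strategy_probs[s] for s in weights)

# Placeholder strategies and numbers, for illustration only
probs = {"strategy_1": 0.8, "strategy_2": 0.2, "habitual": 1.0}
weights = {"strategy_1": 0.5, "strategy_2": 0.3, "habitual": 0.2}
print(mixture_choice_prob(probs, weights))  # approximately 0.66
```

      In a formulation like this, shifting weight toward a cheaper but suboptimal strategy changes choice behaviour without any change in learning rate, which is the kind of alternative account the summary describes.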

      Point 1.2

      The authors justify this reanalysis by arguing that learning rate adjustment (which has previously been used to explain choice behaviour on volatility tasks) is likely to be too computationally expensive and therefore unfeasible. It is unclear how to determine how "expensive" learning rate adjustment is, and how this compares to the proposed MOS model (which also includes learning rate parameters), which combines estimates across three distinct decision-making strategies. 

      We are sorry for this confusion. Our motivation is that previous models only consider the possibility that learning rates adapt to different levels of environmental volatility. The drawback of previous computational models is that they require a large number of parameters in multi-context experiments. We feel that learning rate adaptation may not be the only mechanism, or at least that alternative explanations may exist. Understanding the true mechanisms is particularly important for rehabilitation, especially in our case of anxiety and depression. To clarify, we have removed all claims that learning rate adaptation is “too complex to understand”.

      Point 1.3

      As highlighted by the authors, the model is limited in its explanation of previously observed learning differences based on outcome value. It's currently unclear why there would be a change in learning across positive/negative outcome contexts, based on strategy choice alone. 

      Thanks for mentioning this limitation. We want to highlight two aspects of our work.

      First, we developed the MOS6 model primarily to account for the learning rate differences between stable and volatile contexts, and between healthy controls and patients, not for those between positive and negative outcomes. In other words, our model does not eliminate the possibility of different learning rates for positive and negative outcomes.

      Second, Figure 3A shows that FLR (which contains different learning rate parameters for positive and negative outcomes) performed even worse than MOS6 (which uses an identical learning rate for positive and negative outcomes). This result questions whether learning rate differences between positive and negative outcomes exist in our dataset.

      Action: We now include this limitation in lines 784-793 in discussion:

      “The MOS model was developed to offer context-free interpretations for the learning rate differences observed both between stable and volatile contexts and between healthy individuals and patients. However, we also recognize that the MOS account may not explain other learning rate effects based solely on strategy preferences. One such example is the valence-specific learning rate difference, where learning rates for better-than-expected outcomes are higher than those for worse-than-expected outcomes (Gagne et al., 2020). When fitted to the behavioral data, the context-dependent MOS22 model does not reveal valence-specific learning rates (Supplemental Note 4). Moreover, the valence-specific effect was not replicated in the FLR22 model when fitted to the synthesized data of MOS6.”

      Point 1.4

      Overall the methods are clearly presented and easy to follow, but lack clarity regarding some key features of the reversal learning task.

      Throughout the method the stimuli are referred to as "right" and "left". It's not uncommon in reversal learning tasks for the stimuli to change sides on a trial-by-trial basis or counterbalanced across stable/volatile blocks and participants. It is not stated in the methods whether the shapes were indeed kept on the same side throughout. If this is the case, please state it. If it was not (and the shapes did change sides throughout the task) this may have important implications for the interpretation of the results. In particular, the weighting of the habitual strategy (within the Mixture of Strategies model) could be very noisy, as participants could potentially have been habitual in choosing the same side (i.e., performing the same motor movement), or in choosing the same shape. Does the MOS model account for this? 

      We are sorry for the confusion. Yes, the two shapes did change sides throughout the task. We have replaced “left” and “right” with “stimulus 1” and “stimulus 2”. We also acknowledge that participants may develop a habitual preference for a particular side rather than a shape. Because of the counterbalanced design, a habit for one side would introduce random noise into shape choices, which the MOS model should capture through the inverse temperature parameter.
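      A minimal simulation can illustrate this argument. It is a sketch under stated assumptions, not part of the original analysis: we assume that on every trial stimulus 1 appears on the left with probability 0.5, and that a hypothetical participant always presses the left key. The function name `side_habit_stim1_rate` is ours.

```python
import random

def side_habit_stim1_rate(n_trials: int, seed: int = 0) -> float:
    """Fraction of trials on which a purely side-habitual agent picks stimulus 1.

    Assumption (hypothetical): the screen side of stimulus 1 is drawn at random
    on every trial, and the agent always chooses the left side.
    """
    rng = random.Random(seed)
    chose_stim1 = 0
    for _ in range(n_trials):
        stim1_on_left = rng.random() < 0.5  # counterbalanced layout
        # The agent always presses "left", so it picks stimulus 1 only when
        # stimulus 1 happens to be shown on the left.
        if stim1_on_left:
            chose_stim1 += 1
    return chose_stim1 / n_trials

rate = side_habit_stim1_rate(10_000)
print(rate)  # close to 0.5
```

      Because such an agent picks stimulus 1 on about half of the trials regardless of past outcomes, a shape-level model would register its side habit as choice noise (a lower inverse temperature) rather than as a shape-level habit.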

      Point 1.5

      Line 164: "Participants received points or money in the reward condition and an electric shock in the punishment condition." What determined whether participants received points or money, and did this differ across participants? 

      Thanks! We have clarified the design in lines 187-188:

      “Each participant was instructed to complete two blocks of the volatile reversal learning task, one in the reward context and the other in the aversive context”,

      and in lines:

      “A total of 79 participants completed tasks in both feedback contexts. Four participants only completed the task in the reward context, while three participants only completed the aversive task.”

      Point 1.6

      Line 167: "The participant received feedback only after choosing the correct stimulus and received nothing else" Is this correct? In Figure 1a it appears the participant receives feedback irrespective of the stimulus they chose, by either being shown the amount 1-99 they are being rewarded/shocked, or 0. Additionally, what does the "correct stimulus" refer to across the two feedback conditions? It seems intuitive that in the reward version, the correct answer would be the rewarding stimulus - in the loss version is the "correct" answer the one where they are not receiving a shock? 

      Thanks for raising this issue. We have removed the term “correct stimulus” and revised lines 162-166 accordingly:

      “Only one of the two stimuli was associated with actual feedback (0 for the other one). The feedback magnitude, ranging from 1 to 99, was sampled uniformly and independently for each shape from trial to trial. Actual feedback was delivered only if the stimulus associated with feedback was chosen; otherwise, the number “0” was displayed on the screen, signifying that the chosen stimulus returned nothing.”
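      The quoted feedback rule can be sketched as follows. This is a hedged illustration rather than the task code: the helper names `sample_magnitudes` and `trial_feedback` are hypothetical, and we assume integer magnitudes and that both magnitudes are visible before the choice, as the MO strategy presupposes.

```python
import random

def sample_magnitudes(rng: random.Random) -> dict[int, int]:
    """Draw one magnitude per shape, uniformly and independently (1 to 99)."""
    return {1: rng.randint(1, 99), 2: rng.randint(1, 99)}

def trial_feedback(chosen: int, feedback_stim: int, magnitudes: dict[int, int]) -> int:
    """Number displayed after the choice on one trial (hypothetical helper).

    Only `feedback_stim` is associated with actual feedback on this trial;
    choosing the other stimulus returns nothing, which is shown as 0.
    """
    if chosen == feedback_stim:
        return magnitudes[chosen]  # points (reward) or shock magnitude (aversive)
    return 0

rng = random.Random(1)
mags = sample_magnitudes(rng)
print(trial_feedback(chosen=1, feedback_stim=1, magnitudes=mags))  # equals mags[1]
print(trial_feedback(chosen=2, feedback_stim=1, magnitudes=mags))  # 0
```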

      Point 1.7

      Line 176: "The whole experiment included two runs each for the two feedback conditions." Does this mean participants completed the stable and volatile blocks twice, for each feedback condition? (i.e., 8 blocks total, 4 per feedback condition). 

      Thanks! We have removed the term “block”, and now we refer to it as “context”. In particular, we removed phrases like “stable block” and “volatile block” and used “context” instead.

      Action: See lines 187-189 for the revised version.

      “Each participant was instructed to complete two runs of the volatile reversal learning task, one in the reward context and the other in the aversive context. Each run consisted of 180 trials, with 90 trials in the stable context and 90 in the volatile context (Fig. 1B).”

      Point 1.8

      In the expected utility (EU) strategy of the Mixture of Strategies model, the expected value of the stimulus on each trial is produced by multiplying the magnitude and probability of reward/shock. In Gagne et al.'s original paper, they found that an additive mixture of these components better captured participant choice behaviour - why did the authors not opt for the same strategy here?

      Thanks for asking this. Their strategy is essentially a mixture of PF+MO+HA, where PF stands for the feedback probability (e.g., 0.3 or 0.7) without multiplication by the feedback magnitude. Ours, in contrast, is EU+MO+HA, where EU stands for feedback probability × feedback magnitude. We did compare these two strategies, and the model using their strategy performed much worse than ours (see the red box below).

      Author response image 1.

      Thorough model comparison.
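      To make the EU+MO+HA combination concrete, here is a minimal sketch of how a mixture-of-strategies choice rule could be written. It is an illustration under stated assumptions, not the MOS6 equations: each strategy contributes a preference for stimulus 1 over stimulus 2, the preferences are combined with weights summing to 1, and one softmax with inverse temperature `beta` maps the result to a choice probability. The function name and the [0, 1] scaling of magnitudes are hypothetical, and the sketch covers the reward context only.

```python
import math

def mos_choice_prob(p1: float, m1: float, m2: float, habit1: float,
                    w: dict[str, float], beta: float) -> float:
    """P(choose stimulus 1) under a hypothetical mixture-of-strategies rule.

    p1: learned probability that stimulus 1 carries the feedback (p2 = 1 - p1).
    m1, m2: displayed magnitudes rescaled to [0, 1] (reward context).
    habit1: habit trace for stimulus 1 in [0, 1] (habit2 = 1 - habit1).
    w: strategy weights with keys "EU", "MO", "HA", assumed to sum to 1.
    beta: inverse temperature.
    """
    p2 = 1.0 - p1
    d_eu = p1 * m1 - p2 * m2        # EU: probability x magnitude
    d_mo = m1 - m2                  # MO: magnitude only, ignores learning
    d_ha = habit1 - (1.0 - habit1)  # HA: tendency to repeat past choices
    d = w["EU"] * d_eu + w["MO"] * d_mo + w["HA"] * d_ha
    return 1.0 / (1.0 + math.exp(-beta * d))  # two-option softmax

# Stimulus 1 is likely but small; stimulus 2 is unlikely but large.
args = dict(p1=0.8, m1=0.3, m2=0.9, habit1=0.5, beta=8.0)
print(mos_choice_prob(w={"EU": 1.0, "MO": 0.0, "HA": 0.0}, **args))  # above 0.5
print(mos_choice_prob(w={"EU": 0.0, "MO": 1.0, "HA": 0.0}, **args))  # below 0.5
```

      In this example a pure-EU agent favors the likely stimulus 1 (0.8 × 0.3 > 0.2 × 0.9), while a pure-MO agent favors the larger magnitude of stimulus 2; intermediate weights trade these tendencies off.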

      Point 1.9

      How did the authors account for individuals with poor/inattentive responding, my concern is that the habitual strategy may be capturing participants who did not adhere to the task (or is this impossible to differentiate?). 

      The current MOS6 model distinguishes between the HA strategy and inattentive responding. Because of the counterbalanced design, the HA strategy requires participants to actively track the stimuli on the screen. In contrast, inattentive responding, such as repeating the same motor movement mentioned in Point 1.4, should appear as random selection in the behavioral data, which is accounted for by the inverse temperature parameter.

      Point 1.10

      The authors provide a clear rationale for, and description of, each of the computational models used to capture participant choice behaviour. 

      • Did the authors compare different combinations of strategies within the MOS model (e.g., only including one or two strategies at a time, and comparing fit?) I think more explanation is needed as to why the authors opted for those three specific strategies. 

      We appreciate this great advice. Following it, we conducted a thorough model comparison; please refer to Figure R1 above. Detailed descriptions of all the models in Figure R1 are included in Supplemental Note 1.

      Point 1.11

      Please report the mean and variability of each of the strategy weights, per group. 

      Thanks. We now report the mean and variability of the strategy weights in lines 490-503:

      “We first focused on the fitted parameters of the MOS6 model. We compared the weight parameters of the three strategies (EU, MO, HA) across groups and conducted statistical tests on their logits. The patient group showed a ~37% preference for the EU strategy, which is significantly weaker than the ~50% preference in healthy controls (EU logit, healthy controls: M = 0.991, SD = 1.416; patients: M = 0.196, SD = 1.736; t(54.948) = 2.162, p = 0.035, Cohen’s d = 0.509; Fig. 4A). Meanwhile, the patients exhibited a weaker preference (~27%) for the HA strategy compared to healthy controls (~36%) (HA logit, healthy controls: M = 0.657, SD = 1.313; patients: M = -0.162, SD = 1.561; t(56.311) = 2.455, p = 0.017, Cohen’s d = 0.574), but a stronger preference for the MO strategy (36% vs. 14%; MO logit, healthy controls: M = -1.647, SD = 1.930; patients: M = -0.034, SD = 2.091; t(63.746) = -3.510, p = 0.001, Cohen’s d = 0.801). Most importantly, we also examined the learning rate parameter in MOS6 but found no group difference (t(68.692) = 0.690, p = 0.493, Cohen’s d = 0.151). These results strongly suggest that differences in decision strategy preferences can account for the learning behaviors of the two groups without requiring any difference in learning rate per se.”
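      For readers wishing to relate the logits to the percentages, the sketch below assumes that the three weights are obtained from their logits with a softmax, which keeps them positive and summing to 1. This parameterization is our assumption, not a reproduction of the manuscript's equations. Because the softmax is nonlinear, weights computed from group-mean logits need not match the group-mean weights (~50%, ~36%, ~14% in healthy controls) reported above.

```python
import math

def weights_from_logits(logits: dict[str, float]) -> dict[str, float]:
    """Softmax map from unconstrained strategy logits to weights summing to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Group-mean logits of the healthy controls quoted above.
hc = weights_from_logits({"EU": 0.991, "HA": 0.657, "MO": -1.647})
print({k: round(v, 2) for k, v in hc.items()})
```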

      Point 1.12

      The authors compare the strategy weights of patients and controls and conclude that patients favour more simpler strategies (see Line 417), based on the fact that they had higher weights for the MO, and lower on the EU.

      (1) However, the finding that control participants were more likely to use the habitual strategy was largely ignored. Within the control group, were the participants significantly more likely to opt for the EU strategy, over the HA? 2) Further, on line 467 the authors state "Additionally, there was a significant correlation between symptom severity and the preference for the HA strategy (Pearson's r = -0.285, p = 0.007)." Apologies if I'm mistaken, but does this negative correlation not mean that the greater the symptoms, the less likely they were to use the habitual strategy?

      I think more nuance is needed in the interpretation of these results, particularly in the discussion. 

      Thanks. The healthy participants seemed more likely to opt for the EU strategy than for the HA strategy, although this difference did not reach significance (paired-t(53) = 1.258, p = 0.214, Cohen’s d = 0.242). We have now systematically explored the role of HA. Compared to the MO, the HA strategy also saves cognitive resources but yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for HA over MO may reflect a more sophisticated balance between reward and complexity within an agent: when healthier participants run out of cognitive resources for the EU strategy, they may cleverly resort to the HA strategy, adopting a simpler strategy while still achieving a reasonable hit rate. This explains the negative symptom-HA correlation. Given how effective the HA strategy is, it is not surprising that healthy control participants rely more on HA during decision-making.

      However, we are cautious about drawing strong conclusions from (1) the non-significant difference between EU and HA within healthy controls and (2) the negative symptom-HA correlation, because MOS22, the context-dependent variant, (1) showed a significantly higher preference for EU over HA (paired-t(53) = 4.070, p < 0.001, Cohen’s d = 0.825) and (2) did not replicate this negative correlation (Supplemental Information Figure S3).

      Action: A simulation analysis of the effects of HA has been added in lines 556-595 and Figure 4. We discuss the effects of HA in lines 721-733:

      “Although many observed behavioral differences can be explained by a shift in preference from the EU to the MO strategy among patients, we also explored the potential effects of the HA strategy. Compared to the MO, the HA strategy also saves cognitive resources but yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for the HA over the MO strategy may reflect a more sophisticated balance between reward and complexity within an agent (Gershman, 2020): when healthier participants exhaust their cognitive resources for the EU strategy, they may cleverly resort to the HA strategy, adopting a simpler strategy while still achieving a reasonable hit rate. This explains the stronger preference for the HA strategy in the HC group (Fig. 3A) and the negative correlation between HA preferences and symptom severity (Fig. 5). Apart from shedding light on the cognitive impairments of patients, the inclusion of the HA strategy significantly improves the model’s fit to human behavior (see examples in Daw et al. (2011); Gershman (2020); and also Supplemental Note 1 and Supplemental Figure S3).”
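      To make the HA strategy concrete: response stickiness is often implemented as a choice trace that moves toward the most recent choice at a fixed rate, independently of outcomes. The sketch below assumes that form; `alpha_ha` is a hypothetical stand-in for the response-stickiness learning rate of Equation 6, whose exact form is not reproduced here.

```python
def update_habit(habit1: float, chose_stim1: bool, alpha_ha: float) -> float:
    """Move the habit trace for stimulus 1 toward the latest choice (delta rule).

    The trace ignores outcomes entirely; it only records what was chosen.
    """
    target = 1.0 if chose_stim1 else 0.0
    return habit1 + alpha_ha * (target - habit1)

h = 0.5  # no initial preference
for choice in [True, True, True, False]:
    h = update_habit(h, choice, alpha_ha=0.5)
print(h)  # 0.46875
```

      Because the trace never consults outcomes, updating it requires no probability or magnitude computation.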

      Point 1.13

      Line 513: "their preference for the slowest decision strategy" - why is the MO considered the slowest strategy? Is it not the least cognitively demanding, and therefore, the quickest? 

      Sorry for the confusion. In Fig. 5C, we conducted simulations to estimate the learning speed of each strategy; as shown there, the MO strategy exhibits a flat learning curve. Our claim about learning speed was based solely on these simulation outcomes, not on cognitive demands. Note that our analysis did not aim to compare the cognitive demands of the MO and HA strategies directly.

      Action: We explain the learning speed of the three strategies in lines 571-581.
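      The learning-speed contrast can be reproduced qualitatively with a small simulation. This is a sketch, not the simulation behind Fig. 5C: it assumes a stable context in which stimulus 1 carries the feedback with probability 0.75, magnitudes drawn uniformly on [0, 1], delta-rule learning of the feedback probability, and illustrative values for the learning rate (`alpha`) and inverse temperature (`beta`) that are not fitted estimates.

```python
import math
import random

def learning_curve(strategy: str, n_trials: int = 90, n_runs: int = 500,
                   p_dom: float = 0.75, alpha: float = 0.3, beta: float = 10.0,
                   seed: int = 0) -> list[float]:
    """Per-trial probability of choosing the dominant stimulus (stimulus 1).

    strategy: "EU" (probability x magnitude) or "MO" (magnitude only).
    Magnitudes are drawn uniformly on [0, 1] for each shape on each trial.
    """
    rng = random.Random(seed)
    curve = [0.0] * n_trials
    for _ in range(n_runs):
        p1 = 0.5  # learned probability that stimulus 1 carries the feedback
        for t in range(n_trials):
            m1, m2 = rng.random(), rng.random()
            if strategy == "EU":
                d = p1 * m1 - (1.0 - p1) * m2
            else:  # "MO" never consults p1
                d = m1 - m2
            curve[t] += (1.0 / (1.0 + math.exp(-beta * d))) / n_runs
            # The outcome shows which stimulus carried the feedback, so the
            # learned probability is updated on every trial (delta rule).
            outcome = 1.0 if rng.random() < p_dom else 0.0
            p1 += alpha * (outcome - p1)
    return curve

eu, mo = learning_curve("EU"), learning_curve("MO")
print(round(eu[0], 2), round(sum(eu[-30:]) / 30, 2))  # starts near 0.5, then rises
print(round(mo[0], 2), round(sum(mo[-30:]) / 30, 2))  # stays near 0.5
```

      The EU curve rises as the learned probability approaches 0.75, whereas the MO curve stays flat because the magnitudes carry no information about which stimulus is dominant.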

      Point 1.14

      The authors argue that participants chose suboptimal strategies, but do not actually report task performance. How does strategy choice relate to the performance on the task (in terms of number of rewards/shocks)? Did healthy controls actually perform any better than the patient group? 

      Thanks for the suggestion. The answers are: (1) the EU strategy is the most rewarding, followed by HA and then MO (Fig. 5A); and (2) yes, healthy controls did perform better than patients in terms of hit rate (Fig. 2).

      Action: We have added sections on these analyses in lines 561-570 and lines 397-401.

      Point 1.15

      The authors speculate that Gagne et al. (2020) did not study the relationship between the decision process and anxiety and depression, because it was too complex to analyse. It's unclear why the FLR model would be too complex to analyse. My understanding is that the focus of Gagne's paper was on learning rate (rather than noise or risk preference) due to this being the main previous finding. 

      Thanks! We agree that our previous arguments were vague and confusing, and we have removed all such arguments.

      Point 1.16

      Minor Comments: 

      • Line 392: Modeling fitting > Model fitting 

      • Line 580 reads "The MO and HA are simpler heuristic strategies that are cognitively demanding."

      - should this read as less cognitively demanding? 

      • Line 517: health > healthy 

      • Line 816: Desnity > density 

      Sorry for the typos! They have all been fixed.

      Reviewer #2:

      Point 2.1

      Summary: Previous research shows that humans tend to adjust learning in environments where stimulus-outcome contingencies become more volatile. This learning rate adaptation is impaired in some psychiatric disorders, such as depression and anxiety. In this study, the authors reanalyze previously published data on a reversal-learning task with two volatility levels. Through a new model, they provide some evidence for an alternative explanation whereby the learning rate adaptation is driven by different decision-making strategies and not learning deficits. In particular, they propose that adjusting learning can be explained by deviations from the optimal decision-making strategy (based on maximizing expected utility) due to response stickiness or focus on reward magnitude. Furthermore, a factor related to the general psychopathology of individuals with anxiety and depression negatively correlated with the weight on the optimal strategy and response stickiness, while it correlated positively with the magnitude strategy (a strategy that ignores the probability of outcome). 

      Thanks for evaluating our paper. This is a good summary.

      Point 2.2

      My main concern is that the winning model (MOS6) does not have an error term (inverse temperature parameter beta is fixed to 8.804). 

      (1) It is not clear why the beta is not estimated and how were the values presented here chosen. It is reported as being an average value but it is not clear from which parameter estimation. Furthermore, with an average value for participants that would have lower values of inverse temperature (more stochastic behaviour) the model is likely overfitting.

      (2) In the absence of a noise parameter, the model will have to classify behaviour that is not explained by the optimal strategy (where participants simply did not pay attention or were not motivated) as being due to one of the other two strategies.

      We apologize for any confusion caused by our writing. We did treat the inverse temperature as a free parameter and estimated it during model fitting and comparison. We have also created a table listing the free parameters of each model. The previous manuscript did state that the “temperature parameter beta is fixed to 8.804”, but this applied only to the model simulations, which were conducted to interpret model behaviors.

      We agree that using the mean inverse temperature could lead to overfitting to more stochastic behaviors. To mitigate this issue, we now use the median as a more representative population value in the simulations. This change does not affect our conclusions (see the simulation results in Figures 4 and 6).

      Action: We now use the term “free parameter” to emphasize that the inverse temperature was fitted rather than fixed. We have also created a new table (Table 1, line 458) listing all the free parameters of each model, and updated the simulation details in lines 363-391 for clarity.
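      A toy example, using made-up numbers rather than the fitted estimates, shows why the median can be the more representative choice if the inverse temperatures are right-skewed (an assumption here): a single very deterministic participant drags the mean upward, so an agent simulated with the mean would be more deterministic than most participants.

```python
from statistics import mean, median

# Hypothetical, right-skewed inverse-temperature estimates (not the fitted values):
# one very deterministic participant dominates the mean.
betas = [2.0, 3.0, 4.0, 5.0, 6.0, 40.0]
print(mean(betas), median(betas))  # 10.0 4.5
```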

      Point 2.3

      (3) A model comparison among models with inverse temperature and variable subsets of the three strategies (EU + MO, EU + HA) would be interesting to see. Similarly, a comparison of the MOS6 model to other models where the inverse temperature parameter is fixed to 8.804 would be interesting.

      This is an important limitation because the same simulation as with the MOS model in Figure 3b can be achieved by a more parsimonious (but less interesting) manipulation of the inverse temperature parameter.

      Thanks. We have added a comparison between MOS6 and the two lesioned models (EU + MO, EU + HA); please refer to the figure below and Point 1.8.

      We also realized that the MO strategy could produce averaged learning curves similar to those of random selection. To confirm that patients' slower learning rates are due to a preference for the MO strategy, we compared the MOS6 model with a variant (see the red box below) in which the MO strategy is replaced by random (RD) selection, which assigns a probability of 0.5 to both choices. This comparison showed that the original MOS6 model with the MO strategy fits human data better.

      Author response image 2.
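      Why can model comparison separate MO from RD when their averaged learning curves look alike? A toy sketch, assuming a hypothetical logistic MO rule with `beta = 10` and magnitudes rescaled to [0, 1]: choices generated by MO select stimulus 1 about half the time overall, like RD, yet trial by trial they follow the displayed magnitudes, so the MO rule assigns them a higher likelihood than RD does.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def log_lik(choices: list[bool], mags: list[tuple[float, float]],
            rule: str, beta: float = 10.0) -> float:
    """Summed log-likelihood of choices (True = stimulus 1) under MO or RD."""
    ll = 0.0
    for chose1, (m1, m2) in zip(choices, mags):
        # MO tracks the displayed magnitudes; RD assigns 0.5 to both options.
        p1 = sigmoid(beta * (m1 - m2)) if rule == "MO" else 0.5
        ll += math.log(p1 if chose1 else 1.0 - p1)
    return ll

rng = random.Random(0)
mags = [(rng.random(), rng.random()) for _ in range(180)]
# Synthetic choices generated by the MO rule itself.
choices = [rng.random() < sigmoid(10.0 * (m1 - m2)) for m1, m2 in mags]
print(sum(choices) / len(choices))  # overall choice rate, near 0.5
print(log_lik(choices, mags, "MO") - log_lik(choices, mags, "RD"))  # positive
```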

      Point 2.4

      Furthermore, the claim that the EU represents an optimal strategy is a bit overstated. The EU strategy is the only one of the three that assumes participants learn about the stimulus-outcomes contingencies. Higher EU strategy utilisation will include participants that are more optimal (in maximum utility maximisation terms), but also those that just learned better and completely ignored the reward magnitude.

      Thank you for your feedback. We have revised the paper to remove all statements that “the EU strategy is optimal” and replaced them with “the EU strategy is rewarding but complex”. We agree that both the EU strategy and the strategy that focuses only on feedback probability (i.e., ignoring reward magnitude, referred to as the PF strategy) are rewarding but more complex than the two simple heuristics. We have also included the latter strategy in our model comparisons (see the next section, Point 2.5).

      Point 2.5

      The mixture strategies model is an interesting proposal, but seems to be a very convoluted way to ask: to what degree are decisions of subjects affected by reward, what they've learned, and response stickiness? It seems to me that the same set of questions could be addressed with a simpler model that would define choice decisions through a softmax with a linear combination of the difference in rewards, the difference in probabilities, and a stickiness parameter. 

      Thanks for suggesting this model. We did include the proposed linear combination model (see “linear comb.” in the red box below) and found that it performed significantly worse than MOS6.

      Action: We justified our model selection criterion in the Supplemental Note 1.

      Author response image 3.
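      For reference, the reviewer's proposed model can be sketched as below. The coefficient names are hypothetical and the sticky term is coded as +1/-1 for the previously chosen stimulus; this is one plausible reading of the proposal, not the exact specification fitted in Supplemental Note 1.

```python
import math

def linear_comb_prob(p1: float, m1: float, m2: float, prev_stim1: bool,
                     b_mag: float, b_prob: float, b_stick: float) -> float:
    """P(choose stimulus 1) when a softmax acts on a linear sum of differences.

    p1: learned feedback probability of stimulus 1 (p2 = 1 - p1).
    m1, m2: displayed magnitudes rescaled to [0, 1].
    prev_stim1: whether stimulus 1 was chosen on the previous trial.
    The b_* coefficients absorb both strategy weights and inverse temperature.
    """
    z = (b_mag * (m1 - m2)
         + b_prob * (p1 - (1.0 - p1))
         + b_stick * (1.0 if prev_stim1 else -1.0))
    return 1.0 / (1.0 + math.exp(-z))

print(linear_comb_prob(p1=0.8, m1=0.3, m2=0.9, prev_stim1=False,
                       b_mag=2.0, b_prob=4.0, b_stick=0.5))
```

      Because no term multiplies probability by magnitude, this form combines the two only additively and cannot express the probability × magnitude interaction that defines the EU strategy.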

      Point 2.6

      Learning rate adaptation was also shown with tasks where decision-making strategies play a less important role, such as the Predictive Inference task (see for instance Nassar et al, 2010). When discussing the merit of the findings of this study on learning rate adaptation across volatility blocks, this work would be essential to mention. 

      Thanks for mentioning this great experimental paradigm, which provides an ideal way to dissociate probability learning from the decision process. We now discuss this paradigm and the associated papers in the Discussion (lines 749-751, 763-765, and 796-801).

      Point 2.7

      Minor mistakes that I've noticed:

      Equation 6: The learning rate for response stickiness is sometimes defined as alpha_AH or alpha_pi.

      Supplementary material (SM) Contents are lacking in Note1. SM talks about model MOS18, but it is not defined in the text (I am assuming it is MOS22 that should be talked about here).

      Thanks! Fixed.

      Reviewer #3:

      Point 3.1

      Summary: This paper presents a new formulation of a computational model of adaptive learning amid environmental volatility. Using a behavioral paradigm and data set made available by the authors of an earlier publication (Gagne et al., 2020), the new model is found to fit the data well. The model's structure consists of three weighted controllers that influence decisions on the basis of (1) expected utility, (2) potential outcome magnitude, and (3) habit. The model offers an interpretation of psychopathology-related individual differences in decision-making behavior in terms of differences in the relative weighting of the three controllers.

      Strengths: The newly proposed "mixture of strategies" (MOS) model is evaluated relative to the model presented in the original paper by Gagne et al., 2020 (here called the "flexible learning rate" or FLR model) and two other models. Appropriate and sophisticated methods are used for developing, parameterizing, fitting, and assessing the MOS model, and the MOS model performs well on multiple goodness-of-fit indices. The parameters of the model show decent recoverability and offer a novel interpretation for psychopathology-related individual differences. Most remarkably, the model seems to be able to account for apparent differences in behavioral learning rates between high-volatility and low-volatility conditions even with no true condition-dependent change in the parameters of its learning/decision processes. This finding calls into question a class of existing models that attribute behavioral adaptation to adaptive learning rates. 

      Thanks for evaluating our paper. This is a good summary.

      Point 3.2

      (1) Some aspects of the paper, especially in the methods section, lacked clarity or seemed to assume context that had not been presented. I found it necessary to set the paper down and read Gagne et al., 2020 in order to understand it properly.

      (3) Clarification-related suggestions for the methods section:

      - Explain earlier that there are 4 contexts (reward/shock crossed with high/low volatility). Lines 252-307 contain a number of references to parameters being fit separately per context, but "context" was previously used only to refer to the two volatility levels.

      Action: We have placed the explanation as well as the table about the 4 contexts (stable-reward/stable-aversive/volatile-reward/volatile-aversive) earlier in the section that introduces the experiment paradigm (lines 177-186):

      “Participants were required to complete this learning and decision-making task in four experimental contexts (Fig. 1A): two feedback contexts (reward or aversive) × two volatility contexts (stable or volatile). Participants received points in the reward context and an electric shock in the aversive context. The reward points were converted into a monetary bonus at the end of the task, ranging from £0 to £10. In the stable context, the dominant stimulus (i.e., the stimulus that induces feedback with the higher probability) provided feedback with a fixed probability of 0.75, while the other stimulus yielded feedback with a probability of 0.25. In the volatile context, the dominant stimulus’s feedback probability was 0.8, but the dominant stimulus switched between the two stimuli every 20 trials. Hence, this design required participants to actively learn and infer the changing stimulus-feedback contingency in the volatile context.”
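      The probability schedule described above can be written down as a short sketch. The ordering (stable trials first, then volatile trials) and the choice of stimulus 1 as the initially dominant stimulus are our assumptions for illustration; the trial counts follow the 90 + 90 split of each run described in Point 1.7.

```python
def feedback_prob_schedule(n_stable: int = 90, n_volatile: int = 90,
                           switch_every: int = 20) -> list[float]:
    """Per-trial probability that stimulus 1 carries the feedback (hypothetical layout).

    Stable context: the dominant stimulus (assumed to be stimulus 1) has 0.75.
    Volatile context: the dominant stimulus has 0.8 and switches every
    `switch_every` trials, starting with stimulus 1.
    """
    stable = [0.75] * n_stable
    volatile = []
    for t in range(n_volatile):
        stim1_dominant = (t // switch_every) % 2 == 0
        volatile.append(0.8 if stim1_dominant else 0.2)
    return stable + volatile

schedule = feedback_prob_schedule()
print(len(schedule), schedule[0], schedule[90], schedule[110])  # 180 0.75 0.8 0.2
```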

      - It would be helpful to provide an initial outline of the four models that will be described since the FLR, RS, and PH models were not foreshadowed in the introduction. For the FLR model in particular, it would be helpful to give a narrative overview of the components of the model before presenting the notation. 

      Action: We now include an overview paragraph in the computational modeling section that outlines the four models and the hypotheses each one embodies (lines 202-220).

      - The subsection on line 343, describing the simulations, lacks context. There are references to three effects being simulated (and to "the remaining two effects") but these are unclear because there's no statement in this section of what the three effects are.

      - Lines 352-353 give group-specific weighting parameters used for the stimulations of the HC and PAT groups in Figure 4B. A third, non-group-specific set of weighting parameters is given above on lines 348-349. What were those used for?

      - Line 352 seems to say Figure 4A is plotting a simulation, but the figure caption seems to say it is plotting empirical data. 

      These paragraphs have been rewritten and the above-mentioned issues clarified. See lines 363-392.

      Point 3.2

      (2) There is little examination of why the MOS model does so well in terms of model fit indices. What features of the data is it doing a better job of capturing? One thing that makes this puzzling is that the MOS and FLR models seem to have most of the same qualitative components: the FLR model has parameters for additive weighting of magnitude relative to probability (akin to the MOS model's magnitude-only strategy weight) and for an autocorrelative choice kernel (akin to the MOS model's habit strategy weight). So it's not self-evident where the MOS model's advantage is coming from.

      An intuitive understanding of the FLR model is that it estimates stimulus value through a linear combination of the feedback probability (PF) and the (non-linearly transformed) feedback magnitude. See equation:

      Also, the FLR model includes the HA mechanism as:

      In other words, the FLR model considers the PF+MO+HA mechanisms (see Eq. XX in the original study), whereas our MOS model considers the EU+MO+HA mechanisms. The key qualitative difference between FLR and MOS is the use of the expected utility formula (EU) instead of the probability of feedback (PF). The advantage of our MOS model is demonstrated by our model comparisons, indicating that human participants multiply probability and magnitude rather than considering probability alone. The EU strategy is also supported by a large body of literature (Gershman et al., 2015; Von Neumann & Morgenstern, 1947).

      Making decisions based on the multiplication of feedback probability and magnitude can often yield very different results compared to decisions based on a linear combination of the two, especially when the two magnitudes have a small absolute difference but a large ratio. Let’s consider two cases:

      (1) Stimulus 1: vs. Stimulus 2:

      (2) Stimulus 1: vs. Stimulus 2:

      The EU strategy would opt for stimulus 2 in both cases, since stimulus 2 always has the larger expected value. However, the PF+MO strategy is very likely to choose stimulus 1 in the first case. For example, when . If we want PF+MO to also choose stimulus 2, in line with the EU strategy, we need to increase the weight on magnitude . Note that in this example we divided the magnitude values by 100 so that probability and magnitude are on the same scale, for ease of illustration.
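      To illustrate the arithmetic, the sketch below uses a hypothetical case of this kind (a small absolute magnitude difference, 0.08, but a large ratio, 5:1); the numbers are ours, not the two cases above. It assumes a linear PF+MO rule with weight `lam` on probability and `1 - lam` on magnitude; the FLR model's non-linear magnitude transform is omitted for simplicity.

```python
def eu_choice(p1: float, m1: float, p2: float, m2: float) -> int:
    """Stimulus (1 or 2) with the larger expected utility p * m."""
    return 1 if p1 * m1 > p2 * m2 else 2

def additive_choice(p1: float, m1: float, p2: float, m2: float, lam: float) -> int:
    """Stimulus preferred by a linear PF+MO rule: lam * p + (1 - lam) * m."""
    v1 = lam * p1 + (1 - lam) * m1
    v2 = lam * p2 + (1 - lam) * m2
    return 1 if v1 > v2 else 2

def critical_lambda(p1: float, m1: float, p2: float, m2: float) -> float:
    """Value of lam at which the PF+MO rule is indifferent between the stimuli."""
    return (m2 - m1) / ((p1 - p2) - (m1 - m2))

# Hypothetical case: magnitudes already divided by 100.
case = dict(p1=0.7, m1=0.02, p2=0.3, m2=0.10)
print(eu_choice(**case))                  # 2  (0.014 vs 0.030)
print(additive_choice(**case, lam=0.5))   # 1  (0.36 vs 0.20)
print(round(critical_lambda(**case), 3))  # 0.167
```

      In this hypothetical case, EU prefers stimulus 2, whereas the additive rule prefers stimulus 1 unless the weight on probability drops below about 1/6, that is, unless the weight on magnitude exceeds about 0.83.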

      In the dataset reported by Gagne et al. (2020), the described scenario seems to occur more often in the aversive context than in the reward context. To accurately capture human behavior, the FLR22 model requires a significantly larger weight on magnitude in the aversive context than in the reward context. Interestingly, when the weights on magnitude in different contexts are forced to be equal, the model (FLR6) fails, exhibiting almost chance-level performance throughout learning (Fig. 3E, G). In contrast, the MOS6 model, and even the RS3 model, perform well using one identical set of parameters across contexts. Both MOS6 and RS3 include the EU strategy during decision-making. These findings suggest that humans make decisions using the EU strategy rather than PF+MO.

      The focus of our paper is to show that a good-enough model can interpret the same dataset from a completely different perspective, not necessarily to improve the FLR model.

      Point 3.3

      One of the paper's potentially most noteworthy findings (Figure 5) is that when the FLR model is fit to synthetic data generated by the expected utility (EU) controller with a fixed learning rate, it recovers a spurious difference in learning rate between the volatile and stable environments. Although this is potentially a significant finding, its interpretation seems uncertain for several reasons: 

      - According to the relevant methods text, the result is based on a simulation of only 5 task blocks for each strategy. It would be better to repeat the simulation and recovery multiple times so that a confidence interval or error bar can be estimated and added to the figure. 

      - It makes sense that learning rates recovered for the magnitude-oriented (MO) strategy are near zero, since behavior simulated by that strategy would have no reason to show any evidence of learning. But this makes it perplexing why the MO learning rate in the volatile condition is slightly positive and slightly greater than in the stable condition. 

      - The pure-EU and pure-MO strategies are interpreted as being analogous to the healthy control group and the patient group, respectively. However, the actual difference in estimated EU/MO weighting between the two participant groups was much more moderate. It's unclear whether the same result would be obtained for a more empirically plausible difference in EU/MO weighting. 

      - The fits of the FLR model to the simulated data "controlled all parameters except for the learning rate parameters across the two strategies" (line 522). If this means that no parameters except learning rate were allowed to differ between the fits to the pure-EU and pure-MO synthetic data sets, the models would have been prevented from fitting the difference in terms of the relative weighting of probability and magnitude, which better corresponds to the true difference between the two strategies. This could have interfered with the estimation of other parameters, such as learning rate. 

      - If, after addressing all of the above, the FLR model really does recover a spurious difference in learning rate between stable and volatile blocks, it would be worth more examination of why this is happening. For example, is it because there are more opportunities to observe learning in those blocks?

      I would recommend performing a version of the Figure 5 simulations using two sets of MOS-model parameters that are identical except that they use healthy-control-like and patient-like values of the EU and MO weights (similar to the parameters described on lines 346-353, though perhaps with the habit controller weight equated). Then fit the simulated data with the FLR model, with learning rate and other parameters free to differ between groups. The result would be informative as to (1) whether the FLR model still misidentifies between-group strategy differences as learning rate differences, and (2) whether the FLR model still identifies spurious learning rate differences between stable and volatile conditions in the control-like group, which become attenuated in the patient-like group. 

      Many thanks for this great advice. Following your suggestions, we have now conducted simulations using the median of the fitted parameters. The representative agents for healthy controls and patients have identical parameters except for the three preference parameters; note that the habit weights were not constrained to be equal. We ran 20 simulations for each representative agent, each comprising four task sequences sampled from the behavioral data, which allowed us to add error bars and perform statistical tests. We found that the differences in learning rates between stable and volatile conditions, as well as the difference in learning rate adaptation between healthy controls and patients, still persisted.

      Together with the discussion in Point 3.2, these results explain why a mixture-of-strategies model can account for learning rate adaptation, as follows. Owing to (unknown) differences between the task sequences, the MOS6 model exhibits more MO-like behavior through its use of the EU strategy. To capture this behavioral pattern, the FLR22 model has to increase its magnitude weighting parameter 1-λ, which could ultimately drive the FLR22 model to adjust its fitted learning rate parameters, producing an apparent learning rate adaptation effect. Our simulations therefore suggest that learning rate adaptation estimated by model fitting alone is not the only possible interpretation of the data.

      Action: We have included the simulation details in the Methods section (lines 381-391).

      “In one simulated experiment, we sampled the four task sequences from the real data. We simulated 20 experiments with the parameters of to mimic the behavior of the healthy control participants. The first three are the median of the fitted parameters across all participants; the latter three were chosen to approximate the strategy preferences of real healthy control participants (Figure 4A). Similarly, we also simulated 20 experiments for the patient group with identical values of , and , but different strategy preferences   . In other words, the only difference in the parameters of the two groups is the switched and . We then fitted the FLR22 model to the behavioral data generated by the MOS6 model and examined the learning rate differences across groups and volatile contexts (Fig. 6). ”

      Point 3.4

      Figure 4C shows that the habit-only strategy is able to learn and adapt to changing contingencies, and some of the interpretive discussion emphasizes this. (For instance, line 651 says the habit strategy brings more rewards than the MO strategy.) However, the habit strategy doesn't seem to have any mechanism for learning from outcome feedback. It seems unlikely it would perform better than chance if it were the sole driver of behavior. Is it succeeding in this example because it is learning from previous decisions made by the EU strategy, or perhaps from decisions in the empirical data?

      Intuitively, the HA strategy appears to have no learning mechanism. In practice, however, it yields a higher hit rate than the MO strategy simply by learning from previous decisions made by the EU strategy. We ran simulations to confirm this (Figure 4B).
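      To make the point about the HA strategy concrete, here is a minimal Python sketch of a choice-kernel (habit) update. It assumes, for illustration only, that the habit strategy keeps a trace of choice probabilities that is nudged toward whichever option was just chosen, with a learning rate called `alpha_ha` here (mirroring alpha_HA); the function name and the numbers are hypothetical and this is not the authors' MOS implementation. Because no outcome enters the update, the trace can follow a changing contingency only when the choices feeding it were themselves driven by an outcome-sensitive strategy such as EU.

```python
def update_habit(habit, action, alpha_ha):
    """Move each habit probability toward the one-hot code of the last action.

    habit: list of choice probabilities (sums to 1); action: index of the
    option just chosen; alpha_ha: hypothetical habit learning rate in [0, 1].
    No outcome (reward or shock) enters this update.
    """
    return [h + alpha_ha * ((1.0 if i == action else 0.0) - h)
            for i, h in enumerate(habit)]


# Hypothetical example: an EU-driven agent chooses option 0 on five trials,
# and the habit trace inherits that preference without seeing any outcomes.
habit = [0.5, 0.5]
for _ in range(5):
    habit = update_habit(habit, action=0, alpha_ha=0.3)
print([round(h, 4) for h in habit])  # [0.916, 0.084]
```

      The update keeps the probabilities summing to 1, so the habit strategy behaves like a slowly moving copy of whatever policy generated the recent choices.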

      Point 3.5

      For the model recovery analysis (line 567), the stated purpose is to rule out the possibility that the MOS model always wins (line 552), but the only result presented is one in which the MOS model wins. To assess whether the MOS and FLR models can be differentiated, it seems necessary also to show model recovery results for synthetic data generated by the FLR model. 

      We have now conducted a model recovery analysis that includes all models; it demonstrates that the MOS and FLR models can be fully differentiated. The results of this new model recovery analysis are shown in Fig. 7.

      Point 3.6

      To the best of my understanding, the MOS model seems to implement valence-specific learning rates in a qualitatively different way from how they were implemented in Gagne et al., 2020, and other previous literature. Line 246 says there were separate learning rates for upward and downward updates to the outcome probability. That's different from using two learning rates for "better"- and "worse"-than-expected outcomes, which will depend on both the direction of the update and the valence of the outcome (reward or shock). Might this relate to why no evidence for valence-specific learning rates was found even though the original authors found such evidence in the same data set? 

      Thanks. Following your suggestion, we have corrected our implementation of valence-specific learning rates in all models (see lines 261-268).

      “For consistency with Gagne et al. (2020), we also explored the valence-specific learning rate,

      is the learning rate for better-than-expected outcomes, and for worse-than-expected outcomes. It is important to note that Eq. 6 was applied only to the reward context; in the aversive context, the definitions of “better-than-expected” and “worse-than-expected” change accordingly, where we defined for and for .

      No main effect of valence on learning rate was found (see Supplemental Information Note 3).
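      The valence-specific rule described in the quoted passage can be sketched as follows. This is a hypothetical Python illustration, not the authors' Eq. 6: the function and argument names (`alpha_better`, `alpha_worse`, `aversive`) and the convention that the prediction error is outcome minus current estimate are assumptions. It shows how the same prediction error is routed to the "better" or "worse" learning rate depending on whether the context is rewarding or aversive.

```python
def valence_specific_update(p, outcome, alpha_better, alpha_worse, aversive=False):
    """One delta-rule update of an outcome probability p with valence-specific learning rates.

    Assumed convention (hypothetical): delta = outcome - p, with outcome = 1 if
    the tracked option delivered the outcome on this trial. In the reward
    context delta > 0 is better than expected; in the aversive context the
    outcome is a shock, so delta < 0 is better than expected.
    """
    delta = outcome - p
    better_than_expected = (delta < 0) if aversive else (delta > 0)
    alpha = alpha_better if better_than_expected else alpha_worse
    return p + alpha * delta


# The same observation is weighted differently in the two contexts.
print(valence_specific_update(0.5, 1, alpha_better=0.4, alpha_worse=0.1))                 # 0.7
print(valence_specific_update(0.5, 1, alpha_better=0.4, alpha_worse=0.1, aversive=True))  # 0.55
```

      Flipping the sign test, rather than the sign of the prediction error, keeps a single update equation for both contexts while swapping which learning rate counts as "better".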

      Point 3.7

      The discussion (line 649) foregrounds the finding of greater "magnitude-only" weights with greater "general factor" psychopathology scores, concluding it reflects a shift toward simplifying heuristics. However, the picture might not be so straightforward because "habit" weights, which also reflect a simplifying heuristic, correlated negatively with the psychopathology scores. 

      Thanks. In contrast to the detrimental effects of the “MO” strategy, the “habit” strategy is actually beneficial for this task. Please refer to Point 1.12.

      Point 3.8

      The discussion section contains some pejorative-sounding comments about Gagne et al. 2020 that lack clear justification. Line 611 says that the study "did not attempt to connect the decision process to anxiety and depression traits." Given that linking model-derived learning rate estimates to psychopathology scores was a major topic of the study, this broad statement seems incorrect. If the intent is to describe a more specific step that was not undertaken in that paper, please clarify. Likewise, I don't understand the justification for the statement on line 615 that the model from that paper "is not understandable" - please use more precise and neutral language to describe the model's perceived shortcomings. 

      Sorry for the confusion. We have removed all the above-mentioned pejorative-sounding comments.

      Point 3.9

      4. Minor suggestions: 

      - Line 114 says people with psychiatric illness "are known to have shrunk cognitive resources" - this phrasing comes across as somewhat loaded. 

      Thanks. We have removed this argument.

      - Line 225, I don't think the reference to "hot hand bias" is correct. I understand hot hand bias to mean overestimating the probability of success after past successes. That's not the same thing as habitual repetition of previous responses, which is what's being discussed here. 

      Response: Thanks for mentioning this. We have removed all discussions about “hot hand bias”.

      - There may be some notational inconsistency if alpha_pi on line 248 and alpha_HA on line 253 are referring to the same thing. 

      Thanks! Fixed!

      - Check the notation on line 285 - there may be some interchanging of decimals and commas.

      Thanks! Fixed!

      Also, would the interpretation in terms of risk seeking and risk aversion be different for rewarding versus aversive outcomes? 

      Thanks for asking. If we understand correctly, risk-seeking and risk-aversion mechanisms are present only in the RS models, which show clearly worse fitting performance. We therefore decided not to over-interpret the fitted parameters of the RS models.

      - Line 501, "HA and PAT groups" looks like a typo. 

      - In Figure 5, better graphical labeling of the panels and axes would be helpful. 

      Response: Thanks! Fixed!

      REFERENCES

      Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204-1215.

      Gagne, C., Zika, O., Dayan, P., & Bishop, S. J. (2020). Impaired adaptation of learning to contingency volatility in internalizing psychopathology. eLife, 9.

      Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394.

      Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 273-278.

      Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd rev. ed.). Princeton University Press.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.

      We now further discuss these issues and the perspectives offered by future genomic work on lines 462-485.  

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree, the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

      Our intuition is that the “dated” version of ALE does not necessarily fail on our dataset because of its size: ALE runs, but it provides unrealistic parameter estimates and cannot output possible reconciliations, as mentioned in our Material and Methods section. We think this issue is mostly due to the absence of any pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between them with time-consistent switches is very difficult, and ALE fails to estimate an amalgamated likelihood for such an unlikely scenario. We have now run the dated version of ALE separately on the smaller alphacoronavirus and betacoronavirus datasets. It still fails on the betacoronavirus dataset. On the alphacoronavirus dataset, it does output significant reconciliations; however, these reconciliations are dominated by transfer and loss events, confirming that codiversification is unlikely in this clade.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that this is a fairly prominent issue, one that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We fully agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still reported low support for a scenario of codiversification, the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host switches involve unsampled intermediate hosts. To address the reviewer's comment, we now better underline the importance of sampling biases in our main text (see Discussion, lines 487-494) with supporting references (note that we did not find the Cohen et al. Nature Comm reference). We also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.

      We agree that distinguishing between alpha and beta coronaviruses provides useful additional insights. We have run separate cophylogenetic analyses for these two sub-clades and now report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (2017, PLOS Pathogens); thank you for providing this reference, which is now cited.

      Reviewer #1 (Recommendations For The Authors):

      (1) Overall I found this paper to be quite difficult to follow. The text needs clearer structure, which can be helped by writing in shorter paragraphs and adding section headings. For example, there are some very long paragraphs starting on L83, L176, L215, L511, and L598.

      We have now added section headings and divided these paragraphs into smaller ones.

      (2) It would be helpful to define some of the key terminology relating to the evolutionary interactions between the viruses and their hosts. Some of the terms that are typically used in the context include "coevolution", "cospeciation", "codivergence", and "codiversification". These have different meanings and need to be used carefully. The paper mostly deals with "codivergence" between coronaviruses and their host species.

      We now provide a list of definitions in Box S1. These definitions are as in our recent article clarifying the differences between these patterns/processes (Perez-Lamarque & Morlon 2024).

      Specific comments

      L83-L105: This paragraph can be written more concisely.

      We prefer to keep this paragraph like this as it contains key explanations that are necessary for understanding our approach and results.  

      Figure 1: The timescales of the trees are rather confusing. The different scales are indicated by the gray shading but this is easy to overlook. Maybe stretching or compressing the trees horizontally would help to emphasise the different timescales.

      Done.

      Figure 2: Note that the maximum clade credibility tree is a specific tree sampled from the posterior distribution - it is not a consensus tree. In the figure caption, the meaning of "location" is unclear.

      We have removed the word “consensus”, thank you for noting this. We have replaced “location” by “branching order”. 

      L461: How was the model chosen, and why were different models used in the BEAST and PhyloBayes analyses?

      We did our PhyloBayes analyses first and used the LG model following the methodology outlined in previous studies using ALE (e.g. Groussin et al. 2017; Dorrell et al. 2021). Unfortunately, the LG model is not available in the default version of BEAST2, so we had to use a different model (the WAG model). We have now run BEAST2 with the LG model (thanks to the BEAST_CLASSIC package) and obtained very similar results (see the figure below showing the BEAST consensus trees obtained with the WAG or LG models; they differ only slightly in the branching of the u7351 OTU). We have now added this information to the Methods section.

      Author response image 1.

      L477: It is not clear to me how the PhyloBayes and BEAST analyses differ. Please expand the explanation of why PhyloBayes was used here.

      We have now clarified this (lines 594-597). 

      L568: Why not test explicitly for recombination?

      We did test for the occurrence of recombination using several approaches, including OpenRDP (https://github.com/PoonLab/OpenRDP), our own custom code, and Gubbins (Croucher et al. 2015). These tests were, however, inconclusive, variously indicating the absence or presence of recombination, which suggests that the palmprint region is too short to infer anything about recombination. We therefore do not exclude the possibility that recombination occurred, and we tested the robustness of our results to recombination by running our analyses on different sub-parts of the palmprint region. We have clarified this in our Material & Methods.

      L618: "DNA sequences" -> "RNA sequences"

      Done.

      The paper contains numerous minor grammatical errors and would benefit from careful proofreading and editing. Please check the use of plurals and apostrophes. Some of the errors are listed below:

      L49: "As several" -> "As with several"

      Done.

      L178: "reconciliates" -> "reconciles"?

      Done.

      L199: "extent" -> "extant"

      Done.

      L289: This sentence needs rephrasing to avoid a triple negative ("cannot ... reject ... not present")

      Done.

      L469: "temporary" -> "temporal"

      Done.

      L470: "neglectable" -> "negligible"

      Done.

      L577: "not only relying" -> "not relying only"

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The study is generally well-constructed and its results are convincing. However, considering the availability of a dated host tree, conducting a dated reconciliation analysis could be beneficial. Creating a smaller sub-dataset and performing a dated reconciliation analysis would likely be a valuable addition to the research.

      We have now run the dated version of ALE on both the alphacoronavirus and betacoronavirus subclades. The dated version of ALE still does not output reconciliations on the betacoronavirus dataset, but it does on the smaller alphacoronavirus dataset. We found significant reconciliations, indicating that mammal-alphacoronavirus associations are not random with respect to phylogeny, but the reconciliations involved more host-switch and loss events (38 switches + 29 losses) than cospeciation events (65), indicating cophylogenetic signal in the absence of phylogenetic congruence (Perez-Lamarque & Morlon 2024). We now present these results on lines 264-282.

      Reviewer #3 (Recommendations For The Authors):

      I think the results are written in a very speculative way, with many sentence fragments that should really be part of the discussion.

      We have carefully checked our Results section and rephrased or removed formulations that may have been perceived as speculative.

      There are a lot of considerations in this manuscript about spread and future pandemics, but I think this is very far from the topic of this paper. When we quantified the coevolutionary risk of bats-betacovs in a recent paper (Forero et al. 2024, Virus Evol.), we only briefly touched upon this discussion because we compared our outputs with a measure of human population density. I don't think the manuscript needs to talk about epidemiology at all, and it would probably be more useful as a purely evo-bio piece.

      We think that it is useful to discuss the potential implications of our results for future pandemics, even though we agree that this discussion is rather speculative. We have removed the mention of predictions in the Abstract and have softened our wording in the Discussion.  

      References:

      Croucher, N.J., Page, A.J., Connor, T.R., Delaney, A.J., Keane, J.A., Bentley, S.D., et al. (2015). Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res., 43, e15.

      Dorrell, R.G., Villain, A., Perez-Lamarque, B., Audren de Kerdrel, G., McCallum, G., Watson, A.K., et al. (2021). Phylogenomic fingerprinting of tempo and functions of horizontal gene transfer within ochrophytes. Proc. Natl. Acad. Sci., 118, e2009974118.

      Edgar, R.C., et al. (2022). Petabase-scale sequence alignment catalyses viral discovery. Nature, 602, 142–147.

      Groussin, M., Mazel, F., Sanders, J.G., Smillie, C.S., Lavergne, S., Thuiller, W., et al. (2017). Unraveling the processes shaping mammalian gut microbiomes over evolutionary time. Nat. Commun., 8, 14319.

      Perez-Lamarque, B. & Morlon, H. (2024). Distinguishing cophylogenetic signal from phylogenetic congruence clarifies the interplay between evolutionary history and species interactions. Syst. Biol.

    1. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.
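      For readers less familiar with image difference functions, the sketch below shows the core computation in the spirit of snapshot-matching studies such as Murray and Zeil (2017). It is a minimal NumPy illustration under stated assumptions, not the model used in this study: views are grayscale panoramic arrays (elevation by azimuth), a change of heading is modelled as a circular shift of the azimuth columns, and the function names are hypothetical. The rotational IDF compares the current view with a stored snapshot at every heading; its minimum indicates the best-matching heading, and how that minimum value changes across positions is what a homing model descends.

```python
import numpy as np


def rotational_idf(snapshot, view):
    """Root-mean-square pixel difference between a stored panoramic snapshot and
    the current view, computed for every azimuthal rotation of the view.

    Both inputs are 2-D arrays (elevation x azimuth) of the same shape; a
    rotation is modelled as a circular shift of the azimuth columns.
    """
    snapshot = np.asarray(snapshot, dtype=float)
    view = np.asarray(view, dtype=float)
    n_azimuth = snapshot.shape[1]
    return np.array([
        np.sqrt(np.mean((np.roll(view, shift, axis=1) - snapshot) ** 2))
        for shift in range(n_azimuth)
    ])


def best_heading(snapshot, view):
    """Return the column shift that best aligns view with snapshot, and its IDF value."""
    idf = rotational_idf(snapshot, view)
    shift = int(np.argmin(idf))
    return shift, float(idf[shift])


# Hypothetical example: the current view is the snapshot rotated by 3 columns,
# so undoing that rotation gives a perfect match.
rng = np.random.default_rng(0)
snapshot = rng.random((4, 8))
view = np.roll(snapshot, -3, axis=1)
print(best_heading(snapshot, view))  # (3, 0.0)
```

      In a dense environment, nearby objects change the view sharply over short distances, which is one way to see why such a difference signal can become uninformative within the object array while remaining smooth for views taken from above it.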

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees' homing based on image matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not towards the nest position. We hypothesised that the arena wall and the object locations created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are moved independently. We therefore reduced the complexity of the environment by concentrating on the visual features that were moved between training and testing. (Neither the camera nor the wall was moved between training and test.) We acknowledge that this information should have been provided to substantiate our reasoning. We will therefore include model results with the arena wall in the revised paper.

      As we wanted to investigate whether bees use ground views or bird's-eye views to home in a dense environment, we think that catchment volumes would provide information qualitatively similar to, though quantitatively more detailed than, that provided by catchment slices. Catchment slices are sufficient to predict whether ground views or bird's-eye views lead more reliably to the nest, and we will, therefore, not include further computations of catchment volumes.
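To illustrate why a horizontal slice and a full volume can be analysed with the same procedure, here is a minimal Python sketch. It assumes a precomputed grid of image-difference values (for example, the smallest rotational image difference between the view at each grid point and the nest snapshots) and labels each cell by whether steepest descent over face-adjacent neighbours ends at a chosen goal cell. The function name `catchment_mask`, the descent rule, and the toy grids are hypothetical illustrations, not the procedure used in the study.

```python
from itertools import product

import numpy as np


def catchment_mask(idf, goal):
    """Return a boolean array marking grid cells whose steepest-descent path
    (over face-adjacent neighbours) ends at `goal`.

    `idf` holds one image-difference value per grid cell and may be 2-D
    (a horizontal slice) or 3-D (a volume). Hypothetical illustration only.
    """
    idf = np.asarray(idf, dtype=float)
    goal = tuple(goal)

    # Unit steps along each axis: 4 neighbours in 2-D, 6 in 3-D.
    offsets = []
    for axis in range(idf.ndim):
        for step in (-1, 1):
            offset = [0] * idf.ndim
            offset[axis] = step
            offsets.append(tuple(offset))

    def descend(start):
        current = start
        while True:
            # Move to the lowest neighbour that is strictly lower than here.
            best = current
            for offset in offsets:
                neighbour = tuple(c + o for c, o in zip(current, offset))
                inside = all(0 <= n < s for n, s in zip(neighbour, idf.shape))
                if inside and idf[neighbour] < idf[best]:
                    best = neighbour
            if best == current:  # local minimum reached
                return current
            current = best

    mask = np.zeros(idf.shape, dtype=bool)
    for cell in product(*(range(s) for s in idf.shape)):
        mask[cell] = descend(cell) == goal
    return mask


# Toy 2-D "slice" with two minima: only the left part drains to cell (0, 0).
line = np.array([[0.0, 1.0, 2.0, 1.0, 0.5]])
print(catchment_mask(line, (0, 0)).tolist())
```

The same function accepts a 3-D array unchanged; a catchment volume simply contains more cells and more neighbours per cell.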

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have repeatedly been shown to be sufficient to explain homing behaviour in natural as well as artificial environments. A model can be used not only to replicate but also to predict a given outcome and to shape the design of experiments. Here, we used the models to shape the experimental design, as this approach does not require the entire history of the bee's trajectory and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. First, our setup does not allow us to record every inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended periods, potentially impairing the bees' motivation to forage or the survival and development of the colony. Second, it is not always clear where exactly bees learn, or whether they learn continuously by weighting their visual experience according to their positions and orientations. This makes it difficult to categorise these flights accurately as learning or return flights. Additionally, the learning mechanisms at play during learning flights remain elusive in homing models. We therefore believe that continued effort is needed to understand bees' learning and homing abilities. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. Building on this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will present these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations, focusing on those aspects that are at the centre of the research question. Our starting point was that, in nature, bumblebees have to find their inconspicuous nest hole, which is difficult to locate in often highly dense environments, ultimately on a spatial scale of metres. We first wanted to find out whether bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding it are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous bumblebee nest entrances in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama (and the absence of UV light, wind, and other confounding factors inherent to fieldwork), the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective under our more restricted conditions, in which the bees still managed to pinpoint their home.
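For readers less familiar with the technique, a rotational image difference function can be sketched in a few lines of Python. This minimal sketch assumes that panoramic views are stored as 2-D arrays whose columns span 360° of azimuth and uses the root-mean-square pixel difference; the function names, the image format, and taking the minimum over several snapshots are illustrative assumptions rather than the exact model used in the study.

```python
import numpy as np


def rotational_idf(current, snapshot):
    """Root-mean-square pixel difference between `snapshot` and `current`
    rotated azimuthally by every whole-column shift (one value per shift)."""
    current = np.asarray(current, dtype=float)
    snapshot = np.asarray(snapshot, dtype=float)
    if current.shape != snapshot.shape:
        raise ValueError("current view and snapshot must have the same shape")
    n_columns = current.shape[1]  # columns are assumed to span 360 degrees
    return np.array([
        np.sqrt(np.mean((np.roll(current, shift, axis=1) - snapshot) ** 2))
        for shift in range(n_columns)
    ])


def best_match(current, snapshots):
    """Compare the current view with several snapshots and return
    (smallest difference, index of best snapshot, best column shift)."""
    best = None
    for index, snapshot in enumerate(snapshots):
        idf = rotational_idf(current, snapshot)
        shift = int(np.argmin(idf))
        candidate = (float(idf[shift]), index, shift)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


# Toy check: a snapshot taken at the same place but with a different heading
rng = np.random.default_rng(0)
panorama = rng.random((10, 36))          # 36 columns, i.e. 10 degrees per column
snapshot = np.roll(panorama, 5, axis=1)  # same view, heading offset by 5 columns
print(best_match(panorama, [rng.random((10, 36)), snapshot]))  # (0.0, 1, 5)
```

In a multi-snapshot scheme like this, the minimum over snapshots yields both a familiarity value and a heading estimate; how snapshots are selected or weighted is a further modelling choice not covered by this sketch.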

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editors’ recommendations for the authors

      The reviewers recommend the following: 

      (a) Digging deeper into the discussion of the density-dependent dispersal. 

      (b) Clarifying the microfluidic setup.  

      (c) Clarifying the description and interpretation of the transcriptomic evidence. 

      (d) Toning down carbon cycle connections (some reviewers felt the evidence did not fully support the claims). 

      We would like to thank the editors for their thoughtful evaluation of our manuscript and their clear suggestions. We have revised the manuscript in the light of these comments, as we outline below and address in detail in the point-by-point response to the reviewers’ comments that follows. 

      (a) We have expanded the discussion of density-dependent dispersal and revised Figure 2C to improve clarity. 

      (b) We have also added further information concerning the microfluidic setup in the results section and provide an illustration of the setup in a new figure panel, Figure 1A.

      (c) Addressing the reviewers’ comments on the transcriptomic analysis, we have added more information in the description and interpretation of the results. 

      (d) We have rephrased the text describing the role of degradation-dispersal cycles for carbon cycling to highlight it as the motivation of this study and emphasize the link to literature on foraging, without creating expectations of direct measurements of global carbon cycling.

      Public Reviews:

      Reviewer #1 (Public Review):

      [...]

      Weaknesses: 

      Much of the genetic analysis, as it stands, is quite speculative and descriptive. I found myself confused about many of the genes (e.g., quorum sensing) that pop up enriched during dispersal quite in contrast to my expectations. While the authors do mention some of this in the text as worth following up on, I think the analysis as it stands adds little insight into the behaviors studied. However, I acknowledge that it might have the potential to generate hypotheses and thus aid future studies. Further, I found the connections to the carbon cycle and marine environments in the abstract weak --- the microfluidics setup by the authors is nice, but it provides limited insight into naturalistic environments where the spatial distribution and dimensionality of resources are expected to be qualitatively different. 

      We thank the reviewer for their suggestions to improve our manuscript. We agree that the original manuscript would have benefitted from more detailed interpretation of the observed changes in gene expression. We have revised the manuscript to elaborate on the interpretation of the changes in expression of quorum sensing genes (see response to reviewer 1, comment 3), motility genes (see response to reviewer 1, comment 6), alginate lyase genes (see response to reviewer 1, comment 7 and reviewer 2, comment 2), and ribosomal and transporter genes (see response to reviewer 2, comment 2).

      In general, we think that the gene expression study not only supports the phenotypic observations that we made in the microfluidic device, such as the increased swimming motility when exposed to digested alginate medium, but also adds further insights. We studied transcriptomes in well-mixed batch cultures because gene expression dynamics, which would support the phenotypic observations of differential motility and chemotaxis, could not be measured in our microfluidic setup. The transcriptomic data clearly show that even in well-mixed environments, growth on digested alginate instead of alginate is sufficient to increase the expression of motility and chemotaxis genes. In addition, the transcriptome analysis revealed increased expression of alginate lyases and metabolic genes during growth on digested alginate, something that would not have been detectable in the microfluidic setup. We agree with the reviewer that our analyses implicate further, perhaps unexpected, mechanisms like quorum sensing in the cellular response to breakdown products, and that this represents an interesting avenue for further studies.

      Finally, we also agree with the reviewer that it would be good to state more explicitly in the text that our microfluidic system cannot fully capture the complex dynamics of natural environments. Our approach does, however, allow the characterization of cellular behaviors at spatial and temporal scales that are relevant to the interactions of bacteria, and thus provides a better understanding of colonization and dispersal of marine bacteria in a manner that is not possible through in situ experiments. We have edited our manuscript to highlight this and modified our statements regarding carbon cycling to emphasize the role of degradation-dispersal cycles in the remineralization of polysaccharides (see response to reviewer 1, comment 2).

      Reviewer #2 (Public Review):

      [...]

      Weaknesses: 

      The explanation of the microfluidics measurements is somewhat confusing but I think this could be easily remedied. The quantitative interpretation of the dispersal data could also be improved and I'm not clear if the data support the claim made. 

      We thank the reviewer for their comments and helpful suggestions. We have revised the manuscript with these suggestions in mind and believe that the manuscript is improved by a more detailed explanation of the microfluidic setup. We have added more information in the text (detailed in response to reviewer 2, comments 1 and 2) and have added a depiction of the microfluidic setup (Fig. 1A). We have also modified the presentation and discussion of the dispersal data (Fig. 2C), as described in detail below in response to reviewer 2, comment 4, and argue that they clearly show density-dependent dispersal. We believe that this modification of how the results are presented provides a more convincing case for our main conclusion, namely that the presence of degradation products controls bacterial dispersal in a density-dependent manner.  

      Reviewer #3 (Public Review):

      [...]

      Weaknesses: 

      I find this paper very descriptive and speculative. The results of the genetic analyses are quite counterintuitive; therefore, I understand the difficulty of connecting them to the observations coming from experiments in the microfluidic device. However, they could be better placed in the literature of foraging - dispersal cycles, beyond bacteria. In addition, the interpretation of the results is sometimes confusing. 

      We thank the reviewer for their suggestions to improve the manuscript. We have edited the manuscript to interpret the results of this study more clearly, in particular with regard to the fact that breakdown products of alginate cause cell dispersal (see response to reviewer 2, comment 1), gene expression changes of ribosomal proteins and transporters (see response to reviewer 2, comment 2), as well as genes relating to alginate catabolism (see response to reviewer 2, comment 3).

      To provide more context for the interpretation of our results, we now also discuss our findings in more detail in the context of previous work on foraging strategies and dispersal trade-offs.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should clarify in more detail what they mean by density dependence in Figure 2. Usually density dependence refers to a per capita dependence, but here it seems that the per capita rate of dispersal might be roughly independent of density (Figure 2c; if you double the number of cells it doubles the number of cells leaving). Rather it seems the dispersal is such that the density of remaining cells falls below a threshold (~300 cells). 

      We thank the reviewer for raising this important point. To analyze the data more explicitly in terms of per capita dependence and so make the density dependence in the dispersal from the microfluidic chambers more clear, we have modified Figure 2C and edited the text. 

      In the modified Figure 2C, we computed the fraction of dispersed cells for each chamber (i.e., the decrease in cell number after the nutrient switch divided by the cell number at the time of the switch). This quantity directly reveals the per-capita dependence mentioned by reviewer 1 and is now shown on the y-axis of Figure 2C instead of the absolute change in cell number.

      These data demonstrate that the fraction of dispersed cells increases with increasing numbers of cells present in the chamber at the time of switching, with more highly populated chambers showing a higher fraction of dispersed cells. These findings indicate that there is a strong density dependence in the dispersal process.

      As pointed out by reviewer 1, another interesting aspect of the data is the transition at low cell numbers. The fraction of dispersed cells is negative for the chamber with approximately 70 cells, consistent with no dispersal at this low density and a moderate increase in cell number due to continued growth.

      In addition to the new analysis presented in Figure 2C, we have modified the paragraph that discusses this result as follows (line 208):

      “We indeed found that the nutrient switch caused few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”
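The per-capita quantity on the y-axis of the revised Figure 2C can be written as f = (N_switch - N_after) / N_switch, where N_switch is the cell number at the time of the nutrient switch and N_after a later count. The minimal Python sketch below computes it; the function name and the example counts are hypothetical and are not data from the study. Under a roughly constant per-capita dispersal rate, f would be about the same in every chamber, so a positive trend of f with N_switch is what distinguishes density-dependent dispersal from the proportional scaling the reviewer described.

```python
def dispersed_fraction(n_at_switch, n_after):
    """Fraction of cells that left a chamber after the nutrient switch.

    Positive values mean a net loss of cells; negative values mean the
    chamber gained cells (growth outpaced any dispersal).
    """
    if n_at_switch <= 0:
        raise ValueError("chamber must contain cells at the time of the switch")
    return (n_at_switch - n_after) / n_at_switch


# Hypothetical chambers: (cells at switch, cells some time after the switch)
chambers = [(70, 80), (300, 270), (600, 240)]
for n0, n1 in chambers:
    print(n0, round(dispersed_fraction(n0, n1), 2))
```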

      (2) The authors should tone down their claims about the carbon cycle in the abstract. I do not believe the results as they stand could be used to understand degradation-dispersal cycles in marine environments relevant to the carbon cycle, since these behaviors have been studied in microfluidic environments which in my understanding are quite different. As such, statements such as "degradation-dispersal cycles are an integral part in the global carbon cycle, we know little about how cells alternate between degradation and motility" and "Overall, our findings reveal the cellular mechanisms underlying bacterial degradation-dispersal cycles that drive remineralization in natural environments" are overstated in the abstract. 

      We appreciate the reviewer's comments regarding the connections of our work with the carbon cycle. We have now rephrased these statements in our manuscript to describe a potential connection between our work and the marine carbon cycle. The colonization of polysaccharide particles by bacteria and their subsequent degradation have been widely acknowledged to play a significant role in controlling the carbon flow in marine ecosystems (Fenchel, 2002; Preheim et al., 2011; Yawata et al., 2014, 2020). We still refer to carbon flow in the revised manuscript, though cautiously, in terms of microbial remineralization of biomass, which is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000; Jiao et al., 2024). As stated in the previous version of the manuscript, the main motivation of our work was to study the growth behaviors of marine heterotrophic bacteria during polysaccharide degradation, in particular to understand when bacteria leave already colonized and degraded particles to find new patches to grow on and degrade, a process that is poorly understood. It is therefore conceivable that degradation-dispersal cycles play a role in the flow of carbon in marine ecosystems. However, we acknowledge that the carbon cycle is influenced by a multitude of biological and chemical processes, and the bacterial degradation-dispersal cycle might not be the sole mechanism at play.

      We also appreciate the reviewer's comments highlighting that the complexity of natural environments is not fully captured in our microfluidics system. However, our microfluidics setup does allow us to quantify responses and behaviors of microbial groups at high spatial and temporal resolution, especially in the context of environmental fluctuations. Microbes in nature interact at small spatial scales and have to respond to changes in the environment, and the microfluidics setup enables the quantification of these responses. Moreover, dispersal of V. cyclitrophicus, the bacterium used in our study, has previously been observed even during growth on particulate alginate (Alcolombri et al., 2021), but the cues and regulation controlling dispersal behaviors have remained unclear. Microfluidic experiments have now allowed us to study this process in a highly quantitative manner, and the results align well with observations from experiments in more natural settings. Such quantitative experiments on bacterial strains isolated from marine particles are expected to constrain quantitative models of carbon degradation in the ocean (Nguyen et al., 2022).

      We have now adjusted our statements throughout our manuscript to reflect the knowledge gaps in understanding the triggers of degradation-dispersal cycles and their links with carbon flow in marine ecosystems. The revised manuscript, especially, contains the following statements (line 47 and line 60):

      “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      (3) The authors should clarify why they think quorum-sensing genes are increased in expression on digested alginate. The authors currently mention that QS could be used to trigger dispersal, but given the timescales of dispersal in Figure 2 (~half an hour), I find it hard to believe that these genes are expressed and have the suggested effect on those timescales. As such I would have expected the other way round - for QS genes to be expressed highly during alginate growth, so that density could be sensed and responded to. Please clarify. 

      We have now clarified this point in the revised manuscript. While the triggering of dispersal by quorum sensing genes may indeed appear counterintuitive, and the response is rapid (we see dispersal of cells within 30–40 minutes), both observations are in line with previous studies in another model organism, Vibrio cholerae. The dispersal time is similar to that of V. cholerae cells dispersing from biofilms, as described by Singh and colleagues (Figure 1E of Ref. Singh et al., 2017). In that case, induction of the quorum sensing dispersal regulator HapR was observed during biofilm dispersal within one hour after the switch in conditions (Fig. 2, middle panel of Ref. Singh et al., 2017). Even though the specific quorum sensing signaling molecules are probably different in our strain (there is no annotated homolog of the hapR gene in V. cyclitrophicus), we observed that the full set of quorum sensing genes was enriched in cells growing on digested alginate (as reported in line 314 and Fig. 4A).

      We have added this information in the manuscript (line 317): 

      “The set of quorum sensing genes was also positively enriched in cells growing on digested alginate (Fig. 4A and S4F, Table S13). A role of quorum sensing in dispersal is in agreement with a previous study that showed induction of the quorum sensing master regulator in V. cholerae cells during dispersal from biofilms on a similar time scale as here (less than an hour) [28].”

      Reviewer #2 (Recommendations For The Authors):

      (1) Around line 144 - I don't really understand how you flow alginate through the microfluidic platform. It seems if the particles are transiently going through the microfluidic chamber then the flow rate and hence residence time of the alginate particles will matter a lot by controlling the time the cells have to colonize and excrete enzymes for alginate breakdown. Or perhaps the alginate is not particulate but is instead a large but soluble polymer? I think maybe a schematic of the microfluidic device would help -- there is an implicit assumption that we are familiar with the Dal Co et al device, but I don't recall its details and maybe a graphic added to Figure 1 would help. 

      a. In reviewing the Dal Co paper I see that cells are trapped and the medium flows through channels and the plane where the cells are held. I am still a little confused about the size of the polymeric alginate -- large scale (>1um) particles or very small polymers? 

      We have now provided a detailed description of our microfluidic experimental system. At the start of the experiments, cells are in fact not trapped within the microfluidic device; they grow and can move freely within chambers whose sub-micron height restricts growth to a monolayer. Cells were exposed to nutrients, either alginate or alginate digestion products, both in soluble form (not as particles). These compounds were flowed into the device through a main channel but entered the flow-free growth chambers by diffusion. To make these aspects of our experiments clearer, we have added further information in the Materials & Methods section (line 556), the abstract (line 51), and the results (line 123).

      To make our microfluidic setup clearer, we have followed this advice and added a schematic as Figure 1A and have added more information on the setup to the main text (line 153):

      “In brief, the microfluidic chips are made of an inert polymer (polydimethylsiloxane, PDMS) bound to a glass coverslip. The PDMS layer contains flow channels through which the culture medium is pumped continuously. Each channel is connected to several laterally positioned growth chambers. The dimensions of these growth chambers (height: 0.85 µm, length: 60 µm, width: 90–120 µm) allow cells to move freely and grow as monolayers. The culture medium, containing either alginate or digested alginate in soluble form, is pumped continuously through the flow channel and enters the growth chambers primarily by diffusion [15,16,4,17,8]. Therefore, the number of cells and their positions within the microfluidic chambers are determined by the cellular growth rate as well as by cell movement [4]. This setup, combined with time-lapse microscopy, allowed us to follow the development of cell communities over time.”
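As a rough plausibility check of diffusive supply into the flow-free chambers, the characteristic one-dimensional diffusion time over a distance L is t ≈ L²/(2D). The minimal Python sketch below applies this to the 60 µm chamber length quoted above. The two diffusion coefficients are assumed order-of-magnitude values for a small oligosaccharide and a large polysaccharide, chosen for illustration only; they were not measured in this study.

```python
def diffusion_time_s(length_m, diffusivity_m2_per_s):
    """Characteristic 1-D diffusion time t = L**2 / (2 D), in seconds."""
    if length_m <= 0 or diffusivity_m2_per_s <= 0:
        raise ValueError("length and diffusivity must be positive")
    return length_m ** 2 / (2.0 * diffusivity_m2_per_s)


CHAMBER_LENGTH_M = 60e-6  # growth-chamber length quoted in the text (60 µm)

# ASSUMED diffusion coefficients (illustrative guesses, not measured here)
assumed_d = {"small oligosaccharide": 5e-10, "large polysaccharide": 1e-11}
for label, d in assumed_d.items():
    print(f"{label}: ~{diffusion_time_s(CHAMBER_LENGTH_M, d):.0f} s")
```

Under these assumptions, the estimated equilibration times (seconds to a few minutes) are short compared with the timescales of growth and dispersal discussed above, but the actual values depend on the true diffusivities of the alginate species.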

      (2) What makes this confusing is the difference between Figure 1C and Figure S2A -- the authors state that the difference in Figure 1C is due to dispersal, but is there flow through the microfluidic device? So what role does that flow through the device have in dispersal? Is the adhesion of the cell groups driven at all by a physical interaction with high molecular weight polymers in the microfluidic devices or is this purely a biological effect? Could this also be explained by different real concentrations of nutrients in the two cases? 

      We realize from this comment that the role of flow of the medium in the microfluidic setup was not clearly addressed in our manuscript. In fact, cells were not exposed to flow, and nutrients were provided to the growth chambers by diffusion. We have added a clearer explanation of this point on line 158:

      “The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion [15,16,4,17,8]. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement [4].”

      One purely physical effect that we anticipated is that a high viscosity of the medium could immobilize cells. To address this point, we measured the viscosity of both alginate and digested alginate and conclude that the increase in viscosity is not strong enough to immobilize cells. We added a statement in the text (line 170):

      “To test the role of increased viscosity of polymeric alginate in causing the increased aggregation of cells, we measured the viscosity of 0.1% (w/v) alginate or digested alginate dissolved in TR media. For alginate, the viscosity was 1.03 ± 0.01 mPa·s (mean and standard deviation of three technical replicates), whereas the viscosity of digested alginate in TR media was 0.74 ± 0.01 mPa·s. Both values are relatively close to the viscosity of water at 25 °C (0.89 mPa·s [18]) and, while they may affect swimming behavior [19], they are insufficient to physically restrain cell movement [20].”

      as well as a section in the Materials and Methods (line 594):

      “Viscosity of the alginate and digested alginate solution

      We measured the viscosity of alginate solutions using shear rheology, with a 40 mm cone-plate geometry (4° cone) in a Netzsch Kinexus Pro+ rheometer. A 1200 µL sample was placed on the bottom plate, the gap was set to 150 µm, and the sample was trimmed. We used a solvent trap to avoid sample evaporation during measurement. The temperature was set to 25 °C using a Peltier element. We measured the dynamic viscosity over shear rates γ̇ = 0.1–100 s⁻¹ and report the viscosity of each solution as the average viscosity measured over shear rates of 10–100 s⁻¹, where the shear-rate dependence of the viscosity was low.

      We measured the viscosity of 0.1% (w/v) alginate dissolved in TR media, which was 1.03 ± 0.01 mPa·s (mean and standard deviation of three technical replicates). The viscosity of 0.1% digested alginate in TR media was 0.74 ± 0.01 mPa·s. This means that the viscosity of alginate in our microfluidic experiments is about 39% higher than that of digested alginate, but both viscosities are close to that expected of water (0.89 mPa·s at 25 °C according to Berstad and colleagues [18]).”
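The data reduction described in this Methods passage (averaging the viscosity over the 10–100 s⁻¹ window and comparing solutions with each other and with water) can be sketched in Python as follows. The function names and the sweep readings are hypothetical illustrations; only the shear-rate window, the 0.89 mPa·s reference for water at 25 °C, and the two mean viscosities are taken from the text.

```python
import numpy as np

WATER_25C_MPA_S = 0.89  # reference viscosity of water at 25 °C quoted above


def plateau_viscosity(shear_rates, viscosities, low=10.0, high=100.0):
    """Mean viscosity over the shear-rate window [low, high] (in s^-1)."""
    shear_rates = np.asarray(shear_rates, dtype=float)
    viscosities = np.asarray(viscosities, dtype=float)
    in_window = (shear_rates >= low) & (shear_rates <= high)
    if not in_window.any():
        raise ValueError("no measurements inside the shear-rate window")
    return float(viscosities[in_window].mean())


def relative_difference(value, reference):
    """(value - reference) / reference, e.g. 0.25 means 25% higher."""
    return (value - reference) / reference


# Hypothetical sweep readings (mPa·s), for illustration only
rates = [0.1, 1.0, 10.0, 50.0, 100.0]
readings = [2.0, 1.5, 1.04, 1.03, 1.02]
print(round(plateau_viscosity(rates, readings), 3))

# Comparisons using the mean viscosities reported above (mPa·s)
alginate, digested = 1.03, 0.74
print(round(relative_difference(alginate, digested), 3))  # alginate vs digested
print(round(alginate / WATER_25C_MPA_S, 2), round(digested / WATER_25C_MPA_S, 2))
```

Computed from the rounded means, alginate is roughly 39% more viscous than digested alginate, while the two solutions are about 1.16 and 0.83 times the viscosity of water, respectively.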

      While our microfluidic setup allows us to track the position and movement of cells in a spatially structured setting, these observations do not allow us to distinguish directly whether the differences in dispersal result from purely physical effects of polymers on cells or from the polymers triggering a biological response that causes cells to become sessile. It is known that bacterial appendages such as pili interact with polysaccharide residues (Li et al., 2003). Therefore, it is quite plausible that cross-linking by polysaccharides contributes to growth behaviors on alginate. However, our analysis of gene expression demonstrates that flagellum-driven motility is decreased in the presence of alginate compared to digested alginate, alongside other major changes in gene expression. In addition, our measures of dispersal show that dispersal of cells exposed to digested alginate is density dependent. Both observations suggest that the patterns in dispersal are governed by decision-making processes in cells that change cell motility, rather than being a product of purely physical interactions with the polymer.

      The finding that the viscosities of both alginate and digested alginate are similar to that of water suggests that diffusion of nutrients in the growth chambers should be similar. Therefore, we think that differences in actual nutrient concentrations are unlikely to contribute to the observed differences in behavior.

      (3) Why is Figure S1 arbitrary units? Does this have to do with the calibration of LC-MS? It would be better, it seems, to know the concentrations in real units of the monomer at least. 

      We agree with the reviewer that it would have been better to have absolute concentrations for these compounds. However, to calibrate the mass spectrometer signals (ion counts) to absolute concentrations for the different alginate compounds, we would need an analytical standard of known concentration. We are not aware of such a standard and thus report only relative concentrations. We agree that the y-axis label of Figure S1 should not contain ‘arbitrary’ units, as it shows a ratio (of measurements in the same arbitrary units). We have edited the labels of Figure S1 accordingly and the figure legend in line 26 of the Supplemental Material (“Relative concentrations…”).

      (4) Line 188 - density-dependent dispersal. The claim here is that "cells in chambers with many cells were more likely to disperse than cells in chambers with less cells." (my emphasis). Looking at the data in Figure 2C it appears that about 40% of the cells disperse irrespective of the density, before the switch to digested alginate. So it would seem that there is not a higher likelihood of dispersal at higher cell densities. For the very highest cell density, it does appear that this fraction is larger, but I'd be concerned about making this claim from what I understand to be a single experiment. To support the claim made should the authors plot Change in Cell number/Starting Cell number on the y-axis of Fig. 2C to show that the fraction is increasing? It would seem some additional data at higher starting cell densities would help support this claim more strongly. 

      We thank the reviewer for this comment, which is in line with a remark made by reviewer 1 in their comment 1. In response to these two comments (and as described above), we have edited Figure 2C, which now plots the change in cell number relative to the starting cell number on the y-axis to directly show the density dependence. We observe a positive (approximately linear) relationship between the fraction of dispersed cells and the number of cells present in the chamber at the time of switching. This indicates that the dispersal process is density dependent, with highly populated chambers showing a higher fraction of dispersed cells.

      In addition to the change in Figure 2C, we have modified the paragraph around line 208: “We indeed found that the nutrient switch caused few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”

      The highest cell number at the time of the switch that we include is about 800 cells. The maximum number of cells that can fit into a chamber is ca. 1000 cells. Thus, 800 resident cells are close to the maximal density.

      (5) A comment -- I find the result of significant chemotaxis towards alginate but not the monomers of alginate to be quite surprising. The ecological relevance of this (line 219) seems like an important result that is worth expanding on a bit at least in the discussion. For now, my question is whether the authors know of any mechanism by which chemotaxis receptors could respond to alginate but not the monomer. How can a receptor distinguish between the two? 

      We agree that this result is surprising, given that oligomers can be more easily transported into the periplasm, where sensing takes place, and also provide a more easily accessible nutrient source. Indeed, in the case of the insoluble polymer chitin, it has been shown that chemotaxis towards chitin is mediated by chitin oligomers (Bassler et al., 1991), which was suggested as a general motif to locate polysaccharide nutrient sources (Keegstra et al., 2022). However, a recent study has changed this perspective by showing widespread chemotaxis of marine bacteria towards the glucose-based marine polysaccharide laminarin, but not towards laminarin oligomers or glucose (Clerc et al., 2023). Together with our results on chemotaxis towards alginate (but not significantly towards alginate oligomers), this suggests that chemotaxis towards soluble polysaccharides can be mediated by direct sensing of the polysaccharide molecules.

      As recommended, we expanded the discussion of the ecological relevance and added more information on possible mechanisms of selective sensing of alginate and its breakdown products (around line 479):

      “Direct chemotaxis towards polysaccharides may facilitate the search for new polysaccharide sources after dispersal. We found that the presence of degradation products not only induces cell dispersal but also increases the expression of chemotaxis genes. Interestingly, we found that V. cyclitrophicus ZF270 cells show chemotaxis towards polymeric alginate but not digested alginate. This contrasts with previous findings for bacterial strains degrading the insoluble marine polysaccharide chitin, where chemotaxis was strongest towards chitin oligomers [53], suggesting that oligomers may act as an environmental cue for polysaccharide nutrient sources [55]. However, recent work has shown that certain marine bacteria are attracted to the marine polysaccharide laminarin, and not to laminarin oligomers [56]. Together with our results, this indicates that chemotaxis towards soluble polysaccharides may be mediated by the polysaccharide molecules themselves. The mechanism of this behavior is yet to be identified, but it could be mediated by polysaccharide-binding proteins, such as the pectin-binding protein that facilitates chemotaxis towards pectin in Sphingomonas sp. A1 [57]. Direct polysaccharide sensing adds complexity to chemosensing, as polysaccharides cannot freely diffuse into the periplasm, which can lead to a trade-off between chemosensing and uptake [58]. Furthermore, most polysaccharides are not immediately metabolically accessible, as they require degradation. But direct polysaccharide sensing can also provide certain benefits compared to using oligomers as sensory cues. First, it could enable bacterial strains to preferentially navigate to polysaccharide nutrient sources that are relatively uncolonized and hence show little degradation activity. Second, strong chemotaxis towards degradation products could hinder timely dispersal, as dispersal would then require cells to travel against a strong attractant gradient formed by the degradation products.
      Overall, this strategy allows cells to alternate between degradation and dispersal to acquire carbon and energy in a heterogeneous world with nutrient hotspots [44,59–61].”

      (6) Comment on lines 287-8 -- that the "positive enrichment of the gene set containing bacterial motility proteins matched the increase in motile cells that we observe in Fig 3E." I'm confused about what is meant by the word "matched" here. Is the implication that there is some quantitative correspondence between increased motility in Figure 3 and the change in expression in Figure 4? Or is the statement a qualitative one -- that motility genes are upregulated in the presence of digested alginate? Table S12 didn't help me answer this question. 

      We thank the reviewer for their helpful comment. Our original statement was a qualitative one: the enrichment of genes associated with bacterial motility aligned with our expectations based on the previously observed increase in motile cells. We have now changed the wording to highlight the qualitative nature of this statement (line 315):

      “The positive enrichment of the gene set containing bacterial motility proteins aligned with our expectations based on the increase in motile cells that we observed in Figure 3E (Fig. 4A, Table S12).”

      (7) Line 326 - what is the explanation for the production of public enzymes in the presence of digest? How does this square with the previous narrative about cells growing on alginate digest expressing motility genes and chemotaxing towards alginate? It seems like the story is a bit tenuous here in the sense that digested alginates stimulate both motility - which is hypothesized to drive the discovery of new alginate particles - and lyase enzymes which are used to degrade alginate. So do the high motility cells that are chemotaxing towards alginate also express lyases en route? I'm of the opinion that constructing narratives like these in the absence of a more quantitative understanding of the colonization and degradation dynamics of alginate particles presents a major challenge and may be asking more of the data than the data can provide. 

      a. I noted that this is addressed later, around line 393, in the Discussion section.

      Indeed, the notion that the presence of breakdown products triggers motility and also increases the expression of alginate lyases and other metabolic genes for alginate catabolism seems counterintuitive. We have now expanded our discussion of these results to contextualize these findings (around line 443):

      "One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one-component sensors inside the cell [50]. In the one-component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, which leads to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products is likely to result in increased expression of all components of the alginate degradation pathway, including the degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes is likely to decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and may thus help cells to prepare for likely future environments.
      Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients."
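      The positive feedback loop described in this passage can be illustrated with a deliberately simple, hypothetical model. It is not the authors' analysis, and its parameters (basal, k, n) are chosen for demonstration only. Expression E of the pathway sets the level of an intracellular intermediate, assumed here to be proportional to E times the external substrate S. An internal sensor then induces further expression through a Hill function. The sharp threshold in this toy model arises from combining cooperative induction (n = 2) with feedback.

```python
def induced_level(substrate, basal=0.05, k=1.0, n=2, dt=0.1, t_end=200.0):
    """Expression level E reached in a toy internal-sensing feedback loop.

    Hypothetical model with illustrative parameters:
      intermediate = E * S   (uptake/conversion scales with expression E)
      dE/dt = basal + (1 - basal) * hill(intermediate) - E
    where hill(x) = x**n / (k**n + x**n) stands in for an internal sensor
    such as a transcription factor. Starting from basal expression, E is
    integrated with forward Euler steps of size dt up to t_end.
    """
    e = basal
    for _ in range(round(t_end / dt)):
        intermediate = e * substrate
        induction = intermediate**n / (k**n + intermediate**n)
        e += dt * (basal + (1.0 - basal) * induction - e)
    return e


if __name__ == "__main__":
    # A 1.5-fold change in substrate, from S = 2 to S = 3.
    for s in (2.0, 3.0):
        print(f"S = {s}: E = {induced_level(s):.2f}")
```

      With these illustrative parameters, raising S from 2 to 3 shifts steady-state expression from close to the basal level to close to the maximum. This is one way in which a small change in substrate abundance can trigger a large transcriptional response.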

      (8) I like Figure 6, and I think this hypothesis is a good result from this paper, but I think it would be important to emphasize this as a proposal that needs further quantitative analysis to be supported. 

      We have now edited the manuscript to make this point more clear. While both degradation and dispersal are well-appreciated parts of microbial ecology, the transitions and underlying mechanisms are unclear. We have edited the discussion to improve the clarity (line 419): 

      “This cycle of biomass degradation and dispersal has long been discussed in the context of foraging e.g., [44,45,13,46,47], but the cellular mechanisms that drive the cell dispersal remain unclear.”

      Also, we have updated Figure 6 to indicate more clearly which new findings this work proposes (now in bold font) and which previous findings, made in other bacterial taxa and on other carbon sources, align with our work (now in light font). We edited the figure legend accordingly (line 503):

      "By integrating our results with previous studies on cooperative growth on the same system, as well as results on dispersal cycles in other systems, we highlight where the specific results of this work add to this framework (bold font)."

      Minor comments 

      (1) Is there any growth on the enzyme used for alginate digestion? E.g. is the enzyme used to digest the alginate at sufficiently high concentrations that cells could utilize it for a carbon/nitrogen source? 

      We thank the reviewer for raising this point. We added the following paragraph as Supplemental Text to address it (line 179):

      “Protein amount of the alginate lyases added to create digested alginate

      Based on the following calculation, we conclude that the amount of protein added to the growth medium by the addition of alginate lyases is so small that we consider it negligible. In our experiment, we used 1 unit/ml of alginate lyases in a 4.5 ml solution to digest the alginate. As the commercially purchased alginate lyases have a specific activity of 10,000 units/g, our 4.5 ml solution contains 0.45 mg of alginate lyase protein. The digested alginate solution was diluted 45-fold when added to culture medium. This means that we added 0.18 µg alginate lyase protein to 1 ml of culture medium.

      As a comparison, 1000 µg of alginate is added per 1 ml of alginate medium, and 3,500 µg of Lysogeny broth (LB) per 1 ml of LB culture medium. Thus, the amount of alginate lyase protein that we added is ca. 5,000–20,000 times smaller than the amount of alginate or LB that one would add to support cell growth. Therefore, we expect the growth enabled by digestion of the added alginate lyases to be negligible.”
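      The arithmetic in this Supplemental Text can be checked with a short Python sketch. The final amount per ml of culture medium (0.18 µg) is taken as stated in the text rather than recomputed, and the function names are illustrative only.

```python
def enzyme_mass_mg(units_per_ml, volume_ml, units_per_g):
    """Protein mass (mg) needed to reach an enzyme activity in a volume."""
    total_units = units_per_ml * volume_ml
    return total_units / units_per_g * 1000.0  # grams -> milligrams


def fold_smaller(reference_ug_per_ml, added_ug_per_ml):
    """Ratio of a reference nutrient amount to the added protein amount."""
    return reference_ug_per_ml / added_ug_per_ml


if __name__ == "__main__":
    # 1 unit/ml in 4.5 ml, enzyme preparation at 10,000 units/g.
    print(f"lyase protein in digestion: {enzyme_mass_mg(1.0, 4.5, 10_000):.2f} mg")
    # Amount per ml of culture medium as stated in the text (not recomputed here).
    lyase_ug_per_ml = 0.18
    for name, reference in (("alginate", 1000.0), ("LB", 3500.0)):
        print(f"{name}: {fold_smaller(reference, lyase_ug_per_ml):,.0f}-fold")
```

      The two ratios (about 5,600-fold for alginate and about 19,400-fold for LB) fall within the 5,000–20,000 range given above.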

      (2) The lines in Figure 2B are very hard to see. 

      We have addressed this comment by using thicker lines in Figure 2B.

      (3) The black background and images in Figure 3A and B are hard to see as well. 

      We have now replaced Figure 3A and B, now using a white background.

      (4) Typo at the beginning of line 251? 

      Unfortunately we failed to find the typo referred to. We are happy to address it if it still exists in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) I think there is not enough experimental evidence to conclude that the underlying cause of increased motility is the accumulation of digested alginate products. To conclusively show that this is the cause and not just some signal linked to cell density, perhaps the experiment should be repeated with a different carbon source. 

      We thank the reviewer for their comment, which made us realize that we had not made the nature of the dispersal cue clear. The gene expression data were obtained from batch cultures at approximately the same bacterial density, and they show that digested alginate is a sufficient signal for an increase in motility gene expression. This agrees very well with our observation that cells growing on digested alginate in microfluidic chambers have an increased fraction of motile cells in comparison with cells exposed to alginate (Fig. 3E). However, we did not mean to suggest that the observed dispersal by bacterial motility is not influenced by cell density. In fact, we see that dispersal (and hence the increase in cell motility) in microfluidic chambers that are switched from polymeric to digested alginate depends on the bacterial density in the chamber, with higher bacterial densities showing increased dispersal. This shows that the presence of alginate oligomers does trigger dispersal through motility, but that this signal affects bacterial groups in a cell density-dependent manner.

      Similar observations have been made in Caulobacter crescentus, which was found to form cell groups on the polymer xylan while cells disperse when the corresponding monomer xylose becomes available (D’Souza et al., 2021). We reference the additional work in lines 179 and 230. Taken together, these observations indicate a more general phenomenon in dispersal from polysaccharide substrates.

      (2) About the expression data: 

      • Ribosomal proteins and ABC transporters are enriched in cells grown on digested alginate and the authors discuss that this explains the difference in max growth rate between alginate and digested alginate. However, in Figure S2E the authors report no statistical difference between growth rates. 

      We have now edited the manuscript to clarify this point. We found that cells grown on degradation products reached their maximal growth rate around 7.5 hours earlier (Fig. S2D) and showed increased expression of ribosomal biosynthesis and ABC transporters in late-exponential phase (Fig. 4A). We consider this shorter lag time as a sign of a different growth state and therefore a possible reason for the difference in ribosomal protein expression.

      As the reviewer correctly points out, the maximum growth rates that were computed from the two growth curves were not significantly different (Fig. S2E). However, for our gene expression analysis, we harvested the transcriptome of cells that reached OD 0.39-0.41 (mid- to late-exponential phase). At this time point, the cell cultures may have differed in their momentary growth rate.

      We edited the manuscript to make this clearer (line 287):

      “Both observations likely relate to the different growth dynamics of V. cyclitrophicus ZF270 on digested alginate compared to alginate (Fig. S2A), where cells in digested alginate medium reached their maximal growth rate 7.5 hours earlier and thus showed a shorter lag time (Fig. S2D). As a consequence, the growth rate at the time of RNA extraction (mid-to-late exponential phase) may have differed, even though the maximum growth rate of cells grown in alginate medium and digested alginate medium were not found to be significantly different (Fig. S2E).”
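      The response does not describe the exact fitting procedure behind the maximal growth rate and its timing (Fig. S2D, S2E). The sketch below therefore rests on one common approach, assumed here, that estimates both quantities from the steepest slope of ln(OD) between consecutive measurements. The function name is hypothetical.

```python
import math


def max_growth_rate(times_h, od):
    """Maximal specific growth rate (per hour) and the time it occurs.

    Uses finite differences of ln(OD) between consecutive samples; the
    returned time is the midpoint of the steepest interval. Times must be
    strictly increasing and all OD values must be positive.
    """
    if len(times_h) != len(od) or len(od) < 2:
        raise ValueError("need at least two paired (time, OD) samples")
    best_rate, best_time = -math.inf, None
    for i in range(1, len(od)):
        dt = times_h[i] - times_h[i - 1]
        rate = (math.log(od[i]) - math.log(od[i - 1])) / dt
        if rate > best_rate:
            best_rate, best_time = rate, (times_h[i] + times_h[i - 1]) / 2
    return best_rate, best_time
```

      Subtracting the returned times for the alginate and digested-alginate curves would give a lead time of the kind reported in Fig. S2D. In practice, ln(OD) is often smoothed or fitted with a growth model first, because finite differences amplify measurement noise.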

      • The increased expression of transporters for lyases in cells grown on digested alginate (lines 273-274 and 325-328) is very confusing and the explanation provided in lines 412-420 is not very convincing. My two cents on this: Expression of more enzymes and induction of motility might be a strategy to be prepared for more likely future environments (after dispersal, alginate is the most likely carbon source they will find). This would be in line with observed increased chemotaxis towards the polymer rather than the monomer (Similar to C. elegans). 

      This comment is in line with reviewer 2, comment 7. In response to these two comments (and as described above), we expanded our discussion of these results to contextualize these findings (around line 443):

      “One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources [50,51]. In fact, the majority of carbon sources are sensed by prokaryotes through one-component sensors inside the cell [50]. In the one-component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, which leads to the upregulation of transporter and enzyme expression [50,51]. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses [50,52]. Thus, the presence of alginate breakdown products is likely to result in increased expression of all components of the alginate degradation pathway, including the degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes is likely to decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and may thus help cells to prepare for likely future environments.
      Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      Additionally, we agree with the intriguing suggestion that continued expression of alginate lyases may also prepare cells for likely future environments. Whether marine bacteria are primed by growth on one carbon source for faster re-initiation of degradation on a new particle is an interesting question for future studies. We now address this point in our manuscript (line 458):

      “However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and may thus help cells to prepare for likely future environments. Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      (3) The yield reached by Vibrio on alginate is significantly higher than the yield in digested alginate, not similar, as stated in lines 133-134. Only cell counts are similar. Perhaps the author can correct this statement and speculate on the reason leading to this discrepancy: perhaps cells tend to aggregate in alginate despite the fact that these are well-mixed cultures. 

      We have edited the description of the OD measurements accordingly and agree with the reviewer that aggregation is indeed a possible reason for the discrepancy (line 141):

      “We also observed that the optical density at stationary phase was higher when cells were grown on alginate (Fig. S2B and C). However, colony counts did not show a significant difference in cell numbers (Fig. S3), suggesting that the increased optical density may stem from aggregation of cells in the alginate medium, as observed for other Vibrio species [7].”

      (4) I suggest toning down the importance of the results presented in this study for understanding global carbon cycling. There is a link but at present it is too much emphasized. 

      We have edited our statements regarding the carbon cycle. In the revised manuscript, we stress the lack of direct quantifications of carbon cycling. We still refer to carbon flow in the revised manuscript, as we would argue that microbial remineralization of biomass is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000) and research on marine bacterial foraging investigates how bacterial cells manage to find and utilize this biomass.

      Our revised manuscript contains the following modified statements (line 47 and line 60): “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      References

      • Alcolombri, U., Peaudecerf, F. J., Fernandez, V. I., Behrendt, L., Lee, K. S., & Stocker, R. (2021). Sinking enhances the degradation of organic particles by marine bacteria. Nature Geoscience, 14(10), 775–780. https://doi.org/10.1038/s41561-021-00817-x
      • Bassler, B. L., Gibbons, P. J., Yu, C., & Roseman, S. (1991). Chitin utilization by marine bacteria. Chemotaxis to chitin oligosaccharides by Vibrio furnissii. Journal of Biological Chemistry, 266(36), 24268–24275. https://doi.org/10.1016/S0021-9258(18)54224-1
      • Chisholm, S. W. (2000). Stirring times in the Southern Ocean. Nature, 407(6805), 685–686. https://doi.org/10.1038/35037696
      • Chubukov, V., Gerosa, L., Kochanowski, K., & Sauer, U. (2014). Coordination of microbial metabolism. Nature Reviews Microbiology, 12(5), 327–340. https://doi.org/10.1038/nrmicro3238
      • Clerc, E. E., Raina, J.-B., Keegstra, J. M., Landry, Z., Pontrelli, S., Alcolombri, U., Lambert, B. S., Anelli, V., Vincent, F., Masdeu-Navarro, M., Sichert, A., De Schaetzen, F., Sauer, U., Simó, R., Hehemann, J.-H., Vardi, A., Seymour, J. R., & Stocker, R. (2023). Strong chemotaxis by marine bacteria towards polysaccharides is enhanced by the abundant organosulfur compound DMSP. Nature Communications, 14(1), 8080. https://doi.org/10.1038/s41467-023-43143-z
      • Dal Co, A., van Vliet, S., Kiviet, D. J., Schlegel, S., & Ackermann, M. (2020). Short-range interactions govern the dynamics and functions of microbial communities. Nature Ecology and Evolution, 4(3), 366–375. https://doi.org/10.1038/s41559-019-1080-2
      • D’Souza, G., Ebrahimi, A., Stubbusch, A., Daniels, M., Keegstra, J., Stocker, R., Cordero, O., & Ackermann, M. (2023). Cell aggregation is associated with enzyme secretion strategies in marine polysaccharide-degrading bacteria. The ISME Journal. https://doi.org/10.1038/s41396-023-01385-1
      • D’Souza, G. G., Povolo, V. R., Keegstra, J. M., Stocker, R., & Ackermann, M. (2021). Nutrient complexity triggers transitions between solitary and colonial growth in bacterial populations. The ISME Journal, 15(9), 2614–2626. https://doi.org/10.1038/s41396-021-00953-7
      • D’Souza, G., Schwartzman, J., Keegstra, J., Schreier, J. E., Daniels, M., Cordero, O. X., Stocker, R., & Ackermann, M. (2023). Interspecies interactions determine growth dynamics of biopolymer-degrading populations in microbial communities. Proceedings of the National Academy of Sciences of the United States of America, 120(44), e2305198120. https://doi.org/10.1073/pnas.2305198120
      • Fenchel, T. (2002). Microbial Behavior in a Heterogeneous World. Science, 296(5570), 1068–1071. https://doi.org/10.1126/science.1070118
      • Jiao, N., Luo, T., Chen, Q., Zhao, Z., Xiao, X., Liu, J., Jian, Z., Xie, S., Thomas, H., Herndl, G. J., Benner, R., Gonsior, M., Chen, F., Cai, W.-J., & Robinson, C. (2024). The microbial carbon pump and climate change. Nature Reviews Microbiology. https://doi.org/10.1038/s41579-024-01018-0
      • Keegstra, J. M., Carrara, F., & Stocker, R. (2022). The ecological roles of bacterial chemotaxis. Nature Reviews Microbiology, 20(8), 491–504. https://doi.org/10.1038/s41579-022-00709-w
      • Konishi, H., Hio, M., Kobayashi, M., Takase, R., & Hashimoto, W. (2020). Bacterial chemotaxis towards polysaccharide pectin by pectin-binding protein. Scientific Reports, 10(1), 3977. https://doi.org/10.1038/s41598-020-60274-1
      • Li, Y., Sun, H., Ma, X., Lu, A., Lux, R., Zusman, D., & Shi, W. (2003). Extracellular polysaccharides mediate pilus retraction during social motility of Myxococcus xanthus. Proceedings of the National Academy of Sciences, 100(9), 5443–5448. https://doi.org/10.1073/pnas.0836639100
      • Martínez-Antonio, A., Janga, S. C., Salgado, H., & Collado-Vides, J. (2006). Internal sensing machinery directs the activity of the regulatory network in Escherichia coli. Trends in Microbiology, 14(1), 22–27. https://doi.org/10.1016/j.tim.2005.11.002
      • McDougald, D., Rice, S. A., Barraud, N., Steinberg, P. D., & Kjelleberg, S. (2012). Should we stay or should we go: Mechanisms and ecological consequences for biofilm dispersal. Nature Reviews Microbiology, 10(1), 39–50. https://doi.org/10.1038/nrmicro2695
      • Nguyen, T. T. H., Zakem, E. J., Ebrahimi, A., Schwartzman, J., Caglar, T., Amarnath, K., Alcolombri, U., Peaudecerf, F. J., Hwa, T., Stocker, R., Cordero, O. X., & Levine, N. M. (2022). Microbes contribute to setting the ocean carbon flux by altering the fate of sinking particulates. Nature Communications, 13(1), 1657. https://doi.org/10.1038/s41467-022-29297-2
      • Norris, N., Alcolombri, U., Keegstra, J. M., Yawata, Y., Menolascina, F., Frazzoli, E., Levine, N. M., Fernandez, V. I., & Stocker, R. (2022). Bacterial chemotaxis to saccharides is governed by a trade-off between sensing and uptake. Biophysical Journal, 121(11), 2046–2059. https://doi.org/10.1016/j.bpj.2022.05.003
      • Povolo, V. R., D’Souza, G. G., Kaczmarczyk, A., Stubbusch, A. K., Jenal, U., & Ackermann, M. (2022). Extracellular appendages govern spatial dynamics and growth of Caulobacter crescentus on a prevalent biopolymer. bioRxiv, 2022.06.13.495907. https://doi.org/10.1101/2022.06.13.495907
      • Preheim, S. P., Boucher, Y., Wildschutte, H., David, L. A., Veneziano, D., Alm, E. J., & Polz, M. F. (2011). Metapopulation structure of Vibrionaceae among coastal marine invertebrates. Environmental Microbiology, 13(1), 265–275. https://doi.org/10.1111/j.1462-2920.2010.02328.x
      • Schwartzman, J. A., Ebrahimi, A., Chadwick, G., Sato, Y., Orphan, V., & Cordero, O. X. (2021). Bacterial growth in multicellular aggregates leads to the emergence of complex lifecycles. bioRxiv, 2021.11.01.466752. https://doi.org/10.1101/2021.11.01.466752
      • Singh, P. K., Bartalomej, S., Hartmann, R., Jeckel, H., Vidakovic, L., Nadell, C. D., & Drescher, K. (2017). Vibrio cholerae Combines Individual and Collective Sensing to Trigger Biofilm Dispersal. Current Biology, 27(21), 3359-3366.e7. https://doi.org/10.1016/j.cub.2017.09.041
      • Ulrich, L. E., Koonin, E. V., & Zhulin, I. B. (2005). One-component systems dominate signal transduction in prokaryotes. Trends in Microbiology, 13(2), 52–56. https://doi.org/10.1016/j.tim.2004.12.006
      • Wall, M. E., Hlavacek, W. S., & Savageau, M. A. (2004). Design of gene circuits: Lessons from bacteria. Nature Reviews Genetics, 5(1), 34–42. https://doi.org/10.1038/nrg1244
      • Yawata, Y., Carrara, F., Menolascina, F., & Stocker, R. (2020). Constrained optimal foraging by marine bacterioplankton on particulate organic matter. Proceedings of the National Academy of Sciences, 117(41), 25571–25579. https://doi.org/10.1073/pnas.2012443117
      • Yawata, Y., Cordero, O. X., Menolascina, F., Hehemann, J.-H., Polz, M. F., & Stocker, R. (2014). Competition–dispersal tradeoff ecologically differentiates recently speciated marine bacterioplankton populations. Proceedings of the National Academy of Sciences, 111(15), 5622–5627. https://doi.org/10.1073/pnas.1318943111
      • Zöttl, A., & Yeomans, J. M. (2019). Enhanced bacterial swimming speeds in macromolecular polymer solutions. Nature Physics, 15(6), 554–558. https://doi.org/10.1038/s41567-019-0454-3
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors aimed to investigate the contribution of antigenic drift in the HA and NA genes of seasonal influenza A(H3N2) virus to their epidemic dynamics. Analyzing 22 influenza seasons before the COVID-19 pandemic, the study explored various antigenic and genetic markers, comparing them against indicators characterizing the epidemiology of annual outbreaks. The central findings highlight the significant influence of genetic distance on A(H3N2) virus epidemiology and emphasize the role of A(H1N1) virus incidence in shaping A(H3N2) epidemics, suggesting subtype interference as a key factor. 

      Major Strengths: 

      The paper is well-organized, written with clarity, and presents a comprehensive analysis. The study design, incorporating a span of 22 seasons, provides a robust foundation for understanding influenza dynamics. The inclusion of diverse antigenic and genetic markers enhances the depth of the investigation, and the exploration of subtype interference adds valuable insights. 

      Major Weaknesses: 

      While the analysis is thorough, some aspects require deeper interpretation, particularly in the discussion of certain results. Clarity and depth could be improved in the presentation of findings. Furthermore, the evolving dynamics of H3N2 predominance post-2009 need better elucidation.  

      Reviewer #2 (Public Review): 

      Summary: This paper aims to achieve a better understanding of how the antigenic or genetic compositions of the dominant influenza A viruses in circulation at a given time are related to key features of seasonal influenza epidemics in the US. To this end, the authors analyze an extensive dataset with a range of statistical, data science and machine learning methods. They find that the key drivers of influenza A epidemiological dynamics are interference between influenza A subtypes and genetic divergence, relative to the previous one or two seasons, in a broader range of antigenically related sites than previously thought. 

      Strengths: A thorough investigation of a large and complex dataset. 

      Weaknesses: The dataset covers a 21-year period which is substantial by epidemiological standards, but quite small from a statistical or machine learning perspective. In particular, it was not possible to follow the usual process and test predictive performance of the random forest model with an independent dataset. 

      Reviewer #3 (Public Review): 

      Summary: 

      This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. It's a strong paper representing a thorough and fascinating exploration of potential drivers, and it makes a trove of relevant data readily available to the community. 

      Strengths: 

      This paper makes links between epidemiological and evolutionary data for influenza. Placing each in the context of the other is crucial for understanding influenza dynamics and evolution and this paper does a thorough job of this, with many analyses and nuances. The results on the extent to which evolutionary factors relate to epidemic burden, and on interference among influenza types, are particularly interesting. The GitHub repository associated with the paper is clear, comprehensive, and well-documented. 

      Weaknesses: 

      The format of the results section can be hard to follow, and we suggest improving readability by restructuring and simplifying in some areas. There are a range of choices made about data preparation and scaling; the authors could explore sensitivity of the results to some of these. 

      Response to public reviews

      We appreciate the positive comments from the reviewers and have implemented or responded to all of the reviewers’ recommendations.

      In response to Reviewer 1, we expand on the potential drivers and biological implications of the findings pointed out in their specific recommendations. For example, we now explicitly mention that antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study. We note that, after the 2009 A(H1N1) pandemic, the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons is lower compared to A(H3N2) dominant seasons prior to 2009. We propose that the weakening of A(H3N2) predominance may be linked to the diversification of A(H3N2) viruses during the 2010s, wherein multiple antigenically distinct clades with similar fitness circulated in each season, as opposed to a single variant with high fitness.

      In response to Reviewer 2, we agree that it would be ideal and best practice to measure model performance with an independent test set, but our dataset includes only ~20 seasons. Predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. In the revised manuscript, we provide more justification and clarification of our methodology. Instead of testing model performance on an independent test set, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (Kuhn & Johnson, 2019).

      In response to Reviewer 3, we follow the reviewer’s advice to put the Methods section before the Results section. Concerning Reviewer 3’s question about the sensitivity of our results to data preparation and rescaling, we provide more justification and clarification of our methodology in the revised manuscript. In our study, we adjust influenza type/subtype incidences for differences in reporting between the pre- and post-2009 pandemic periods and across HHS regions. We adjust for differences in reporting between the pre- and post-2009 periods because the US CDC and WHO increased laboratory testing capacity in response to the 2009 A(H1N1) pandemic, which led to substantial, long-lasting improvements to influenza surveillance that are still in place today. Figure 1 - figure supplement 2 shows systematic increases in influenza test volume in all HHS regions after the 2009 pandemic. Given the substantial increase in test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results when adjusting for both pre- and post-2009 pandemic reporting and regional reporting versus only adjusting for the pre- and post-2009 pandemic reporting.

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      (1) Line 155-156. Request for a reference for: "Given that protective immunity wanes after 1-4 years" 

      We now include two references (He et al. 2015 and Wraith et al. 2022), which were cited at the beginning of the introduction when referring to the duration of protective immunity for antigenically homologous viruses. (Lines 640-642 in revised manuscript)

      (2) Line 162-163: Request a further explanation of the negative correlation between seasonal diversity of HA and NA LBI values and NA epitope distance. Clarify biological implications to aid reader understanding. 

      In the revised manuscript we expand on the biological implications of A(H3N2) virus populations characterized by high antigenic novelty and low LBI diversity.

      Lines 649-653:

      “The seasonal diversity of HA and NA LBI values was negatively correlated with NA epitope distance (Figure 2 – figure supplements 5 – 6), with high antigenic novelty coinciding with low genealogical diversity. This association suggests that selective sweeps tend to follow the emergence of drifted variants with high fitness, resulting in seasons dominated by a single A(H3N2) variant rather than multiple cocirculating clades.”
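      For readers unfamiliar with these diversity summaries, the sketch below shows one way to reduce a season's local branching index (LBI) values to a standard deviation and a Shannon diversity, the two summaries the revised Methods name. It is a minimal illustration, not the authors' code: the function names and example values are hypothetical, and because Shannon diversity needs discrete categories, grouping continuous LBI values into fixed-width bins (`bin_width`) is an assumption made here.

```python
import math
from collections import Counter
from statistics import stdev


def lbi_sd(lbi_values):
    """Sample standard deviation of the LBI values of viruses sampled in one season."""
    return stdev(lbi_values)


def lbi_shannon(lbi_values, bin_width=0.1):
    """Shannon diversity H = -sum(p_k * ln p_k), where p_k is the share of viruses whose
    LBI falls in bin k. ASSUMPTION: continuous LBI values are grouped into fixed-width bins."""
    counts = Counter(math.floor(v / bin_width) for v in lbi_values)
    n = len(lbi_values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical LBI values for one season (not data from the study).
season_lbi = [0.01, 0.02, 0.31, 0.32]
print(lbi_sd(season_lbi), lbi_shannon(season_lbi))
```

      Under this binning, a season in which all viruses fall into one LBI bin has a Shannon diversity of zero, and the value grows as LBI values spread across more bins.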

      (3) Figure S3 legend t-2 may be marked as t-1. 

      Thank you for catching this. We have fixed this typo. Note: Figure S3 is now Figure 2 – figure supplement 5.

      (4) Lines 201-214. The key takeaways from the analysis of subtype dominance are ultimately not clear. It also misses the underlying dynamics that H3N2 predominance following an evolutionary change has waned since 2009.

      In the revised manuscript we elaborate on key takeaways concerning the relationship between antigenic drift and A(H3N2) dominance. We also add a caveat noting that A(H3N2) predominance is weaker during the post-2009 period, which may be linked to the diversification of A(H3N2) lineages after 2012. We do not know of a reference that links the diversification of A(H3N2) viruses in the 2010s to a particular evolutionary change. Therefore, we do not attribute the diversification of A(H3N2) viruses to a specific evolutionary change in A(H3N2) variants circulating at the time (A/Perth/16/2009-like strains (PE09)). Instead, we allude to the potential role of A(H3N2) diversification in creating multiple co-circulating lineages that may have less of a fitness advantage.

      Lines 681-703:

      “We explored whether evolutionary changes in A(H3N2) may predispose this subtype to dominate influenza virus circulation in a given season. A(H3N2) subtype dominance – the proportion of influenza positive samples typed as A(H3N2) – increased with H3 epitope distance (t – 2) (R2 = 0.32, P = 0.05) and N2 epitope distance (t – 1) (R2 = 0.34, P = 0.03) (regression results: Figure 4; Spearman correlations: Figure 3 – figure supplement 1). Figure 4 illustrates this relationship at the regional level across two seasons in which A(H3N2) was nationally dominant, but where antigenic change differed. In 2003-2004, we observed widespread dominance of A(H3N2) viruses after the emergence of the novel antigenic cluster, FU02 (A/Fujian/411/2002-like strains). In contrast, there was substantial regional heterogeneity in subtype circulation during 2007-2008, a season in which A(H3N2) viruses were antigenically similar to those circulating in the previous season. Patterns in type/subtype circulation across all influenza seasons in our study period are shown in Figure 4 – figure supplement 1. As observed for the 2003-2004 season, widespread A(H3N2) dominance tended to coincide with major antigenic transitions (e.g., A/Sydney/5/1997 (SY97) seasons, 1997-1998 to 1999-2000; A/California/7/2004 (CA04) season, 2004-2005), though this was not universally the case (e.g., A/Perth/16/2009 (PE09) season, 2010-2011).

      After the 2009 A(H1N1) pandemic, A(H3N2) dominant seasons still occurred more frequently than A(H1N1) dominant seasons, but the mean fraction of influenza positive cases typed as A(H3N2) in A(H3N2) dominant seasons was lower compared to A(H3N2) dominant seasons prior to 2009. Antigenically distinct 3c.2a and 3c.3a viruses began to co-circulate in 2012 and underwent further diversification during subsequent seasons in our study (https://nextstrain.org/seasonal-flu/h3n2/ha/12y@2024-05-13) (Dhanasekaran et al., 2022; Huddleston et al., 2020; Yan et al., 2019). The decline in A(H3N2) predominance during the post-2009 period may be linked to the genetic and antigenic diversification of A(H3N2) viruses, wherein multiple lineages with similar fitness co-circulated in each season.”
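      The dominance measure in the passage above is a simple proportion, and the reported R2 values describe how much of its seasonal variation a single evolutionary indicator explains. The sketch below computes both with a single-predictor least-squares fit. The counts and distances are invented for illustration, and whether the authors used exactly this regression form (for example, without any transformation of the proportion) is an assumption.

```python
def subtype_dominance(h3n2_positives, total_positives):
    """Fraction of influenza-positive samples typed as A(H3N2) in one season."""
    return h3n2_positives / total_positives


def linear_r2(x, y):
    """R^2 of an ordinary least-squares fit of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical seasons: (A(H3N2) positives, all influenza positives, H3 epitope distance).
seasons = [(900, 1000, 4.0), (500, 1000, 1.0), (700, 1000, 2.5), (650, 1000, 2.0)]
dominance = [subtype_dominance(h3, total) for h3, total, _ in seasons]
distance = [d for _, _, d in seasons]
print(linear_r2(distance, dominance))
```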

      (5) Line 253-255: It would be beneficial to provide a more detailed interpretation of the statement that "pre-2009 seasonal A(H1N1) viruses may limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses." Elaborate on the cause-and-effect relationship within this statement.

      In the revised manuscript we suggest that seasonal A(H1N1) viruses may interfere with the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, because seasonal A(H1N1) viruses and A(H3N2) are more closely related, and thus may elicit stronger cross-reactive T cell responses.

      Lines 738-745:

      “The internal gene segments NS, M, NP, PA, and PB2 of A(H3N2) viruses and pre-2009 seasonal A(H1N1) viruses share a common ancestor (Webster et al., 1992) whereas A(H1N1)pdm09 viruses have a combination of gene segments derived from swine and avian reservoirs that were not reported prior to the 2009 pandemic (Garten et al., 2009; Smith et al., 2009). Non-glycoprotein genes are highly conserved between influenza A viruses and elicit cross-reactive antibody and T cell responses (Grebe et al., 2008; Sridhar, 2016). Because pre-2009 seasonal A(H1N1) viruses and A(H3N2) are more closely related, we hypothesized that seasonal A(H1N1) viruses could potentially limit the circulation of A(H3N2) viruses to a greater extent than A(H1N1)pdm09 viruses, due to greater T cell-mediated cross-protective immunity.”

      (6) In the results section, many statements report statistical results of correlation analyses. Consider providing further interpretations of these results, such as the implications of nonsignificant correlations and how they support or contradict the hypothesis or previous studies. For example, the statement on line 248 regarding the lack of significant correlation between influenza B epidemic size and A(H3N2) epidemic metrics would benefit from additional discussion on what this non-significant correlation signifies and how it relates to the hypothesis or previous research. 

      In the Discussion section, we suggest that the lack of an association between influenza B circulation and A(H3N2) epidemic metrics is due to few T and B cell epitopes shared between influenza A and B viruses (Terajima et al., 2013).

      Lines 1005-1007 in revised manuscript (Lines 513-515 in original manuscript): 

      “Overall, we did not find any indication that influenza B incidence affects A(H3N2) epidemic burden or timing, which is not unexpected, given that few T and B cell epitopes are shared between the two virus types (Terajima et al., 2013).”

      Minor comments: 

      (1) Line 116-122: Include a summary statistical description of all collected data sets, detailing the number of HA and NA sequence data and their sources. Briefly describe subsampled data sets, specifying preferences (e.g., the number of HA or NA sequence data collected from each region). 

      In our revised manuscript we now include supplementary tables that summarize the number of A/H3 and A/N2 sequences in each subsampled dataset, aggregated by world region, for all seasons combined (Figure 2 - table supplements 1 - 2). We also include supplementary figures showing the number of sequences collected in each month and each season in North America versus the other nine world regions combined (Figure 2 - figure supplements 1 - 2). Subsampled datasets are plotted individually in the figures below, but individual time series are difficult to discern due to minor differences in sequence counts across the datasets.

      (2) Figure 7A: Due to space limitations, consider rounding numbers on the x-axis to whole numbers for clarity. 

      Thank you for this suggestion. In the revised manuscript we round numbers in the axes of Figure 7A (Figure 9A in the revised manuscript) so that the axes are less crowded.

      (3) Figure 4C & Figure 4D: Note that Region 10 (purple) data were unavailable for seasons before 2009 (lines 1483-1484). Label each region on the map with its respective region number (1 to 10) and indicate this in the legend for easy identification. 

      In our original submission, the legend for Figure 4 included “Data for Region 10 (purple) were not available for seasons prior to 2009” at the end of the caption. We have moved this sentence, as well as other descriptions that apply to both C and D, so that they follow the sentence “C-D. Regional patterns of influenza type and subtype incidence during two seasons when A(H3N2) was nationally dominant.”

      In our revised manuscript, Figure 4, and Figure 4 - figure supplement 1 (Figure S10 in original submission) include labels for each HHS region.

      We did not receive specific recommendations from Reviewer #2. However, our responses to Reviewer #3 address the study’s weaknesses mentioned by Reviewer #2.

      Reviewer #3 (Recommendations For The Authors): 

      This paper explores the relationships among evolutionary and epidemiological quantities in influenza, using a wide range of datasets and features, and using both correlations and random forests to examine, primarily, what are the drivers of influenza epidemics. 

      This is a workhorse of a paper, in the volumes of data that are analyzed and the extensive analysis that is done. The data that are provided are a treasure trove resource for influenza modelers and for anyone interested in seeing influenza surveillance data in the context of evolution, and evolutionary information in the context of epidemiology. 

      L53 - end of sentence "and antigenic drift": not sure this fits, explain? I thought this sentence was in contrast to antigenic drift.

      Thank you for catching this. We did not intend to include “and antigenic drift” at the end of this sentence and have removed it (Line 59).

      Para around L115: would using primarily US data be a limitation, because it's global immunity that shapes success of strains? Or, how much does each country's immunity and vaccination and so on actually shape what strains succeed there, compared to global/international factors? 

      The HA and NA phylogenetic trees in our study are enriched with US sequences because our study focuses on epidemiological dynamics in the US, and we wanted to prioritize A(H3N2) viruses that the US human population encountered in each season. We agree with the reviewer that the world population may be the right scale to understand how immunity, acquired by vaccination or natural infection, may shape the emergence and success of new lineages that will go on to circulate globally. However, our study assesses the overall impact of antigenic drift on regional A(H3N2) epidemic dynamics in the US. In other words, our driving question is whether we can predict the population-level impact of an A(H3N2) variant in the US, conditional on this particular lineage having established in the US and circulating at relatively high levels. We do not assess the global or population-level factors that may influence which A(H3N2) virus lineages are successful in a given location or season.

      We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader. 

      Lines 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”

      In the Results section, I found the format hard to follow, because of the extensive methodological details, numbers with CIs and long sentences. Sentences sometimes included the question, definitions of variables, and lists. For example at line 215 we have: "Next, we tested for associations between A(H3N2) evolution and epidemic timing, including onset week, defined as the winter changepoint in incidence [16], and peak week, defined as the first week of maximum incidence; spatiotemporal synchrony, measured as the variation (standard deviation, s.d.) in regional onset and peak timing; and epidemic speed, including seasonal duration and the number of weeks from onset to peak (Table 2, Figure S11)". I would suggest putting the methods section first, using shorter sentences, separating lists from the question being asked, and stating what was found without also putting in all the extra detail. Putting the methods section before the results might reduce the sense that you have to explain what you did and how in the results section too.

      Thank you for suggesting how to improve the readability of the Results section. In the revised manuscript, we follow the reviewer’s advice to put the Methods section before the Results section. Although eLife formatting requirements specify the order: Introduction, Results, Discussion, and Methods, the journal allows for the Methods section to follow the Introduction when it makes sense to do so. We agree with the reviewer that putting the Methods section before the Results section makes our results easier to follow because we no longer need to introduce methodological details at the beginning of each set of results.

      L285 in the RF you remove variables without significant correlations with the target variables, but isn't one of the aims of RF to uncover relationships where a correlation might not be evident, and in part to reveal combinations of features that give the targeted outcome? Also with the RF, I am a bit concerned that you could not use the leave-one-out approach because it was "unstable" - presumably that means that you obtain quite different results if you leave out a season. How robust are these results, and what are the most sensitive aspects? Are the same variables typically high in importance if you leave out a season, for example? What does the scatterplot of observed vs predicted epidemic size (as in Fig 7) look like if each prediction is for the one that was left out (i.e. from a model trained on all the rest)? In my experience, where the RF is "unstable", that can look pretty terrible even if the model trained on all the data looks great (as does Figure 7). In any case I think it's worth discussing sensitivity.

      (1) In response to the reviewer’s first question, we explain our rationale for not including all candidate predictors in random forest and penalized regression models. 

      Models trained with different combinations of predictors can have similar performance, and these combinations of predictors can include variables that do not necessarily have strong univariate associations with the target variable. The performance of random forest and LASSO regression models is not sensitive to redundant or irrelevant predictors (see Figure 10.2 in Kuhn & Johnson, 2019). However, if our goal is variable selection rather than strictly model performance, it is considered best practice to remove collinear, redundant, and/or irrelevant variables prior to training models (see section 11.3 in Kuhn & Johnson, 2019). In both random forest and LASSO regression models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection. In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores. Thus, failing to minimize multicollinearity prior to model training could result in some variables having low rankings and the appearance of being unimportant, because their importance scores are overshadowed by those of the highly correlated variables. Our rationale for preprocessing predictor data follows the philosophy of Kuhn & Johnson, 2019, who recommend including the minimum possible set of variables that does not compromise model performance. Even if a particular model is insensitive to extra predictors, Kuhn and Johnson explain that “removing predictors can reduce the cost of acquiring data or improve the throughput of the software used to make predictions.”

      In the revised manuscript, we include more details about our steps for preprocessing predictor data. We also follow the reviewer’s suggestion to include all evolutionary predictors in variable selection analyses, regardless of whether they have strong univariate correlations with target outcomes, because the performance of random forest and LASSO regression models is not affected by redundant predictors. 

      Including additional predictors in our variable selection analyses does not change our conclusions. As reported in our original manuscript, predictors with strong univariate correlations with various epidemic metrics were the highest ranked features in both random forest and LASSO regression models.

      Lines 523-563:

      “Preprocessing of predictor data: The starting set of candidate predictors included all viral fitness metrics: genetic and antigenic distances between current and previously circulating strains and the standard deviation and Shannon diversity of H3 and N2 LBI values in the current season. To account for potential type or subtype interference, we included A(H1N1) or A(H1N1)pdm09 epidemic size and B epidemic size in the current and prior season and the dominant IAV subtype in the prior season (Lee et al., 2018). We included A(H3N2) epidemic size in the prior season as a proxy for prior natural immunity to A(H3N2). To account for vaccine-induced immunity, we considered four categories of predictors and included estimates for the current and prior seasons: national vaccination coverage among adults (18-49 years coverage × ≥ 65 years coverage), adjusted A(H3N2) vaccine effectiveness (VE), a combined metric of vaccination coverage and A(H3N2) VE (18-49 years coverage × ≥ 65 years coverage × VE), and H3 and N2 epitope distances between naturally circulating A(H3N2) viruses and the U.S. A(H3N2) vaccine strain in each season. We could not include a predictor for vaccination coverage in children or consider clade-specific VE estimates, because these data were not available for most seasons in our study.

      Random forest and LASSO regression models are not sensitive to redundant (highly collinear) features (Kuhn & Johnson, 2019), but we chose to downsize the original set of candidate predictors to minimize the impact of multicollinearity on variable importance scores. For both types of models, if there are highly collinear variables that are useful for predicting the target variable, the predictor chosen by the model becomes a random selection (Kuhn & Johnson, 2019). In random forest models, these highly collinear variables will be used in all splits across the forest of decision trees, and this redundancy dilutes variable importance scores (Kuhn & Johnson, 2019). We first confirmed that none of the candidate predictors had zero variance or near-zero variance. Because seasonal lags of each viral fitness metric are highly collinear, we included only one lag of each evolutionary predictor, with a preference for the lag that had the strongest univariate correlations with various epidemic metrics. We checked for multicollinearity among the remaining predictors by examining Spearman’s rank correlation coefficients between all pairs of predictors. If a particular pair of predictors was highly correlated (Spearman’s 𝜌 > 0.8), we retained only one predictor from that pair, with a preference for the predictor that had the strongest univariate correlations with various epidemic metrics. Lastly, we performed QR decomposition of the matrix of remaining predictors to determine if the matrix is full rank and identify sets of columns involved in linear dependencies. This step did not eliminate any additional predictors, given that we had already removed pairs of highly collinear variables based on Spearman correlation coefficients. 

      After these preprocessing steps, our final set of model predictors included 21 variables, including 8 viral evolutionary indicators: H3 epitope distance (t – 2), HI log2 titer distance (t – 2), H3 RBS distance (t – 2), H3 non-epitope distance (t – 2), N2 epitope distance (t – 1), N2 non-epitope distance (t – 1), and H3 and N2 LBI diversity (s.d.) in the current season; 6 proxies for type/subtype interference and prior immunity: A(H1N1) and B epidemic sizes in the current and prior season, A(H3N2) epidemic size in the prior season, and the dominant IAV subtype in the prior season; and 7 proxies for vaccine-induced immunity: A(H3N2) VE in the current and prior season, H3 and N2 epitope distances between circulating strains and the vaccine strain in each season, the combined metric of adult vaccination coverage × VE in the current and prior season, and adult vaccination coverage in the prior season.”
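      The correlation filter and rank check described in the quoted Methods can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the predictor names and the `priority` scores (standing in for "the strongest univariate correlations with various epidemic metrics") are hypothetical, the 0.8 threshold is applied to |ρ| on the assumption that strong negative correlations also count as collinear, and the rank check uses a Gram-Schmidt pass, which is one way of computing a QR decomposition.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(predictors, priority, threshold=0.8):
    """predictors: name -> seasonal values; priority: name -> strength of univariate
    association with epidemic metrics (HYPOTHETICAL scores). For each pair with
    |rho| > threshold, the lower-priority predictor is dropped."""
    kept = set(predictors)
    for a, b in combinations(sorted(predictors), 2):
        if a in kept and b in kept and abs(spearman(predictors[a], predictors[b])) > threshold:
            kept.discard(a if priority[a] < priority[b] else b)
    return sorted(kept)


def dependent_columns(columns, tol=1e-9):
    """Gram-Schmidt (QR) pass over predictor columns: returns indices of columns that
    are linear combinations of earlier ones, i.e. why the matrix is not full rank."""
    basis, dependent = [], []
    for idx, col in enumerate(columns):
        v = [float(a) for a in col]
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        scale = max(1.0, math.sqrt(sum(float(a) ** 2 for a in col)))
        if norm <= tol * scale:
            dependent.append(idx)
        else:
            basis.append([a / norm for a in v])
    return dependent


# Hypothetical predictors over six seasons.
predictors = {
    "h3_epitope": [1, 2, 3, 4, 5, 6],
    "h3_epitope_rescaled": [2, 4, 6, 8, 10, 12],
    "n2_epitope": [3, 1, 4, 1, 5, 9],
}
priority = {"h3_epitope": 0.6, "h3_epitope_rescaled": 0.4, "n2_epitope": 0.5}
kept = drop_collinear(predictors, priority)
print(kept, dependent_columns([predictors[name] for name in kept]))
```

      As in the quoted text, the rank check finds nothing to remove once the correlated pair has already been filtered; it only flags exact linear combinations that pairwise correlations can miss.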

      (2) Next, we clarify our model training methodology to address the reviewer’s second point about using a leave-one-out cross-validation approach.

      We believe the reviewer is mistaken; we use a leave-one-season-out validation approach which lends some robustness to the predictions. In our original submission, we stated “We created each forest by generating 3,000 regression trees from 10 repeats of a leave-one-season-out (jackknife) cross-validated sample of the data. Due to the small size of our dataset, evaluating the predictive accuracy of random forest models on a quasi-independent test set produced unstable estimates.” (Lines 813-816 in the original manuscript)

      To clarify, we use leave-one-season-out cross-validation to train models and measure model performance, wherein each “assessment” set contains one season of data (predicted by the model), and the corresponding “analysis” set (“fold”) contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of the model (see Section 3.4 in Kuhn & Johnson, 2019). To reduce noise, we generated 10 bootstrap resamples of each fold and averaged the RMSE and R2 values of model predictions from resamples. 
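      The leave-one-season-out scheme with bootstrap repeats can be illustrated with the sketch below. To keep it dependency-free, a one-predictor least-squares line stands in for the random forest and LASSO models; that swap, the function names, and the use of the squared Pearson correlation as R2 are assumptions for illustration. Held-out predictions are averaged across the resamples of each fold, following the wording of the revised Methods, before RMSE and R2 are computed.

```python
import math
import random


def fit_line(x, y):
    """Stand-in model (NOT a random forest): least-squares line y = a + b * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda xi: a + b * xi


def loso_predictions(x, y, seasons, n_repeats=10, seed=1):
    """Each season is the assessment set once; all other seasons form the analysis set.
    The analysis set is bootstrap-resampled n_repeats times, and the held-out
    predictions of the resampled models are averaged."""
    rng = random.Random(seed)
    preds = [None] * len(y)
    for held_out in sorted(set(seasons)):
        analysis = [i for i, s in enumerate(seasons) if s != held_out]
        assessment = [i for i, s in enumerate(seasons) if s == held_out]
        totals = {i: 0.0 for i in assessment}
        for _ in range(n_repeats):
            boot = [rng.choice(analysis) for _ in analysis]
            model = fit_line([x[i] for i in boot], [y[i] for i in boot])
            for i in assessment:
                totals[i] += model(x[i])
        for i in assessment:
            preds[i] = totals[i] / n_repeats
    return preds


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Hypothetical example: 10 seasons, one national value per season.
x = list(range(10))
y = [2.0 * xi + 1.0 for xi in x]
seasons = [f"{2000 + i}-{2001 + i}" for i in range(10)]
preds = loso_predictions(x, y, seasons)
print(rmse(y, preds), r_squared(y, preds))
```

      With about 20 seasons this gives about 20 folds, so every season is used for training in all folds except the one in which it is held out and predicted.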

      Although it would be ideal and best practice to measure model performance with an independent test set, our dataset includes only ~20 seasons. We found that predictions of independent test sets of 2-3 seasons had unstable performance, which indicates we do not have sufficient power to measure model performance with a test set this small. Further, we suspect that large antigenic jumps in a small subset of seasons further contribute to variation in prediction accuracy across randomly selected test sets. Our rationale for using cross-validation instead of an independent test set is best described in Section 4.3 of Kuhn and Johnson’s book “Applied Predictive Modeling” (Kuhn & Johnson, 2013):

      “When the number of samples is not large, a strong case can be made that a test set should be avoided because every sample may be needed for model building. Additionally, the size of the test set may not have sufficient power or precision to make reasonable judgements. Several researchers (Molinaro 2005; Martin and Hirschberg 1996; Hawkins et al. 2003) show that validation using a single test set can be a poor choice. Hawkins et al. (2003) concisely summarize this point: ‘holdout samples of tolerable size [...] do not match the cross-validation itself for reliability in assessing model fit and are hard to motivate.’ Resampling methods, such as cross-validation, can be used to produce appropriate estimates of model performance using the training set. These are discussed in length in Sect. 4.4. Although resampling techniques can be misapplied, such as the example shown in Ambroise and McLachlan (2002), they often produce performance estimates superior to a single test set because they evaluate many alternate versions of the data.”

      In our revised manuscript, we provide additional clarification of our methods (Lines 574-590):

      “We created each forest by generating 3,000 regression trees. To determine the best performing model for each epidemic metric, we used leave-one-season-out (jackknife) cross-validation to train models and measure model performance, wherein each “assessment” set is one season of data predicted by the model, and the corresponding “analysis” set contains the remaining seasons. This approach is roughly analogous to splitting data into training and test sets, but all seasons are used at some point in the training of each model (Kuhn & Johnson, 2019). Due to the small size of our dataset (~20 seasons), evaluating the predictive accuracy of random forest models on a quasi-independent test set of 2-3 seasons produced unstable estimates. Instead of testing model performance on an independent test set, we generated 10 bootstrap resamples (“repeats”) of each analysis set (“fold”) and averaged the predictions of models trained on resamples (Kuhn & Johnson, 2013, 2019). For each epidemic metric, we report the mean root mean squared error (RMSE) and R² of predictions from the best tuned model. We used permutation importance (N = 50 permutations) to estimate the relative importance of each predictor in determining target outcomes. Permutation importance is the decrease in prediction accuracy when a single feature (predictor) is randomly permuted, with larger values indicating more important variables. Because many features were collinear, we used conditional permutation importance to compute feature importance scores, rather than the standard marginal procedure (Altmann et al., 2010; Debeer & Strobl, 2020; Strobl et al., 2008; Strobl et al., 2007).”
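      The permutation-importance idea in the passage above can be illustrated with a short Python sketch. It is illustrative only and is not the code used in the study. Any prediction function stands in for a fitted random forest, and error is measured as mean squared error. The optional `strata` argument is a hypothetical, simplified stand-in for the conditioning used by conditional permutation importance: a feature is shuffled only among rows that share a label, that is, among observations with similar values of its correlated covariates. The actual conditional procedure (Strobl et al., 2008; Debeer & Strobl, 2020) builds these groups from the forest's own splits.

```python
import random
from statistics import mean


def mse(model, X, y):
    """Mean squared error of a prediction function over rows X."""
    return mean((model(row) - target) ** 2 for row, target in zip(X, y))


def permutation_importance(model, X, y, j, n_perm=50, seed=0, strata=None):
    """Average increase in MSE when column j is shuffled.

    If `strata` (one label per row) is given, column j is shuffled only
    among rows sharing a label: a simplified stand-in for conditional
    permutation importance. Without it, this is the marginal version.
    """
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    labels = strata if strata is not None else [0] * len(X)
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)

    increases = []
    for _ in range(n_perm):
        X_perm = [list(row) for row in X]
        for idx in groups.values():
            values = [X[i][j] for i in idx]
            rng.shuffle(values)
            for i, v in zip(idx, values):
                X_perm[i][j] = v
        increases.append(mse(model, X_perm, y) - baseline)
    return mean(increases)


# Toy check: the model only uses feature 0, so feature 1 has zero importance.
X = [[float(i), float(i % 3)] for i in range(10)]
y = [3.0 * row[0] for row in X]


def model(row):
    return 3.0 * row[0]


print(permutation_importance(model, X, y, j=0) > 0)  # True
print(permutation_importance(model, X, y, j=1))      # 0.0
```

      Larger values mean that breaking the link between a feature and the outcome hurts predictions more. When features are collinear, restricting the shuffle to strata keeps the permuted values consistent with the correlated covariates, which is the motivation for the conditional procedure.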

      (3) In response to the reviewer’s question about the sensitivity of results when one season is left out, we clarify that the variable importance scores in Figure 8 and model predictions in Figure 9 were generated by models tuned using leave-one-season-out cross-validation. 

      As explained above, in our leave-one-season-out cross-validation approach, each “assessment” set contains one season of data predicted by the model, and the corresponding “analysis” set (“fold”) contains the remaining seasons. We generated predictions of epidemic metrics and variable importance rankings by averaging the model output of 10 bootstrap resamples of each cross-validation fold. 
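      The fold structure described above can be sketched in Python. The sketch is illustrative only and is not the code used in the study, and it rests on several assumptions. A one-predictor least-squares line (`fit_line`, hypothetical) stands in for the random forest so that the example runs with the standard library alone. Each row is a hypothetical `(season, x, y)` tuple. Predictions for the held-out season are averaged across models trained on bootstrap resamples of the analysis set. R² is computed here as the squared Pearson correlation between observed and predicted values; other definitions exist.

```python
import math
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (stand-in for a random forest)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def leave_one_season_out(rows, n_boot=10, seed=1):
    """rows: (season, x, y) tuples. Returns (observed, predicted) lists."""
    rng = random.Random(seed)
    observed, predicted = [], []
    for held_out in sorted({season for season, _, _ in rows}):
        analysis = [r for r in rows if r[0] != held_out]    # training fold
        assessment = [r for r in rows if r[0] == held_out]  # one season
        boot_preds = []
        for _ in range(n_boot):
            resample = [rng.choice(analysis) for _ in analysis]
            model = fit_line([r[1] for r in resample], [r[2] for r in resample])
            boot_preds.append([model(r[1]) for r in assessment])
        # Average each held-out prediction across the bootstrap models.
        predicted.extend(mean(col) for col in zip(*boot_preds))
        observed.extend(r[2] for r in assessment)
    return observed, predicted


def rmse(obs, pred):
    return math.sqrt(mean((p - o) ** 2 for o, p in zip(obs, pred)))


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Toy data: 6 seasons x 3 regions with an exactly linear signal.
rows = [(s, float(3 * s + j), 2.0 * (3 * s + j) + 1.0) for s in range(6) for j in range(3)]
obs, pred = leave_one_season_out(rows)
print(rmse(obs, pred), r_squared(obs, pred))
```

      Every season is predicted exactly once by models that never saw it during training. This is what lets a roughly 20-season dataset be used for both training and evaluation.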

      In Lines 791-806, we describe which epidemic metrics have the highest prediction accuracy and report that random forest models tend to underpredict most epidemic metrics in seasons with high antigenic novelty:

      “We measured correlations between observed values and model-predicted values at the HHS region level. Among the various epidemic metrics, random forest models produced the most accurate predictions of A(H3N2) subtype dominance (Spearman’s 𝜌 = 0.95, regional range = 0.85 – 0.97), peak incidence (𝜌 = 0.91, regional range = 0.72 – 0.95), and epidemic size (𝜌 = 0.9, regional range = 0.74 – 0.95), while predictions of effective 𝑅ₜ and epidemic intensity were less accurate (𝜌 = 0.81, regional range = 0.65 – 0.91; 𝜌 = 0.78, regional range = 0.63 – 0.92, respectively) (Figure 9). Random forest models tended to underpredict most epidemic targets in seasons with substantial H3 antigenic transitions, in particular the SY97 cluster seasons (1998-1999, 1999-2000) and the FU02 cluster season (2003-2004) (Figure 9). 

      For epidemic size and peak incidence, seasonal predictive error – the root-mean-square error (RMSE) across all regional predictions in a season – increased with H3 epitope distance (epidemic size, Spearman’s 𝜌 = 0.51, P = 0.02; peak incidence, 𝜌 = 0.63, P = 0.004) and N2 epitope distance (epidemic size, 𝜌 = 0.48, P = 0.04; peak incidence, 𝜌 = 0.48, P = 0.03) (Figure 9 – figure supplements 1 – 2). For models of epidemic intensity, seasonal RMSE increased with N2 epitope distance (𝜌 = 0.64, P = 0.004) but not H3 epitope distance (𝜌 = 0.06, P = 0.8) (Figure 9 – figure supplements 1 – 2). Seasonal RMSE of effective 𝑅ₜ and subtype dominance predictions did not correlate with H3 or N2 epitope distance (Figure 9 – figure supplements 1 – 2).”
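      The two quantities in the passage above can be computed as in the following Python sketch: seasonal RMSE across regional predictions, and its Spearman correlation with epitope distance. The sketch is illustrative only and is not the code used in the study. The nested-dictionary layout (`{season: {region: value}}`) is a hypothetical choice, and tied values receive average ranks. The P values reported in the text are not computed here.

```python
import math
from statistics import mean


def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def seasonal_rmse(obs, pred):
    """obs, pred: {season: {region: value}} -> {season: RMSE across regions}."""
    return {
        s: math.sqrt(mean((pred[s][r] - obs[s][r]) ** 2 for r in obs[s]))
        for s in obs
    }


# Hypothetical per-season epitope distances vs. seasonal RMSE values.
print(spearman([0.1, 0.4, 0.2, 0.9], [1.0, 3.0, 2.0, 5.0]))
```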

      I think the competition (interference) results are really interesting, perhaps among the most interesting aspects of this work. 

      Thank you! We agree that our finding that subtype interference has a greater impact than viral evolution on A(H3N2) epidemics is one of the more interesting results in the study.

      Have you seen the paper by Barrat-Charlaix et al.? They found that LBI was not good at predicting frequency dynamics (see https://pubmed.ncbi.nlm.nih.gov/33749787/); instead, LBI was high for sequences similar to the consensus sequence, which was close to future strains. LBI also was not positively correlated with epidemic impact in Figure S7.

      The local branching index (LBI) measures the rate of recent phylogenetic branching and approximates relative fitness among viral clades, with high LBI values representing greater fitness (Neher et al. 2014).

      Two of this study’s co-authors (John Huddleston and Trevor Bedford) are also co-authors of Barrat-Charlaix et al. 2021. Barrat-Charlaix et al. 2021 assessed the performance of LBI in predicting the frequency dynamics and fixation of individual amino acid substitutions in A(H3N2) viruses. Our study is not focused on predicting the future success of A(H3N2) clades or the frequency dynamics or probability of fixation of individual substitutions. Instead, we use the standard deviation and Shannon diversity of LBI values in each season as a proxy for genealogical (clade-level) diversity. We find that, at a seasonal level, low diversity of H3 or N2 LBI values in the current season correlates with greater epidemic intensity, higher transmission rates, and shorter seasonal duration.

      In the Discussion we provide an explanation for these correlation results (Lines 848-857): 

      “The local branching index (LBI) is traditionally used to predict the success of individual clades, with high LBI values indicating high viral fitness (Huddleston et al., 2020; Neher et al., 2014). In our epidemiological analysis, low diversity of H3 or N2 LBI in the current season correlated with greater epidemic intensity, higher transmission rates, and shorter seasonal duration. These associations suggest that low LBI diversity is indicative of a rapid selective sweep by one successful clade, while high LBI diversity is indicative of multiple co-circulating clades with variable seeding and establishment times over the course of an epidemic. A caveat is that LBI estimation is more sensitive to sequence sub-sampling schemes than strain-level measures. If an epidemic is short and intense (e.g., 1-2 months), a phylogenetic tree with our sub-sampling scheme (50 sequences per month) may not incorporate enough sequences to capture the true diversity of LBI values in that season.”

      Figure 1 - LBI goes up over time. Is that partly to do with sampling? Overall how do higher sampling volumes in later years impact this analysis? (though you choose a fixed number of sequences so I guess you downsample to cope with that). I note that LBI is likely to be sensitive to sequencing density. 

      Thank you for pointing this out. We realized that increasing LBI Shannon diversity over the course of the study period was indeed an artefact of increasing sequence volume over time. Our sequence subsampling scheme involves selecting a random sample of up to 50 viruses per month, with up to 25 viruses selected from North America (if available) and the remaining sequences evenly divided across nine other global regions. In early seasons of the study (late 1990s/early 2000s), sampling was often too sparse to meet the 25 viruses/month threshold for North America or for the other global regions combined (H3: Figure 2 - figure supplement 1; N2: Figure 2 - figure supplement 2). Ecological diversity metrics are sensitive to sample size, which explains why LBI Shannon diversity appeared to steadily increase over time in our original submission. In our revised manuscript, we correct for uneven sample sizes across seasons before estimating Shannon diversity and clarify our methodology. 

      Lines 443-482: 

      “Clade growth: The local branching index (LBI) measures the relative fitness of co-circulating clades, with high LBI values indicating recent rapid phylogenetic branching (Huddleston et al., 2020; Neher et al., 2014). To calculate LBI for each H3 and N2 sequence, we applied the LBI heuristic algorithm as originally described by Neher et al., 2014 to H3 and N2 phylogenetic trees, respectively. We set the neighborhood parameter 𝜏 to 0.4 and only considered viruses sampled between the current season 𝑡 and the previous season 𝑡 – 1 as contributing to recent clade growth in the current season 𝑡.  
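      The LBI heuristic referenced above can be computed in two passes over a tree. The Python sketch below follows the message-passing idea of Neher et al. (2014): each node's LBI is the integral of exp(−d/𝜏) over all contributing tree length at distance d from that node. It is illustrative only and is not the code used in the study. The `Node` class is hypothetical. The boolean `alive` flag is a simplified stand-in for restricting contributions to viruses sampled in seasons t and t - 1. No normalization of LBI values is applied.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; branch_length is the branch leading to this node."""
    name: str
    branch_length: float = 0.0
    alive: bool = True  # stand-in for "sampled in season t or t - 1"
    children: list = field(default_factory=list)
    up: float = 0.0     # discounted tree length in the subtree, seen from the parent
    down: float = 0.0   # discounted tree length outside the subtree, seen from this node
    lbi: float = 0.0


def postorder(node):
    for child in node.children:
        yield from postorder(child)
    yield node


def preorder(node):
    yield node
    for child in node.children:
        yield from preorder(child)


def local_branching_index(root, tau=0.4):
    """Return {name: LBI}. A branch of length l starting at distance d adds
    exp(-d/tau) * tau * (1 - exp(-l/tau)) if it is alive."""
    # Upward pass: children before parents.
    for node in postorder(root):
        decay = math.exp(-node.branch_length / tau)
        node.up = decay * sum(child.up for child in node.children)
        if node.alive:
            node.up += tau * (1.0 - decay)
    # Downward pass: parents before children.
    root.down = 0.0
    for node in preorder(root):
        for child in node.children:
            siblings = sum(o.up for o in node.children if o is not child)
            decay = math.exp(-child.branch_length / tau)
            child.down = decay * (node.down + siblings)
            if child.alive:
                child.down += tau * (1.0 - decay)
    # LBI = discounted length outside the subtree + below the node.
    for node in postorder(root):
        node.lbi = node.down + sum(child.up for child in node.children)
    return {node.name: node.lbi for node in postorder(root)}


# Two sister tips on branches of length 0.4, with tau = 0.4 as in the text.
tree = Node("root", children=[Node("A", 0.4), Node("B", 0.4)])
print(local_branching_index(tree, tau=0.4))
```

      For tip A, the only contributing tree length is the path to B (total length 0.8). Its LBI is therefore 0.4·(1 − e⁻²), which the two passes reproduce without visiting every pair of nodes.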

      Variation in the phylogenetic branching rates of co-circulating A(H3N2) clades may affect the magnitude, intensity, onset, or duration of seasonal epidemics. For example, we expected that seasons dominated by a single variant with high fitness might have different epidemiological dynamics than seasons with multiple co-circulating clades with varying seeding and establishment times. We quantified the diversity of clade growth rates of viruses circulating in each season using the standard deviation (s.d.) and Shannon diversity of LBI values in each season. Given that LBI measures relative fitness among co-circulating clades, we did not compare overall clade growth rates (e.g., mean LBI) across seasons.

      Each season’s distribution of LBI values is right-skewed and does not follow a normal distribution. We therefore bootstrapped the LBI values of each season in each replicate dataset 1000 times (1000 samples with replacement) and estimated the seasonal standard deviation of LBI from resamples, rather than directly from observed LBI values. We also tested the seasonal standard deviation of LBI from log transformed LBI values, which produced qualitatively equivalent results to bootstrapped LBI values in downstream analyses.
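      A minimal Python sketch of the bootstrapped standard deviation described above follows. It is illustrative only and is not the code used in the study. It assumes that the seasonal estimate is the mean of the sample standard deviations of 1000 resamples; the text does not state how the resamples are summarized. The log-transformed alternative is included for comparison and requires strictly positive LBI values.

```python
import math
import random
from statistics import mean, stdev


def bootstrap_sd(values, n_boot=1000, seed=0):
    """Mean sample s.d. across bootstrap resamples drawn with replacement."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)
    sds = []
    for _ in range(n_boot):
        resample = rng.choices(values, k=len(values))
        sds.append(stdev(resample))
    return mean(sds)


def log_sd(values):
    """Alternative: sample s.d. of log-transformed (positive) LBI values."""
    return stdev([math.log(v) for v in values])


# Hypothetical right-skewed LBI values for one season.
season = [0.2, 0.3, 0.3, 0.5, 1.8, 2.6]
print(bootstrap_sd(season), log_sd(season))
```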

      As an alternative measure of seasonal LBI diversity, we binned raw H3 and N2 LBI values into categories based on their integer values (e.g., an LBI value of 0.5 is assigned to the (0,1] bin) and estimated the exponential of the Shannon entropy (Shannon diversity) of LBI categories (Hill, 1973; Shannon, 1948). The Shannon diversity of LBI considers both the richness and relative abundance of viral clades with different growth rates in each season and is calculated as follows:  

      $$ {}^{1}D = \exp\left(-\sum_{i=1}^{R} p_i \ln p_i\right) $$

      where $^{q}D$ is the effective number of categories, or Hill number of order 𝑞 (here, clades with different growth rates), with 𝑞 defining the sensitivity of the true diversity to rare versus abundant categories (Hill, 1973). Here, exp is the exponential function, $p_i$ is the proportion of LBI values belonging to the 𝑖th category, and 𝑅 is richness (the total number of categories). Shannon diversity $^{1}D$ (𝑞 = 1) estimates the effective number of categories in an assemblage using the geometric mean of their proportional abundances $p_i$ (Hill, 1973).  
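      The binning and Shannon diversity calculation can be sketched in Python as follows. The sketch is illustrative only and is not the code used in the study, and the helper names are hypothetical. Values are binned into half-open integer intervals (k − 1, k] with `math.ceil`, which matches the example of 0.5 falling into (0, 1]. The natural logarithm is paired with `exp`, so the base of the logarithm does not affect the result.

```python
import math
from collections import Counter


def lbi_category(value):
    """Integer bin (k - 1, k]; e.g. 0.5 -> 1, i.e. the (0, 1] bin."""
    return math.ceil(value)


def shannon_diversity(values):
    """Hill number of order 1: exp of the Shannon entropy of bin proportions."""
    counts = Counter(lbi_category(v) for v in values)
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)


print(round(shannon_diversity([0.5, 0.7, 1.5, 2.5]), 3))  # 2.828
```

      In the example, the bins hold proportions 0.5, 0.25, and 0.25. That gives ¹D = 2^1.5 ≈ 2.83 "effective" categories, fewer than the richness R = 3 because one bin dominates.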

      Because ecological diversity metrics are sensitive to sampling effort, we rarefied H3 and N2 sequence datasets prior to estimating Shannon diversity so that seasons had the same sample size. For each season in each replicate dataset, we constructed rarefaction and extrapolation curves of LBI Shannon diversity and extracted the Shannon diversity estimate at a sample size twice the reference sample size (the smallest number of sequences obtained in any season during the study) (iNEXT R package) (Chao et al., 2014). Chao et al. found that their diversity estimators work well for rarefaction and short-range extrapolation when the extrapolated sample size is up to twice the reference sample size. For H3, we estimated seasonal diversity using replicate datasets subsampled to 360 sequences/season; for N2, datasets were subsampled to 230 sequences/season.”
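      The study used iNEXT's analytic rarefaction and extrapolation estimators. As a simpler illustration of why seasons must be compared at equal sample sizes, the Python sketch below swaps in plain Monte Carlo rarefaction. It repeatedly draws `m` sequences without replacement from a season's binned LBI labels and averages the Shannon diversity of the draws. The sketch is illustrative only and is not the code used in the study. It only interpolates, with no extrapolation to twice the reference size, and the function names and the `n_draws` default are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean


def shannon_diversity_of_labels(labels):
    """exp of the Shannon entropy of category proportions."""
    counts = Counter(labels)
    n = len(labels)
    return math.exp(-sum((c / n) * math.log(c / n) for c in counts.values()))


def rarefied_diversity(labels, m, n_draws=200, seed=0):
    """Mean Shannon diversity of n_draws subsamples of size m (no replacement)."""
    if not 0 < m <= len(labels):
        raise ValueError("m must be between 1 and the number of sequences")
    rng = random.Random(seed)
    return mean(
        shannon_diversity_of_labels(rng.sample(labels, m)) for _ in range(n_draws)
    )


# Hypothetical seasons: a heavily sampled season and a sparsely sampled one,
# compared at the smaller season's sample size.
rich_season = [1] * 60 + [2] * 30 + [3] * 8 + [4] * 2
sparse_season = [1] * 18 + [2] * 9 + [3] * 3
m = len(sparse_season)
print(rarefied_diversity(rich_season, m), rarefied_diversity(sparse_season, m))
```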

      Estimating the Shannon diversity of LBI from datasets with even sampling across seasons removes the previous secular trend of increasing LBI diversity over time (Figure 2 in revised manuscript).

      Figure 3 - I wondered what about the co-dominant times? 

      In Figure 3, orange points correspond to seasons in which A(H3N2) and A(H1N1) were codominant. We are not sure of the reviewer’s specific question concerning codominant seasons, but if it concerns whether antigenic drift is linked to epidemic magnitude among codominant seasons alone, we cannot perform separate regression analyses for these seasons because there are only two codominant seasons during the 22 season study period.

      Figure 4 - Related to drift and epidemic size, dominance, etc. -- when is drift measured, and (if it's measured in season t), would larger populations create more drift, simply by having access to more opportunity (via a larger viral population size)? This is a bit 'devil's advocate' but what if some epidemiological/behavioural process causes a larger and/or later peak, and those gave rise to higher drift?

      Seasonal drift is measured as the genetic or antigenic distance between viruses circulating during season t and viruses circulating in the prior season (𝑡 – 1) or two seasons ago (𝑡 – 2).

      Concerning the question about whether larger human populations lead to greater rates of antigenic drift, phylogeographic studies have repeatedly found that East-South-Southeast Asia are the source populations for A(H3N2) viruses (Bedford et al., 2015; Lemey et al., 2014), in part because these regions have tropical or subtropical climates and larger human populations, which enable year-round circulation and higher background infection rates. Larger viral populations (via larger host population sizes) and uninterrupted transmission may increase the efficiency of selection and the probability of strain survival and global spread (Wen et al., 2016). After A(H3N2) variants emerge in East-South-Southeast Asia and spread to other parts of the world, A(H3N2) viruses circulate via overlapping epidemics rather than local persistence (Bedford et al., 2015; Rambaut et al., 2008). Each season, A(H3N2) outbreaks in the US (and other temperate regions) are seeded by case importations from outside the US, genetic diversity peaks during the winter, and a strong genetic bottleneck typically occurs at the end of the season (Rambaut et al., 2008).

      Due to their faster rates of antigenic evolution, A(H3N2) viruses undergo more rapid clade turnover and dissemination than A(H1N1) and B viruses, despite similar global migration networks across A(H3N2), A(H1N1), and B viruses (Bedford et al., 2015). Bedford et al. speculate that there is typically little geographic differentiation in A(H3N2) viruses circulating in each season because A(H3N2) viruses tend to infect adults, and adults are more mobile than children. Compared to A(H3N2) viruses, A(H1N1) and B viruses tend to have greater genealogical diversity, geographic differentiation, and longer local persistence times (Bedford et al., 2015; Rambaut et al., 2008). Thus, some A(H1N1) and B epidemics are reseeded by viruses that have persisted locally since prior epidemics (Bedford et al., 2015).

      Theoretical models have shown that epidemiological processes can influence rates of antigenic evolution (Recker et al., 2007; Wen et al., 2016; Zinder et al., 2013), though the impact of flu epidemiology on viral evolution is likely constrained by the virus’s intrinsic mutation rate. 

      In conclusion, larger host population sizes and flu epidemiology can indeed influence rates of antigenic evolution. However, given that our study is US-centric and focuses on A(H3N2) viruses, these factors are likely not at play in our study, due to intrinsic biological characteristics of A(H3N2) viruses and the geographic location of our study.

      We have added a clarifying sentence to the end of the Introduction to narrow the scope of the paper for the reader.

      Line 114-116: “Rather than characterize in situ evolution of A(H3N2) lineages circulating in the U.S., we study the epidemiological impacts of antigenic drift once A(H3N2) variants have arrived on U.S. soil and managed to establish and circulate at relatively high levels.”

      Methods -- 

      L 620 about rescaling and pre- vs post-pandemic times : tell us more - how has reporting changed? could any of this not be because of reporting but because of NPIs or otherwise? Overall there is a lot of rescaling going on. How sensitive are the results to it? 

      It would be unreasonable to ask for a sensitivity analysis of all the results for every choice made during data preparation, but some idea of where there is reason to think the results might depend on one of these choices would be great.

      In response to the 2009 A(H1N1) pandemic, the US CDC and WHO increased laboratory testing capacity and strengthened epidemiological networks, leading to substantial, long-lasting improvements to influenza surveillance that are still in place today (https://www.cdc.gov/flu/weekly/overview.htm). At the beginning of the COVID-19 pandemic, influenza surveillance networks were quickly adapted to detect and understand the spread of SARS-CoV-2. The 2009 pandemic occurred over a time span of less than one year, and strict non-pharmaceutical interventions (NPIs), such as lockdowns and mask mandates, were not implemented. Thus, we attribute increases in test volume during the post-2009 period to improved virologic surveillance and laboratory testing capacity rather than changes in care-seeking behavior. In the revised manuscript, we include a figure (Figure 1 - figure supplement 2) that shows systematic increases in test volume in all HHS regions after the 2009 pandemic.

      Given the substantial increase in influenza test volume after 2009, we opted to keep the time trend adjustment for the pre- and post-2009 pandemic periods and evaluate whether adjusting for regional reporting differences affects our results. When estimating univariate correlations between various A(H3N2) epidemic metrics and evolutionary indicators, we found qualitatively equivalent results for Spearman correlations and regression models when adjusting for both the pre-/post-2009 pandemic time periods and regional reporting versus only adjusting for the pre-/post-2009 pandemic time periods. Below, we share adjusted versions of Figure 3 (regression results) and Figure 3 - figure supplement 1 (Spearman correlations). Each figure only adjusts for differences in pre- and post-2009 pandemic reporting.

      Author response image 1.

      Adjustment for pre- and post-2009 pandemic only

      Author response image 2.

      Adjustment for pre- and post-2009 pandemic only

      L635 - Why discretize the continuous LBI distribution and then use Shannon entropy when you could just use the variance and/or higher moments? (or quantiles)? Similarly, why not use the duration of the peak, rather than Shannon entropy? (though there, because presumably data are already binned weekly, and using duration would involve defining start and stop times, it's more natural than with LBI)

      We realize that we failed to mention in the methods that we calculated the standard deviation of LBI in each season, in addition to the exponential of the Shannon entropy (Shannon diversity) of LBI. Both the Shannon diversity of LBI values and the standard deviation of LBI values were negatively correlated with effective Rt and epidemic intensity and positively correlated with seasonal duration. The two measures were similarly correlated with effective Rt and epidemic intensity (Figure 3 - figure supplements 2 - 3), while the Shannon diversity of LBI had slightly stronger correlations with seasonal duration than s.d. LBI (Figure 5). Thus, both measures of LBI diversity appear to capture potentially biologically important heterogeneities in clade growth rates.

      Separately, we use the inverse Shannon entropy of the incidence distribution to measure the spread of an A(H3N2) epidemic during the season, following the methods of Dalziel et al. 2018. The peak of an epidemic is a single time point at which the maximum incidence occurs. We have not encountered “the duration of the peak” before in epidemiology terminology, and, to our knowledge, there is not a robust way to measure the “duration of a peak,” unless one were to measure the time span between multiple points of maximum incidence or designate an arbitrary threshold for peak incidence that is not strictly the maximum incidence. Given that Shannon entropy is based on the normalized incidence distribution over the course of the entire influenza season (week 40 to week 20), it does not require designating an arbitrary threshold to describe epidemic intensity.

      L642 - again why normalize epidemic intensities, and how sensitive are the results to this? I would imagine given that the RF results were unstable under leave-one-out analysis that some of those results could be quite sensitive to choices of normalization and scaling.

      Epidemic intensity, defined as the inverse Shannon entropy of the incidence distribution, measures the spread of influenza cases across the weeks in a season. Following Dalziel et al. 2018, we estimated epidemic intensity from normalized incidence distributions rather than raw incidences so that epidemic intensity is invariant under differences in reporting rates and/or attack rates across regions and seasons. If we were to use raw incidences instead, HHS regions or seasons could have the appearance of greater or lower epidemic intensity (i.e., incidence concentrated within a few weeks or spread out over several weeks), due to differences in attack rates or test volume, rather than fundamental differences in the shapes of their epidemic curves. In other words, epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season.
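      The definition above can be checked with a small Python sketch. It is illustrative only and is not the code used in the study. The sketch computes the inverse Shannon entropy of a season's normalized weekly incidence and shows that multiplying all weekly counts by a constant leaves it unchanged. It assumes natural logarithms. The subsequent rescaling of intensities to the interval between 0 and 1 across regions and seasons is not shown, because the rule is not specified in this passage.

```python
import math


def epidemic_intensity(weekly_incidence):
    """Inverse Shannon entropy of the normalized weekly incidence distribution."""
    total = sum(weekly_incidence)
    if total <= 0:
        raise ValueError("total incidence must be positive")
    proportions = [x / total for x in weekly_incidence if x > 0]
    entropy = -sum(p * math.log(p) for p in proportions)
    # All cases in a single week: zero entropy, unbounded intensity.
    return math.inf if entropy == 0 else 1.0 / entropy


spread = [5, 5, 5, 5, 5, 5, 5, 5]   # cases spread evenly over 8 weeks
sharp = [0, 0, 5, 30, 5, 0, 0, 0]   # cases concentrated around one week
print(epidemic_intensity(spread) < epidemic_intensity(sharp))            # True
print(epidemic_intensity(sharp) == epidemic_intensity([10 * x for x in sharp]))  # True
```

      The second check illustrates the point made above. A region that reports ten times as many cases with the same epidemic shape receives the same intensity.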

      In the methods section, we provide further clarification for why epidemic intensities are based on normalized incidence distributions rather than raw incidences.

      Lines 206-209: “Epidemic intensity is intended to measure the shape and spread of an epidemic, regardless of the actual volume of cases in a given region or season. Following the methodology of Dalziel et al. 2018, epidemic intensity values were normalized to fall between 0 and 1 so that epidemic intensity is invariant to differences in reporting rates and/or attack rates across regions and seasons.”  

      L643 - more information about what goes into Epidemia (variables, priors) such that it's replicable/understandable without the code would be good. 

      We now include additional information concerning the epidemic models used to estimate Rt, including all model equations, variables, and priors (Lines 210-276 in Methods).

      L667 did you do breakpoint detection? Why linear models? Was log(incidence) used? 

      In our original submission, we estimated epidemic onsets using piecewise regression models (Lines 666-674 in original manuscript), which model non-linear relationships with breakpoints by iteratively fitting linear models (Muggeo, 2003). Piecewise regression falls under the umbrella of parametric methods for breakpoint detection.
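      The Python sketch below makes the idea of a breakpoint concrete. It fits two independent least-squares lines on either side of every candidate week and returns the first week of the second segment. It is illustrative only and is not the code used in the study. It is also deliberately simpler than the segmented package cited above, which fits a continuous broken line and updates the breakpoint iteratively (Muggeo, 2003). The function names and the `min_seg` parameter are hypothetical.

```python
from statistics import mean


def ols_sse(xs, ys):
    """Residual sum of squares of a least-squares line fit to (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def breakpoint_grid(weeks, incidence, min_seg=3):
    """Week at which a second linear segment best begins (grid search)."""
    best = None
    for k in range(min_seg, len(weeks) - min_seg + 1):
        sse = ols_sse(weeks[:k], incidence[:k]) + ols_sse(weeks[k:], incidence[k:])
        if best is None or sse < best[0]:
            best = (sse, weeks[k])
    return best[1]


# Flat baseline, then sustained growth starting in week 45.
weeks = list(range(40, 50))
incidence = [1, 1, 1, 1, 1, 4, 6, 8, 10, 12]
print(breakpoint_grid(weeks, incidence))  # 45
```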

      We did not include results from linear models fit to log(incidence) or GLMs with Gaussian error distributions and log links, for two reasons. First, models fit to log-transformed data require non-zero values as inputs. Although breakpoint detection does not necessarily require weeks of zero incidence leading up to the start of an outbreak, limiting the time period for breakpoint detection to weeks with non-zero incidence (so that we could use log-transformed incidence) substantially delayed the estimated epidemic onset weeks relative to our previous, more biologically plausible estimates. Second, as an alternative to limiting the dataset to weeks with non-zero incidence, we tried adding a small positive number to weekly incidences so that we could fit models to log-transformed incidence for the whole time period spanning epidemic week 40 (the start of the influenza season) to the first week of maximum incidence. Fitting models to log-transformed incidences produced unrealistic breakpoint locations, potentially because log transformations 1) linearize data, and 2) stabilize variance by reducing the impact of extreme values. Due to the short time span used for breakpoint detection, log transforming incidence diminishes abrupt changes in incidence at the beginning of outbreaks, making it difficult for models to estimate biologically plausible breakpoint locations. Log transformations of incidence may be more useful when analyzing time series spanning multiple seasons, rather than short time spans with sharp changes in incidence (i.e., the exponential growth phase of a single flu outbreak).

      As an alternative to piecewise regression, our revised manuscript also estimates epidemic onsets using a Bayesian ensemble algorithm that accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (BEAST - a Bayesian estimator of Abrupt change, Seasonal change, and Trend; Zhao et al., 2019). Although a few regional onset times differed across the two methods, our conclusions did not change concerning correlations between viral fitness and epidemic onset timing.

      We have rewritten the methods section for estimating epidemic onsets to clarify our methodology and to include the BEAST method (Lines 292-308):

      “We estimated the regional onsets of A(H3N2) virus epidemics by detecting breakpoints in A(H3N2) incidence curves at the beginning of each season. The timing of the breakpoint in incidence represents epidemic establishment (i.e., sustained transmission) rather than the timing of influenza introduction or arrival (Charu et al., 2017). We used two methods to estimate epidemic onsets: 1) piecewise regression, which models non-linear relationships with break points by iteratively fitting linear models to each segment (segmented R package) (Muggeo, 2008; Muggeo, 2003), and 2) a Bayesian ensemble algorithm (BEAST – a Bayesian estimator of Abrupt change, Seasonal change, and Trend) that explicitly accounts for the time series nature of incidence data and allows for complex, non-linear trajectories interspersed with change points (Rbeast R package) (Zhao et al., 2019). For each region in each season, we limited the time period of breakpoint detection to epidemic week 40 to the first week of maximum incidence and did not estimate epidemic onsets for regions with insufficient signal, which we defined as fewer than three weeks of consecutive incidence and/or greater than 30% of weeks with missing data. We successfully estimated A(H3N2) onset timing for most seasons, except for three A(H1N1)-dominant seasons: 2000-2001 (0 regions), 2002-2003 (3 regions), and 2009-2010 (0 regions). Estimates of epidemic onset weeks were similar when using piecewise regression versus the BEAST method, and downstream analyses of correlations between viral fitness indicators and onset timing produced equivalent results. We therefore report results from onsets estimated via piecewise regression.”
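      The window and signal rules described above can be expressed as two small Python helpers. This sketch is illustrative only and is not the code used in the study, and it makes three interpretive assumptions. Weekly incidence is taken to be a list starting at epidemic week 40, with `None` marking missing weeks. "Three weeks of consecutive incidence" is read as a run of at least three non-missing weeks with positive incidence. The 30% missing-data threshold is applied within the detection window.

```python
def detection_window(incidence):
    """Weeks from epidemic week 40 through the first week of maximum incidence."""
    observed = [x for x in incidence if x is not None]
    if not observed:
        return []
    peak = max(observed)
    end = next(i for i, x in enumerate(incidence) if x == peak)
    return incidence[: end + 1]


def has_sufficient_signal(window, min_run=3, max_missing=0.30):
    """True if the window has >= min_run consecutive positive weeks
    and at most max_missing of its weeks are missing."""
    if not window:
        return False
    run = longest = 0
    for x in window:
        run = run + 1 if (x is not None and x > 0) else 0
        longest = max(longest, run)
    missing = sum(x is None for x in window) / len(window)
    return longest >= min_run and missing <= max_missing


region_a = [0, 0, 1, 2, 5, 3, None]           # clear rise to a peak
region_b = [None, None, 1, None, 2, 4]        # sparse, gappy data
print(has_sufficient_signal(detection_window(region_a)))  # True
print(has_sufficient_signal(detection_window(region_b)))  # False
```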

      L773 national indicators -- presumably this is because you don't have regional-level information, but it might be worth saying that earlier so it doesn't read like there are other indicators now, called national indicators, that we should have heard of 

      In the revised manuscript, we move a paragraph that was at the beginning of the Results to the beginning of the Methods.

      Lines 123-132: 

      “Our study focuses on the impact of A(H3N2) virus evolution on seasonal epidemics from seasons 1997-1998 to 2018-2019 in the U.S.; whenever possible, we make use of regionally disaggregated indicators and analyses. We start by identifying multiple indicators of influenza evolution each season based on changes in HA and NA. Next, we compile influenza virus subtype-specific incidence time series for U.S. Department of Health and Human Services (HHS) regions and estimate multiple indicators characterizing influenza A(H3N2) epidemic dynamics each season, including epidemic burden, severity, type/subtype dominance, timing, and the age distribution of cases. We then assess univariate relationships between national indicators of evolution and regional epidemic characteristics. Lastly, we use multivariable regression models and random forest models to measure the relative importance of viral evolution, heterosubtypic interference, and prior immunity in predicting regional A(H3N2) epidemic dynamics.”

      In Lines 484-487 in the Methods, we now mention that measures of seasonal antigenic and genetic distance are at the national level. 

      “For each replicate dataset, we estimated national-level genetic and antigenic distances between influenza viruses circulating in consecutive seasons by calculating the mean distance between viruses circulating in the current season 𝑡 and viruses circulating during the prior season (𝑡 – 1 year; one season lag) or two prior seasons ago (𝑡 – 2 years; two season lag).”
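      A minimal Python sketch of the seasonal distance described above follows. It is illustrative only and is not the code used in the study. It assumes that genetic distance is the Hamming distance between aligned sequences, optionally restricted to a set of epitope positions. The `sites` argument is hypothetical, because this passage does not give the exact site sets or the antigenic-distance model.

```python
from itertools import product
from statistics import mean


def hamming(a, b, sites=None):
    """Number of differing positions between two aligned sequences,
    optionally counting only the given (0-based) sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    positions = range(len(a)) if sites is None else sites
    return sum(a[i] != b[i] for i in positions)


def seasonal_distance(current, previous, sites=None):
    """Mean distance over all pairs (virus in season t, virus in season t - lag)."""
    return mean(hamming(x, y, sites) for x, y in product(current, previous))


season_t = ["KTSNA", "KTSDA"]        # hypothetical aligned fragments, season t
season_t_minus_1 = ["KTSNA", "RTSNA"]  # season t - 1
print(seasonal_distance(season_t, season_t_minus_1))  # 1.0
```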

      L782 Why Beta regression and what is "the resampled dataset" ? 

      Beta regression is appropriate for models of subtype dominance, epidemic intensity, and age-specific proportions of ILI cases because these data are continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). “The resampled dataset” refers to the “1000 bootstrap replicates of the original dataset (1000 samples with replacement)” mentioned in Lines 777-778 of the original manuscript. 

      In the revised manuscript, we include more background information about Beta regression models, and explicitly mention that regression models were fit to 1000 bootstrap replicates of the original dataset.

      Lines 503-507: 

      “For subtype dominance, epidemic intensity, and age-specific proportions of ILI cases, we fit Beta regression models with logit links. Beta regression models are appropriate when the variable of interest is continuous and restricted to the interval (0, 1) (Ferrari & Cribari-Neto, 2004). For each epidemic metric, we fit the best-performing regression model to 1000 bootstrap replicates of the original dataset.”
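      The Python sketch below shows the log-likelihood that a logit-link Beta regression maximizes, in the mean-precision parameterization of Ferrari & Cribari-Neto (2004). It is illustrative only and is not the code used in the study, and the function names are hypothetical. Estimation itself is not shown: maximizing this function over the coefficients and the precision φ with a numerical optimizer, and repeating the fit on 1000 bootstrap replicates, would complete the procedure described above.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def beta_regression_loglik(y, X, coefs, phi):
    """Log-likelihood of a Beta regression with a logit link.

    mu_i = inv_logit(x_i . coefs); y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi).
    """
    if phi <= 0:
        raise ValueError("precision phi must be positive")
    total = 0.0
    for yi, xi in zip(y, X):
        if not 0.0 < yi < 1.0:
            raise ValueError("responses must lie strictly inside (0, 1)")
        mu = inv_logit(sum(c * x for c, x in zip(coefs, xi)))
        a, b = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi) + (b - 1.0) * math.log(1.0 - yi))
    return total


# Hypothetical proportions (e.g. subtype dominance) with an intercept and one predictor.
y = [0.2, 0.35, 0.6]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
print(beta_regression_loglik(y, X, [-1.0, 0.8], phi=10.0))
```

      Because each mean μᵢ passes through the inverse logit, predictions stay inside (0, 1). This is why the model suits proportions such as subtype dominance and normalized epidemic intensity.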

      The github is clear, comprehensive and well-documented, at least at a brief glance. 

      Thank you! At the time of resubmission, our GitHub repository is updated to incorporate feedback from the reviewers.

      References

      Altmann, A., Tolosi, L., Sander, O., & Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10), 1340-1347. https://doi.org/10.1093/bioinformatics/btq134  

      Barrat-Charlaix, P., Huddleston, J., Bedford, T., & Neher, R. A. (2021). Limited Predictability of Amino Acid Substitutions in Seasonal Influenza Viruses. Mol Biol Evol, 38(7), 2767-2777. https://doi.org/10.1093/molbev/msab065  

      Bedford, T., Riley, S., Barr, I. G., Broor, S., Chadha, M., Cox, N. J., Daniels, R. S., Gunasekaran, C. P., Hurt, A. C., Kelso, A., Klimov, A., Lewis, N. S., Li, X., McCauley, J. W., Odagiri, T., Potdar, V., Rambaut, A., Shu, Y., Skepner, E., . . . Russell, C. A. (2015). Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature, 523(7559), 217-220. https://doi.org/10.1038/nature14460  

      Chao, A., Gotelli, N. J., Hsieh, T. C., Sander, E. L., Ma, K. H., Colwell, R. K., & Ellison, A. M. (2014). Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84(1), 45-67. https://doi.org/10.1890/13-0133.1  

      Charu, V., Zeger, S., Gog, J., Bjornstad, O. N., Kissler, S., Simonsen, L., Grenfell, B. T., & Viboud, C. (2017). Human mobility and the spatial transmission of influenza in the United States. PLoS Comput Biol, 13(2), e1005382. https://doi.org/10.1371/journal.pcbi.1005382  

      Dalziel, B. D., Kissler, S., Gog, J. R., Viboud, C., Bjornstad, O. N., Metcalf, C. J. E., & Grenfell, B. T. (2018). Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities. Science, 362(6410), 75-79. https://doi.org/10.1126/science.aat6030

      Debeer, D., & Strobl, C. (2020). Conditional permutation importance revisited. BMC Bioinformatics, 21(1), 307. https://doi.org/10.1186/s12859-020-03622-2  

      Dhanasekaran, V., Sullivan, S., Edwards, K. M., Xie, R., Khvorov, A., Valkenburg, S. A., Cowling, B. J., & Barr, I. G. (2022). Human seasonal influenza under COVID-19 and the potential consequences of influenza lineage elimination. Nat Commun, 13(1), 1721. https://doi.org/10.1038/s41467-022-29402-5

      Ferrari, S., & Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501  

      Garten, R. J., Davis, C. T., Russell, C. A., Shu, B., Lindstrom, S., Balish, A., Sessions, W. M., Xu, X., Skepner, E., Deyde, V., Okomo-Adhiambo, M., Gubareva, L., Barnes, J., Smith, C. B., Emery, S. L., Hillman, M. J., Rivailler, P., Smagala, J., de Graaf, M., . . . Cox, N. J. (2009). Antigenic and genetic characteristics of swine-origin 2009 A(H1N1) influenza viruses circulating in humans. Science, 325(5937), 197-201. https://doi.org/10.1126/science.1176225

      Grebe, K. M., Yewdell, J. W., & Bennink, J. R. (2008). Heterosubtypic immunity to influenza A virus: where do we stand? Microbes Infect, 10(9), 1024-1029. https://doi.org/10.1016/j.micinf.2008.07.002

      Hill, M. O. (1973). Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology, 54(2), 427-432. https://doi.org/10.2307/1934352

      Huddleston, J., Barnes, J. R., Rowe, T., Xu, X., Kondor, R., Wentworth, D. E., Whittaker, L., Ermetal, B., Daniels, R. S., McCauley, J. W., Fujisaki, S., Nakamura, K., Kishida, N., Watanabe, S., Hasegawa, H., Barr, I., Subbarao, K., Barrat-Charlaix, P., Neher, R. A., & Bedford, T. (2020). Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. Elife, 9, e60067. https://doi.org/10.7554/eLife.60067

      Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). Springer.

      Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. Chapman and Hall/CRC. 

      Lee, E. C., Arab, A., Goldlust, S. M., Viboud, C., Grenfell, B. T., & Bansal, S. (2018). Deploying digital health data to optimize influenza surveillance at national and local scales. PLoS Comput Biol, 14(3), e1006020. https://doi.org/10.1371/journal.pcbi.1006020

      Lemey, P., Rambaut, A., Bedford, T., Faria, N., Bielejec, F., Baele, G., Russell, C. A., Smith, D. J., Pybus, O. G., Brockmann, D., & Suchard, M. A. (2014). Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog, 10(2), e1003932. https://doi.org/10.1371/journal.ppat.1003932

      Muggeo, V. (2008). Segmented: An R Package to Fit Regression Models With Broken-Line Relationships. R News, 8, 20-25. 

      Muggeo, V. M. (2003). Estimating regression models with unknown break-points. Stat Med, 22(19), 3055-3071. https://doi.org/10.1002/sim.1545

      Neher, R. A., Russell, C. A., & Shraiman, B. I. (2014). Predicting evolution from the shape of genealogical trees. Elife, 3, e03568. https://doi.org/10.7554/eLife.03568  

      Rambaut, A., Pybus, O. G., Nelson, M. I., Viboud, C., Taubenberger, J. K., & Holmes, E. C. (2008). The genomic and epidemiological dynamics of human influenza A virus. Nature, 453(7195), 615-619. https://doi.org/10.1038/nature06945

      Recker, M., Pybus, O. G., Nee, S., & Gupta, S. (2007). The generation of influenza outbreaks by a network of host immune responses against a limited set of antigenic types. Proceedings of the National Academy of Sciences, 104(18), 7711-7716. https://doi.org/10.1073/pnas.0702154104

      Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379-423. 

      Smith, G. J., Vijaykrishna, D., Bahl, J., Lycett, S. J., Worobey, M., Pybus, O. G., Ma, S. K., Cheung, C. L., Raghwani, J., Bhatt, S., Peiris, J. S., Guan, Y., & Rambaut, A. (2009). Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature, 459(7250), 1122-1125. https://doi.org/10.1038/nature08182  

      Sridhar, S. (2016). Heterosubtypic T-Cell Immunity to Influenza in Humans: Challenges for Universal T-Cell Influenza Vaccines. Front Immunol, 7, 195. https://doi.org/10.3389/fimmu.2016.00195

      Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9, 307. https://doi.org/10.1186/1471-2105-9-307  

      Strobl, C., Boulesteix, A. L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, 8, 25. https://doi.org/10.1186/1471-2105-8-25

      Terajima, M., Babon, J. A., Co, M. D., & Ennis, F. A. (2013). Cross-reactive human B cell and T cell epitopes between influenza A and B viruses. Virol J, 10, 244. https://doi.org/10.1186/1743-422X-10-244

      Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M., & Kawaoka, Y. (1992). Evolution and ecology of influenza A viruses. Microbiological Reviews, 56(1), 152-179. https://doi.org/10.1128/mr.56.1.152-179.1992

      Wen, F., Bedford, T., & Cobey, S. (2016). Explaining the geographical origins of seasonal influenza A (H3N2). Proc Biol Sci, 283(1838). https://doi.org/10.1098/rspb.2016.1312

      Yan, L., Neher, R. A., & Shraiman, B. I. (2019). Phylodynamic theory of persistence, extinction and speciation of rapidly adapting pathogens. Elife, 8. https://doi.org/10.7554/eLife.44205  

      Zhao, K., Wulder, M. A., Hu, T., Bright, R., Wu, Q., Qin, H., Li, Y., Toman, E., Mallick, B., Zhang, X., & Brown, M. (2019). Detecting change-point, trend, and seasonality in satellite time series data to track abrupt changes and nonlinear dynamics: A Bayesian ensemble algorithm. Remote Sensing of Environment, 232, 111181. https://doi.org/10.1016/j.rse.2019.04.034

      Zinder, D., Bedford, T., Gupta, S., & Pascual, M. (2013). The Roles of Competition and Mutation in Shaping Antigenic and Genetic Diversity in Influenza. PLOS Pathogens, 9(1), e1003104. https://doi.org/10.1371/journal.ppat.1003104

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer 1

      (Cys25)PTH(1-84) does not show efficacy surpassing that of the previously used rhPTH(1-34). This needs to be discussed biologically and clinically.

      Thank you very much for your valuable comments, which have helped us improve the manuscript. We agree that this aspect was not addressed in the Discussion and have added the following paragraph to the Discussion section.

      “This biological difference is thought to be due to dimeric R25CPTH(1-34) exhibiting a more preferential binding affinity for the RG versus R0 PTH1R conformation, despite having a diminished affinity for either conformation. Additionally, the potency of cAMP production in cells was lower for dimeric R25CPTH than for monomeric R25CPTH, consistent with its lower PTH1R-binding affinity (Noh et al., 2024). One of the potential clinical advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and osteoclast-mediated bone resorption. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial for long-term treatment strategies (Noh et al., 2024). In addition, the anabolic effect of the dimer was evident in our study, as it produced better bone formation than the control group.” (2nd paragraph, Discussion section)

      The terms (Cys25)PTH(1-84) and Dimeric R25CPTH(1-34) are being used interchangeably and incorrectly. A unification of these terms is necessary.

      We fully agree with the reviewer. R25CPTH(1-84) denotes the mutant human PTH, whereas rhPTH(1-34) and dimeric R25CPTH(1-34) are synthetic PTH analogs. To clarify the terminology, we have revised it throughout the manuscript; the changes appear in red.

      The figure legend is incorrect. Not all figures are described, and even though there are figures from A to I, only up to E is explained, or the content is different.

      We apologize for this oversight. As suggested, we have corrected all figure legends, which appear before the reference list in the manuscript, as follows.

      “Figure legends

      Figure 1. Micro-CT analysis. (A-D) Experimental design for the controlled delivery of rhPTH(1-34) and dimeric R25CPTH(1-34) in the ovariectomized beagle model, with representative images of injection and titanium implant placement. (E) Micro-CT analysis: bone mineral density (BMD), bone volume (TV; mm³), trabecular number (Tb.N; 1/mm), trabecular thickness (Tb.Th; µm), and trabecular separation (Tb.Sp; µm). Data are shown as mean ± s.d.; error bars indicate standard deviation. *p<0.05, **p<0.01, ***p<0.001; n.s., not significant. P, posterior; R, right.

      Figure 2. (A-I) Histological analysis of the different groups stained with Goldner’s trichrome. Bone is stained green and soft tissue red. Red arrows indicate soft tissue without bone around the implant threads. The area of bone formation was largest in the rhPTH(1-34)-treated group. The dimeric R25CPTH(1-34)-treated group shows more bone than the vehicle-treated group. Green arrows indicate bone formed over the implant. Blue dotted line, margin between bone and soft tissue. Scale bars: 1 mm.

      Figure 3. Histological analysis using Masson’s trichrome staining in the rhPTH(1-34)- and dimeric R25CPTH(1-34)-treated groups. (A-L) Masson’s trichrome-stained sections of cancellous bone in the mandible. Newly formed bone is stained red and collagen blue. Black dotted boxes indicate the magnified regions of trabecular bone in the mandible. Scale bars: A-C, G-I, 1 mm; D-F, J-L, 200 µm.

      Figure 4. Immunohistochemical analysis of bone remodeling activity using TRAP staining. (A-L) TRAP staining was used to evaluate bone remodeling by identifying osteoclasts, which are stained purple. Black dotted boxes indicate the magnified regions of trabecular bone in the mandible. (M, N) Number of TRAP-positive cells in the mandible of rhPTH(1-34)- and dimeric R25CPTH(1-34)-treated beagles. Scale bars: A-C, G-I, 1 mm; D-F, J-L, 200 µm. Data are shown as mean ± s.d.; error bars indicate standard deviation. *p<0.05, **p<0.01; n.s., not significant.

      Figure 5. Dynamics of serum biochemical markers. Serum levels of calcium, phosphorus, P1NP, and CTX at three time points (T0, T1, T2) following treatment with dimeric R25CPTH(1-34), rhPTH(1-34), or control. (A-B) Calcium and phosphorus levels show an upward trend in response to both PTH treatments compared with control, suggesting enhanced bone mineralization. (C) P1NP levels, indicative of bone formation, remain relatively unchanged across time points and treatments. (D) CTX levels, associated with bone resorption, show no significant differences between groups. Data points for dimeric R25CPTH(1-34), rhPTH(1-34), and control are marked by squares, circles, and triangles, respectively; error bars represent confidence intervals.

      Supplementary Figure. Three-dimensional reconstructed images of the peri-implant bone depicting osseointegration after different therapeutic interventions. (A) Bone response to recombinant human parathyroid hormone fragment (rhPTH(1-34)) treatment, showing the most robust bone formation around the implant among the three groups. (B) Bone response to a modified PTH fragment (dimeric R25CPTH(1-34)), showing bone growth and integration similar to, though slightly less than, that seen with rhPTH(1-34). (C) Control group, showing the least bone formation and osseointegration. The upper panel provides a top view of the bone-implant interface, and the lower panel a cross-sectional view highlighting the extent of bony ingrowth and integration with the implant surface.”

      In Figure 5, although the descriptions of T0, T1, T2 are mentioned in the method section, it would be more clear if there was a timeline like in Figure 1.

      Following the reviewer’s advice, we have indicated the timing of T0, T1, and T2 in the Materials and Methods section describing the serum biochemical assay and have added a timeline to Figure 5.

      In Figure 5, instead of having calcium, phosphorus, P1NP, CTX graphs all under Figure 5, it would be more convenient for referencing in the text to label them as Figure 5A, Figure 5B, Figure 5C, Figure 5D.

      We agree with the reviewer’s comment. As suggested, we have corrected the panel labels in the text for Figure 5 as follows.

      “The levels of calcium, phosphorus, CTX, and P1NP were analyzed over time using RM-ANOVA (Figure 5). There were no significant differences between the groups in calcium and phosphorus at time points T0 and T1 (Figure 5A). However, at T2, after administration of the PTH analogs (Figure 5A), levels were highest in the rhPTH(1-34) group, followed by the dimeric R25CPTH(1-34) group, and lowest in the control group; these differences were statistically significant (P < 0.05; Figure 5B,C). The differences between the groups over time in CTX and P1NP were not statistically significant (Figure 5D,E).”

      Significance should be indicated in the figure (no asterisk present).

      As suggested, we have added asterisks to Figure 5 to indicate statistical significance.

      Addition of Figures in Text:

      Line 112: change from "figure 2" to "figure 1" / Line 115: mention "figure 1. E"

      Line 120: refer to "figure 1. E" / Line 123: change from "figure 3" to "figure 2"

      Line 128: refer to "figure 2.A-C" / Line 137: mention "figure 3"

      Line 138: refer to "figure 3. A-L" / Line 143: mention "figure 3. A-L"

      Line 144: refer to "figure 3. E,F,K,L" / Line 148: mention "figure 4"

      Line 150: refer to "figure 4 M,N" / Line 152: mention "figure 4. M,N"

      Line 155: refer to "figure 5" / Line 157: mention "figure 5"

      Line 159: refer to "figure 5" / Line 171: mention "figure 1 E"

      Line 175: refer to "figure 2 M, N"/ Line 194: mention "figure 3"

      We thank the reviewer for these detailed corrections. We have revised the figure references in the text accordingly; the changes are shown in red.

      Response to Reviewer 2

      First, the authors should clarify why they compared the effects of rhPTH(1-34) and of dimeric R25C2 PTH(1-34)? In most of the parameters, rhPTH(1-34) seems to be superior to dimeric R25C2 PTH(1-34). Why did the authors insist that the anabolic effects of dimer were prominent? Even though implication of dimeric R25C2 PTH(1-34) was drawn from genetic mutation studies, the authors should describe more clearly in the discussion the potential clinical benefits of the dimeric R25C2 PTH(1-34) compared to rhPTH(1-34), especially if dimeric R25C2 PTH(1-34) has just partial agonistic effect in pharmacodynamics.

      Thank you for your insightful comments and questions regarding our comparison of rhPTH(1-34) and dimeric R25CPTH(1-34). rhPTH(1-34) is a well-characterized therapy for osteoporosis. Although rhPTH(1-34) showed superior outcomes in most of the parameters tested in this study, dimeric R25CPTH(1-34) exhibited specific anabolic effects that were not as pronounced with rhPTH(1-34). We regard R25CPTH(1-34) as an anabolic effector. One of the potential advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and osteoclast-mediated bone resorption. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial for long-term treatment strategies. Based on our results, we also note that the anabolic effect of the dimer was evident, as it produced better bone formation than the control group. We agree that this aspect was not addressed in the Discussion and have therefore added the following paragraph to the Discussion section.

      “This biological difference is thought to be due to dimeric R25CPTH(1-34) exhibiting a more preferential binding affinity for the RG versus R0 PTH1R conformation, despite having a diminished affinity for either conformation. Additionally, the potency of cAMP production in cells was lower for dimeric R25CPTH than for monomeric R25CPTH, consistent with its lower PTH1R-binding affinity (Noh et al., 2024). One of the potential clinical advantages of dimeric R25CPTH(1-34) is its partial agonistic effect in pharmacodynamics. This property may allow more fine-tuned regulation of bone metabolism, potentially reducing the risk of adverse effects associated with full agonism, such as hypercalcemia and osteoclast-mediated bone resorption. Moreover, the dimeric form may offer a more sustained anabolic response, which could be beneficial for long-term treatment strategies (Noh et al., 2024). In addition, the anabolic effect of the dimer was evident in our study, as it produced better bone formation than the control group.” (2nd paragraph, Discussion section)

      Second, please describe the intermittent and continuous application of PTH analogues. Many of the readers may misunderstand that the authors' daily injection of PTHs were actually to mimic the clinical intermittent application or continuous one. Incorporation of the author's intention for experimental design would be more helpful for readers.

      Thank you for your insightful comments regarding the need for clearer differentiation between intermittent and continuous applications of PTH analogs in this study. We appreciate your concern that the readers may not fully grasp whether our daily injection protocol was intended to mimic clinical intermittent or continuous PTH administration. To address this, we have revised the manuscript to explicitly clarify that the daily injections of rhPTH(1-34) and dimeric R25CPTH(1-34) were designed to simulate the intermittent dosing regimen commonly used in clinical practice. This regimen is known to maximize the anabolic effects on bone while minimizing potential catabolic actions associated with more frequent or continuous hormone exposure. We have added detailed explanations in the Introduction, Methods, and Discussion sections to help readers understand our experimental design and its relevance to clinical settings.

      Introduction section

      “Administration of parathyroid hormone (PTH) analogs can be categorized into two distinct protocols: intermittent and continuous. Intermittent rhPTH(1-34) therapy, typically delivered as daily injections, is used clinically to enhance bone formation and strength. This approach leverages the anabolic effects of rhPTH(1-34) without the significant bone resorption that can occur with more frequent or continuous exposure. In contrast, continuous rhPTH(1-34) exposure, often modeled in research as constant infusion, tends to accelerate bone resorption, potentially leading to bone loss (Silva and Bilezikian, 2015; Jilka, 2007). Understanding these differences is crucial for interpreting the therapeutic implications of rhPTH(1-34) for bone health.”

      Silva, B. C., & Bilezikian, J. P. (2015). Parathyroid hormone: anabolic and catabolic actions on the skeleton. Current Opinion in Pharmacology, 22, 41-50.

      Jilka, R. L. (2007). Molecular and cellular mechanisms of the anabolic effect of intermittent PTH. Bone, 40(6), 1434-1446.

      Materials and Methods section

      “Each animal received one injection per day, aimed at replicating the intermittent rhPTH(1-34) exposure proven beneficial for bone regeneration and overall skeletal health in clinical settings (Neer et al., 2001; Kendler et al., 2018). This regimen was chosen to investigate the potential anabolic effects of these specific PTH analogs under conditions closely resembling therapeutic use.”

      Neer, R. M., Arnaud, C. D., Zanchetta, J. R., Prince, R., Gaich, G. A., Reginster, J. Y., Hodsman, A. B., Eriksen, E. F., Ish-Shalom, S., Genant, H. K., Wang, O., and Mitlak, B. H. (2001). Effect of Parathyroid Hormone (1-34) on Fractures and Bone Mineral Density in Postmenopausal Women with Osteoporosis. The New England Journal of Medicine, 344(19), 1434-1441.

      Kendler, D. L., Marin, F., Zerbini, C. A. F., Russo, L. A., Greenspan, S. L., Zikan, V., Bagur, A., Malouf-Sierra, J., Lakatos, P., Fahrleitner-Pammer, A., Lespessailles, E., Minisola, S., Body, J. J., Geusens, P., Moricke, R., & Lopez-Romero, P. (2018). Effects of Teriparatide and Risedronate on New Fractures in Post-Menopausal Women with Severe Osteoporosis (VERO): A Multicenter, Double-Blind, Double-Dummy, Randomized Controlled Trial. The Lancet, 391(10117), 230-240.

      Discussion section

      “The use of daily injections in this study was intended to simulate intermittent PTH therapy, a well-established clinical approach for managing osteoporosis and enhancing bone regeneration. Intermittent administration of PTH, as opposed to continuous exposure, is critical for maximizing the anabolic response while minimizing the catabolic effects associated with more frequent or continuous hormone exposure. Our findings support the notion that, with daily administration, both rhPTH(1-34) and dimeric R25CPTH(1-34) promote bone formation and osseointegration, consistent with the outcomes expected from intermittent therapy. Future research should consider the dosage and timing of administration to further optimize the therapeutic benefits of PTH analogs (Dempster et al., 2001; Hodsman et al., 2005).”

      Dempster, D. W., Cosman, F., Kurland, E. S., Zhou, H., Nieves, J., Woelfert, L., Shane, E., Plavetic, K., Müller, R., Bilezikian, J., & Lindsay, R. (2001). Effects of Daily Treatment with Parathyroid Hormone on Bone Microarchitecture and Turnover in Patients with Osteoporosis: A Paired Biopsy Study. Journal of Bone and Mineral Research, 16(10), 1846-1853.

      Hodsman, A. B., Bauer, D. C., Dempster, D. W., Dian, L., Hanley, D. A., Harris, S. T., Kendler, D. L., McClung, M. R., Miller, P. D., Olszynski, W. P., Orwoll, E., & Yuen, C. K. (2005). Parathyroid Hormone and Teriparatide for the Treatment of Osteoporosis: A Review of the Evidence and Suggested Guidelines for Its Use. Endocrine Reviews, 26(5), 688-703.

      Third, please unify the nomenclature. Ensure consistency in the nomenclature throughout the article. Unify the naming conventions for PTH analogues, such as rhPTH(1-34) vs teriparatide and (Cys25)PTH(1-84) vs R25CPTH(1-34) vs R25CPTH(1-34) vs (1-84). Choose one nomenclature for each analogue and use it consistently throughout the article.

      We fully agree with the reviewer. R25CPTH(1-84) denotes the mutant human PTH, whereas rhPTH(1-34) and dimeric R25CPTH(1-34) are synthetic PTH analogs. To clarify the terminology, we have revised it throughout the manuscript; the changes appear in red.

      Response to Reviewer 3

      I would recommend to rewrite the manuscript in a form that is more understandable to the readers. In fact, it appears to me that this work was originally formatted in a way that would need the Materials and Methods to precede the results. As presented (and as requested by the eLife formatting) the Materials and Methods are available only at the end of the reading and, as a consequence, the readers needs to refer to the Materials and Methods to have a general and initial understanding of the study design (i.e. type of treatment for each group, etc are not well specified in the Results section).

      Thank you for your constructive comments and suggestions regarding the overall organization of the manuscript. As the reviewer noted, the Materials and Methods section follows the Discussion in accordance with the eLife format. To give readers an initial understanding of the study design, we have added a description of each experimental group to the Results section, as follows. Thank you again for your valuable comments.

      “We aimed to evaluate and compare the efficacy of rhPTH(1-34) and dimeric R25CPTH(1-34) in promoting bone regeneration and healing in a clinically relevant animal model. Beagle dogs were selected as the model because of their anatomical similarity to human oral structures, suitable size for surgery, human-like bone turnover rates, and established oral health profiles, which support comparable and ethically sound research outcomes. The animals were divided into three groups: a normal saline-injected control group, a group injected with 40 µg/day PTH (Forsteo, Eli Lilly), and a group injected with 40 µg/day of the PTH analog. Animals in each group were injected subcutaneously for 10 weeks.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents valuable information about the specificity and promiscuity of toxic effector and immunity protein pairs. The evidence supporting the claims of the authors is currently incomplete, as there is concern about the methodology used to analyze protein interactions, which did not take potential differences in expression levels, protein folding, and/or transient interaction into account. Other methods to measure the strength of interactions and structural predictions would improve the study. The work will be of interest to microbiologists and biochemists working with toxin-antitoxin and effector-immunity proteins.

      We thank the reviewers for considering this manuscript. We agree that this manuscript provides a valuable and cross-discipline introduction to new EI pair protein families where we focus on the EI pair’s flexibility and impacts on community structure. As such, we believe we have provided a solid foundation for future studies to examine non-cognate interactions and their possible effects on microbial communities. This, by definition, leaves some areas “incomplete” and, therefore, open for further investigations. While the methods we show do consider potential differences in binding assays, we have more explicitly addressed how “expression, protein folding, and/or transient binding” may play into this expanded EI pair model. We have also tempered the discussion of the proposed model, while also clearly highlighting other published evidence of non-cognate binding interactions between effector and immunity proteins. We have responded to the reviewers’ public comments (italicized below). 

      In this revised manuscript, we have updated the main text, particularly the Discussion section, to include more careful language, explain past research better, and add new references to works showing non-cognate immunity proteins protecting against effectors in other systems. We have also updated the supplemental files with more analyses; the relevant procedures are in the Materials and Methods.

      Public Reviews:

      Note: Reviewer 1, who appeared to focus on a subset of the manuscript rather than the whole, based their comments on several inaccuracies, which we discuss below. We found the tone in this reviewer's comments to be, at times, inappropriate, e.g., using "harsh" and "simply too drastic" to imply that common structure-function analyses were outside of the field-standard methods. We also note that the reviewer took a somewhat atypical step in reviewing this manuscript by running and analyzing the potential protein-complex data in AlphaFold2 but did not discuss areas of low confidence within that model that may contradict their conclusions. We are concerned their approach muddled valid scientific criticisms with problematic conclusions.

      Reviewer #1 (Public Review):

      In this manuscript, Knecht, Sirias et al describe toxin-immunity pair from Proteus mirabilis. Their observations suggest that the immunity protein could protect against non-cognate effectors from the same family. They analyze these proteins by dissecting them into domains and constructing chimeras which leads them to the conclusion that the immunity can be promiscuous and that the binding of immunity is insufficient for protective activity.

      Strengths:

      The manuscript is well written, and the data are very well presented and potentially interesting. The phylogenetic analysis is well done and provides some general insights.

      Weaknesses:

      (1) Conclusions are mostly supported by harsh deletions and double hybrid assays. The latter assays might show binding, but this method does not have enough resolution to report binding strength. Proteins could still bind, but the binding might be weaker, transient, and out-competed by the target binding.

      The phrasing of structure-function analyses as “harsh” is a bit unusual, as other research groups regularly use deletion and hybrid studies. Given the known caveats of deletions and domain substitutions, we included point-mutation analyses for both the effector and immunity proteins (lines 105-113 and 255-261 of the current manuscript). These caveats are also why we coupled the in vitro binding analyses with in vivo protection experiments in two distinct experimental systems (E. coli and P. mirabilis). Based on this manuscript’s introductory analysis (in which we define and characterize the genes, proteins, interactions, phylogenetics, and incidence in human microbiomes), the next questions that arise are beyond the scope of this study. Future approaches would include analyzing purified proteins from the effector (E) and immunity (I) protein families using biochemical and biophysical methods such as X-ray crystallography and circular dichroism spectroscopy.

      Interestingly, most papers in the EI field do not measure EI protein affinity (Jana et al., 2019; Yadav et al., 2021). Notable exceptions are earlier colicin research (Wallis et al., 1995) and a new T6SS EI paper (Bosch et al., 2023) published as we first submitted this manuscript.

      (2) While the authors have modeled the structure of toxin and immunity, the toxin-immunity complex model is missing. Such a model allows alternative, more realistic interpretation of the presented data. Firstly, the immunity protein is predicted to bind contributing to the surface all over the sequence, except the last two alpha helices (very high confidence model, iPTM>0.8). The N terminus described by the authors contributes one of the toxin-binding surfaces, but this is not the sole binding site. Most importantly, other parts of the immunity protein are predicted to interact closer to the active site (D-E-K residues). Thus, based on the AlphaFold model, the predicted mechanism of immunization remains physically blocking the active site. However, removing the N terminal part, which contributes large interaction surface will directly impact the binding strength. Hence, the toxin-immunity co-folding model suggests that proper binding of immunity, contributed by different parts of the protein, is required to stabilize the toxin-immunity complex and to achieve complete neutralization. Alternative mechanisms of neutralization might not be necessary in this case and are difficult to imagine for a DNase.

      In response to the reviewer’s comment, we again reviewed the RdnE-RdnI AlphaFold2 complex predictions with the most updated version of ColabFold (1.5.2-patch with PDB100 and MMseq2) and have included them at the end of these responses [1].

      However, the literature reports that computational predictions of E-I complexes often do not match experimental structural results (Hespanhol et al., 2022, Bosch et al., 2023). As such, we chose not to include the predicted cognate and non-cognate RdnE-I complexes from ColabFold (which uses AlphaFold2) in the revised manuscript. (It is notable that reviewer 1 found the proposed expanded model and research so interesting as to directly input and examine the AI-predicted RdnE-RdnI protein interactions in AlphaFold2.)

      Discussion of the prevailing toxin-immunity complex model is in the introduction (lines 45-48) and Figure 5E. Further, there are various known mechanisms for neutralizing nucleases and other T6SS effectors, which we briefly state in the discussion (lines 359 - 361). More in-depth, these molecular mechanisms include active-site blocking (Benz et al., 2012), allosteric-site binding (Kleanthous et al., 1999 and Lu et al., 2014), enzymatic neutralization of the target (Ting et al., 2021), and structural disruption of both the active and binding sites (Bosch et al., 2023). Given this diversity of mechanisms, we did not presume to speculate on the as-of-yet unknown mechanism of RdnI protection. We have expanded discussion of these items in the revised manuscript.

      (3) Dissection of a toxin into two domains is also not justified from a structural point of view; it is probably based on initial sequence analyses. The N terminus (previously reported as the PoNe domain in ref 21) is not a separate domain but an integral part of the protein that is encased on both sides by the C-terminal part. These parts might indeed evolve faster since they are located further from the active site and the central core of the protein. I am happy to see that the chimeric toxins are active, but regarding the conservation and neutralization, I am not surprised that the central core of the protein fold is highly conserved. However, "deletion 2" is quite irrelevant: it deletes the central core of the protein, which is simply too drastic to draw any conclusions from such a construct. It will not fold into anything similar to the original protein, if it folds properly at all.

      The reviewer’s comment highlights why we turned to the chimera proteins to dissect the regions of RdnE (formerly IdrD-CT), as the deletions could result in misfolded proteins. (We initially examined RdnE in the years before the launch of AlphaFold2.) However, the reviewer is incorrect regarding the N-terminus of RdnE. The PoNe domain, while also a subfamily of the PD-(D/E)XK superfamily, forms a distinct clade of effectors from the PD-(D/E)XK domain in RdnE (formerly IdrD-CT), as seen in Hespanhol et al., 2022; this is true for other DNase effectors as well. Many studies analyzing effectors within the PD-(D/E)XK superfamily focus only on the PD-(D/E)XK domain, removing just this domain from the context of the whole protein (Hespanhol et al., 2022; Jana et al., 2019). Of note, in RdnE, this region alone (containing the DNA-binding domain) is insufficient for DNase activity (unlike in PoNe). We have clarified this distinction in the results section of the current manuscript, visible in Figure 2.

      (4) Regarding the "promiscuity" there is always a limit to how similar proteins are, hence when cross-neutralization is claimed authors should always provide sequence similarities. This similarity could also be further compared in terms of the predicted interaction surface between toxin and immunity.

      Reviewer 1 points out a fundamental property of protein-protein interactions that has typically been studied in isolation from the impact of such interactions on bacterial community structure. We have provided the whole-protein alignments in Figure 3, figure supplement 3, the summary images in Figure 3D, and the protein phylogenetic trees in Figure 3C. We encourage others to consider the protein alignments, as percent amino acid sequence similarity is not necessarily a good gauge of protein function and interactions. These data are publicly available on the OSF website associated with this manuscript https://osf.io/scb7z/, and we hope the community explores the data there.

      In consideration of the enthusiasm to deeply dive into the primary research data, we have included the pairwise sequence identities across the entire proteins here: Proteus RdnI vs. Rothia RdnI: 23.6%; Proteus RdnI vs. Prevotella RdnI: 16.3%; Proteus RdnI vs. Pseudomonas RdnI: 14.6%; Rothia RdnI vs. Prevotella RdnI: 22.4%; Rothia RdnI vs. Pseudomonas RdnI: 17.6%; Prevotella RdnI vs. Pseudomonas RdnI: 19.5%. (As stated in response to reviewer 1 comment 2, we did not find it appropriate to make inferences based on AlphaFold2-predicted protein complexes.)
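
Percent identity values like these depend on the alignment and on how the denominator is defined, which the response does not state. The sketch below assumes two sequences that are already aligned (gaps written as "-") and divides identical residues by the number of alignment columns that are not gaps in both sequences. The function name `percent_identity` is hypothetical, and this convention may differ from the one used to compute the values above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (assumed convention).

    Denominator: alignment columns that are not gaps in both sequences.
    Numerator: columns where both sequences carry the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both sequences (common in multiple alignments).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


# Toy alignment: 4 identical residues over 6 columns, about 66.7%.
print(percent_identity("MKV-LA", "MRVQLA"))
```

Other common conventions divide by the length of the shorter sequence or exclude every gapped column, which can give noticeably different numbers for divergent proteins such as these.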

      Overall, it looks more like a regular toxin-immunity couple, where some cross-reactions with homologues are possible, depending on how far the sequences have deviated. Nevertheless, taking all of the above into account, these results do not challenge toxin-immunity specificity dogma.

      In this manuscript, we did not intend to dismiss the E-I specificity model but rather to point out its limitations and propose an important expansion of that model that accounts for cross-protection and survival against attacks from other genera. We agree that it is commonly considered that deviations in amino acid sequence over time could result in cross-binding and protection (see lines 364-368). However, the impacts of such cross-binding on community structure, bacterial survival, and strain evolution were rarely addressed in prior literature, with exceptions such as Zhang et al., 2013 and Bosch et al., 2023, among others. One key insight we propose and show in this manuscript is that cross-binding can be a fitness benefit in mixed communities; therefore, it could be selected for evolutionarily (lines 378-380), even potentially in host microbiomes.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Knecht et al entitled "Non-cognate immunity proteins provide broader defenses against interbacterial effectors in microbial communities" aims at characterizing a new type VI secretion system (T6SS) effector immunity pair using genetic and biochemical studies primarily focused on Proteus mirabilis and metagenomic analysis of human-derived data focused on Rothia and Prevotella sequences. The authors provide evidence that RdnE and RdnI of Proteus constitute an E-I pair and that the effector likely degrades nucleic acids. Further, they provide evidence that expression of non-cognate immunity derived from diverse species can provide protection against RdnE intoxication. Overall, this general line of investigation is underdeveloped in the T6SS field and conceptually appropriate for a broad audience journal. The paper is well-written and, aside from a few cases, well-cited. As detailed below however, there are several aspects of this paper where the evidence provided is somewhat insufficient to support the claims. Further, there are now at least two examples in the literature of non-cognate immunity providing protection against intoxication, one of which is not cited here (Bosch et al PMID 37345922 - the other being Ting et al 2018). In general therefore I think that the motivating concept here in this paper of overturning the predominant model of interbacterial effector-immunity cognate interactions is oversold and should be dialed back.

      We agree that analyses focusing on flexible non-cognate interactions and protection are underdeveloped within the T6SS field and are not fully explored within a community structure. These ideas are rapidly growing in the field, as evidenced by the references provided by the reviewer. As stated earlier, we did not intend to overturn the prevailing model but rather have proposed an expanded model that accounts for protection against attacks from foreign genera.

      Strengths:

      One of the major strengths of this paper is the combination of diverse techniques including competition assays, biochemistry, and metagenomics surveys. The metagenomic analysis in particular has great potential for understanding T6SS biology in natural communities. Finally, it is clear that much new biology remains to be discovered in the realm of T6SS effectors and immunity.

      Weaknesses:

      The authors have not formally shown that RdnE is delivered by the T6SS. Is it the case that genetic tools for gene deletion are not available for the BB2000 strain? If genetic tools are available, a standard assay to demonstrate T6SS dependency would be to test function after inactivating the T6SS (e.g., by deleting tssC).

      Our research group showed that the T6SS secretes RdnE (previously IdrD) in Wenren et al., 2013 (cited in lines 71-73). We later confirmed T6SS-dependent secretion by LC-MS/MS (Saak et al., 2017).  

      For swarm cross-phyla competition assays (Figure 4), at what level compared to cognate immunity are the non-cognate immunity proteins being expressed? This is unclear from the methods and Figure 4 legend and should be elaborated upon. Presumably these non-cognate immunity proteins are being overexpressed. Expression level and effector-to-immunity protein stoichiometry likely matters for interpretation of function, both in vitro as well as in relevant settings in nature. It is important to assess if native expression levels of non-cognate cross-phyla immunity (e.g. Rothia and Prevotella) protect similarly as the endogenously produced cognate immunity. This experiment could be performed in several ways, for example by deleting the RdnE-I pair and complementing back the Rothia or Prevotella RdnI at the same chromosomal locus, then performing the swarm assay. Alternatively, if there are inducible expression systems available for Proteus, examination of protection under varying levels of immunity induction could be an alternate way to address this question. Western blot analysis comparing cognate to non-cognate immunity protein levels expressed in Proteus could also be important. If the authors were interested in deriving physical binding constants between E and various cognate and non-cognate I (e.g. through isothermal titration calorimetry) that would be a strong set of data to support the claims made. The co-IP data presented in supplemental Figure 6 are nice but are from E. coli cells overexpressing each protein and do not fully address the question of in vivo (in Proteus) native expression.

      P. mirabilis strain ATCC29906 does not encode the rdnE and rdnI genes on the chromosome (NCBI BioSample: SAMN00001486) (line 151). Production of the RdnI proteins, including the cognate Proteus RdnI, comes from equivalent transgenic expression vectors. Specifically, the rdnI genes were expressed under the flaA promoter in P. mirabilis strain ATCC29906 (Table 1) for the swarm competition assays found in Figure 2C and Figure 4. This promoter results in constitutive expression in swarming cells (Belas et al., 1991; Jansen et al., 2003). In the revised manuscript, figure 4 Supplement Figure 2 shows the relative RdnI protein levels in these strains; we also clarified the expression constructs in the text (see reviewer 3, comment 1).

      Lines 321-324: the authors interpret differences between E and I in read recruitment (greater abundance of I) as indicating the presence of orphan immunity genes in metagenomic samples (Figure 5A-D). It seems equally or perhaps more likely that there is substantial sequence divergence in E compared to the reference sequence. In fact, the metagenomes analyzed were required only to have "half of the bases on reference E-I sequence receiving coverage". Variation in coverage again could reflect divergent sequences falling below the 90% identity cutoff. I recommend performing metagenomic assemblies on these samples to assess and curate the E-I sequences present in each sample and then recalculating coverage based on the exact inferred sequences from each sample.

      This comment raises the challenges of metagenomic analyses. It was difficult to balance specificity to a particular species’ DNA sequence against the prevalence of any homologous sequence in the sample. Given the distinction in binding interactions among the four examined species, we opted to prioritize specificity, accepting that we were losing access to some rdnE and rdnI sequences in that decision. We chose a 90% identity cutoff, which, through several in silico controls, ensured that each sequence we identified was the rdnE or rdnI gene from that specific species. For the Version of Record, we have included an analysis with a 70% cutoff in the supplemental information to account for sequence divergence by lowering the identity cutoff as suggested. The data from the 70% identity cutoff were consistent with the original data from the 90% identity cutoff.
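
The two criteria discussed in this exchange, a per-read identity cutoff (90%, or 70% in the supplemental analysis) and the requirement that at least half of the reference E-I bases receive coverage, can be sketched as below. This is a hypothetical illustration rather than the authors' pipeline (their code is on OSF); the `ReadHit` record, the half-open reference coordinates, and all function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReadHit:
    """One read aligned to a reference gene (hypothetical record)."""
    ref_start: int   # 0-based start on the reference, inclusive
    ref_end: int     # 0-based end on the reference, exclusive
    identity: float  # percent identity of the read alignment


def recruited_hits(hits, min_identity):
    """Keep only reads aligning at or above the identity cutoff."""
    return [h for h in hits if h.identity >= min_identity]


def breadth_of_coverage(hits, ref_length):
    """Fraction of reference bases covered by at least one read."""
    covered = [False] * ref_length
    for h in hits:
        for pos in range(max(0, h.ref_start), min(ref_length, h.ref_end)):
            covered[pos] = True
    return sum(covered) / ref_length


def gene_detected(hits, ref_length, min_identity=90.0, min_breadth=0.5):
    """Apply the identity cutoff, then require half the reference to be covered."""
    kept = recruited_hits(hits, min_identity)
    return breadth_of_coverage(kept, ref_length) >= min_breadth


hits = [ReadHit(0, 60, 95.0), ReadHit(50, 100, 92.0), ReadHit(100, 200, 80.0)]
print(gene_detected(hits, 250))                     # 40% covered at 90% identity
print(gene_detected(hits, 250, min_identity=70.0))  # 80% covered at 70% identity
```

The toy example shows how lowering the cutoff can rescue a gene whose reads are divergent from the reference, which is the concern the reviewer raises about differences in E versus I coverage.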

      A description of gene-level read recruitment in the methods section relating to metagenomic analysis is lacking and should be provided.

      Noted. We included the raw code and sequences on the OSF website associated with this manuscript https://osf.io/scb7z/.

      Reviewer #3 (Public Review):

      Summary:

      The authors discovered that the RdnE effector possesses DNase activity, and in competition, P. mirabilis having RdnE outcompetes the null strain. Additionally, they presented evidence that the RdnI immunity protein binds to RdnE, suppressing its toxicity. Interestingly, the authors demonstrated that the RdnI homolog from a different phylum (i.e., Actinomycetota) provides cross-species protection against RdnE injected from P. mirabilis, despite the limited identity between the immunity sequences. Finally, using metagenomic data from human-associated microbiomes, the authors provided bioinformatic evidence that the rdnE/rdnI gene pair is widespread and present in individual microbiomes. Overall, the discovery of broad protection by non-cognate immunity is intriguing, although not necessarily surprising in retrospect, considering the prolonged period during which Earth was a microbial battlefield/paradise.

      Strengths:

      The authors presented a strong rationale in the manuscript and characterized the molecular mechanism of the RdnE effector both in vitro and in the heterologous expression model. The use of the bacterial two-hybrid system, along with the competition assays, to study the protective action of RdnI immunity is informative. Furthermore, the authors conducted bioinformatic analyses throughout the manuscript at the primary sequence, predicted structure, and metagenomic levels, which underscore the importance of the EI pair.

      Weaknesses:

      (1) The interaction between RdnI and RdnE appears to be complex and requires further investigation. The manuscript's data does not conclusively explain how RdnI provides a "promiscuous" immunity function, particularly concerning the RdnI mutant/chimera derivatives. The lack of protection observed in these cases might be attributed to other factors, such as a decrease in protein expression levels or misfolding of the proteins. Additionally, the transient nature of the binding interaction could be insufficient to offer effective defenses.

      Yes, we agree with the reviewer and hope that grant reviewers share this colleague’s enthusiasm for understanding the detailed molecular mechanisms of RdnE-RdnI binding across genera. In the revised manuscript, we have continued to emphasize such caveats, as the next frontier is clearly understanding the molecular mechanisms of cognate and non-cognate protection by RdnI. Figure 4 Supplement Figure 2 of the revised manuscript shows the RdnI protein levels; we also clarified the expression constructs in the text (see reviewer 2, comment 2).

      (2) The results from the mixed population competition lack quantitative analysis. The swarm competition assays only yield binary outcomes (Yes or No), limiting the ability to obtain more detailed insights from the data.

      The mixed swarm assay is needed when studying T6SS effectors that are primarily secreted during Proteus’ swarming activity (Saak et al. 2017, Zepeda-Rivera et al. 2018). This limitation is one reason we use in vitro, in vivo, and bioinformatic analyses. Though the swarm competition assay yields a binary outcome, we are confident that the observed RdnI protection is due to interaction with an RdnE delivered from a neighboring cell via an active T6SS. By contrast, many manuscripts report co-expression of the EI pair within the same cell (Yadav et al., 2021, Hespanhol et al., 2022) rather than delivery of secreted effectors, which we have achieved in this manuscript.

      (3) The discovery of cross-species protection is solely evident in the heterologous expression-competition model. It remains uncertain whether this is an isolated occurrence or a common characteristic of RdnI immunity proteins across various scenarios. Further investigations are necessary to determine the generality of this behavior.

      We agree, which is why we submitted this paper as a launching point for further investigations into the generality of non-cognate interactions and their potential impact on community structure.

      Comments from Reviewing Editor:

      - In addition to the references provided by reviewer #2, the first manuscript to show non-cognate binding of immunity proteins was Russell et al. 2012 (PMID: 22607806).
      - IdrD was shown to form a subfamily of effectors by Hespanhol et al. 2022 (PMID: 36226828), which analyzed several T6SS effectors belonging to the PD-(D/E)XK superfamily; this manuscript should be cited.

      We appreciate that the reviewer and eLife staff pointed out missed citations. We have incorporated these studies and cited them in the revised manuscript.

      [1] The Proteus RdnE in complex with either the Prevotella or Pseudomonas RdnI showed low confidence at the interface (pLDDT ~50-70); this AI-predicted complex might support the lack of binding seen in the bacterial two-hybrid assay. On the other hand, the Proteus and Rothia RdnI N-terminal regions show higher confidence at the interface with RdnE. Despite this, the C-terminus of the Proteus RdnI shows especially low confidence (pLDDT ~50) where it might interact near RdnE’s active site (as suggested by reviewer 1). Given this low confidence and the already stated inaccuracies of AI-generated complexes, we would rather wait for crystal structure data to inform potential protection mechanisms of RdnI.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this fundamental study, the authors use innovative fine-scale motion-capture technology to estimate the visual fixation of free-feeding pigeons and thereby study visual vigilance based on high-acuity vision. The authors present convincing evidence for use of the fovea to inspect predator cues, for the behavioral state influencing the latency to use the fovea, and for use of the fovea decreasing the latency to escape of both the focal individual and other flock members. The work will be of broad interest to behavioral ecologists.

      We thank the editor for his interest and feedback on the manuscript. We address the reviewer's comments below.

      Reviewer #1 (Public Review):

      Summary:

      The authors used an innovative technique to study visual vigilance based on high-acuity vision, the fovea. Combining motion-capture features and the visual space around the head, the authors were able to estimate the visual fixation of free-feeding pigeons at any moment. Simulating predator attacks on screens, they showed that 1) pigeons used their fovea to inspect predator cues, 2) the behavioural state (feeding or head-up) influenced the latency to use the fovea, and 3) the use of the fovea decreased the latency to escape of both the individual that foveated on the predator cues and the other flock members.

      Strengths:

      The paper is very interesting and combines innovative techniques well adapted to studying the importance of high-acuity vision for spotting a predator, but also for improving the behavioural response (escaping). The results are strong and the models used are well adapted. This paper is a major contribution to our understanding of the use of visual adaptations in a foraging context under predation risk. It is also a major contribution to the understanding of individual interactions in a flock.

      Weaknesses:

      I have identified only two weaknesses:

      (1) The authors often mixed the methods and the results, which reduces the readability and fluidity of the manuscript. I would recommend that the authors restructure the manuscript.
      (2) In some parts, the authors stated that they reconstructed the visual field of the pigeon, which is not true. They identified the foveal positions, but not the visual fields, which involve different sectors (binocular, monocular, or blind). Similarly, they sometimes mix up the area centralis and the fovea, which are two different visual adaptations.

      Thank you for your positive feedback. We addressed these comments by restructuring the methods and result sections as suggested, and by checking the terminology and specific vocabulary used throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      First, I would like to say that I really enjoyed the manuscript. This is a great contribution to the field.

      Thank you for the positive feedback, we highly appreciate it.

      Then, I have some comments that I hope, would help the authors to improve the manuscript.

      Major comments :

      I would recommend that the authors restructure the methods and results sections. In many parts, the models used are presented in the results section, while they should be presented in the methods section.

      Thank you for the suggestion; we have now ensured that the model descriptions are presented in the statistics section of the methods.

      To me, the introduction is too long (more than 5 pages). It would be beneficial to shorten it considerably. Furthermore, the introduction lacks information about the visual abilities of your species (visual acuity, visual field, temporal resolution, contrast sensitivity, etc.).

      We agree that the introduction was very long and shortened it by removing the “Methodological issues” section and reducing the “Experimental rationales” section to a minimum. We also added the missing information on the visual abilities of pigeons in the “Experimental rationales” section (see L135-150). Please note, however, that we refer to the temporal resolution of pigeon vision in the Methods section, where we relate it to the resolution of the monitor used.

      Minor comments :

      Lines 37-39: This needs a reference.

      A reference has been added (McFarland, 1977).

      Lines 39-41: But see some papers published recently on Harris's hawks.

      Thank you for the references, we added the citation as well as a few more papers (Kane et al., 2015; Kano et al., 2018; Miñano et al., 2023; Yorzinski & Platt, 2014).

      Lines 41-43: This sentence needs a reference as well.

      References have been added (Cresswell, 1994; M. H. R. Evans et al., 2018; Inglis & Lazarus, 1981).

      Lines 56-103: In this paragraph, head-down and head-up vigilance also depend on the retinal map of the birds! Some birds have a visual streak that allows them to see potential threats while foraging. Please add more information about the importance of photoreceptor distribution.

      Thank you for pointing out this issue. We rewrote the sentence at L65-69 as follows to include the importance of retinal structures.

      “In several species, especially those with a broad visual field and specific retinal structures such as the visual streaks, individuals can simultaneously engage in foraging activities while remaining vigilant (Fernández-Juricic, 2012), likely using peripheral vision to detect approaching threats (Bednekoff & Lima, 2005; Cresswell et al., 2003; Kaby & Lind, 2003; Lima & Bednekoff, 1999).”

      Lines 76-79: you wrote : ".... favor alternative hypotheses based on their findings". Which findings? You need to explain.

      We rewrote this part as follows (L80-81).

      “other studies found evidence for the risk dilution (Beauchamp & Ruxton, 2008) and the edge effect (Inglis & Lazarus, 1981) in their study systems.”

      Lines 109-110: It would be good to have a representation of what an area and a fovea are, how they are placed in the eye, what types of fovea exist, and how they are related to the visual field. Where do they project?

      We now give a better description of the pigeon’s visual field in the experimental rationales section that we hope will help the reader understand the key features of pigeon vision (see L135-150). Specifically, we now say in L137-138:

      “they have one fovea centrally located in the retina of each eye, with an acuity of 12.6 c/deg (Hodos et al., 1985). Their fovea projects laterally at ~75° into the horizon in their visual field.”

      Lines 109-113: You might need to see some new papers here about the fovea. See for instance Bringmann 2019.

      Thank you for the suggestion, we now give a more precise definition of the fovea and refer to Bringmann’s paper for more details (L113-114):

      “a pit-like area in the retina with high concentration of cone cells where visual acuity is highest, and is responsible for sharp, detailed, and color vision.”

      Lines 113-120: Please explain how the visual field is related to the fovea. Where does the fovea project in the visual field?

      Similarly to the question above, we now give a more precise description of the pigeon’s visual field (see L135-150).

      Lines 131-134: For a non-expert, you would need to explain what micro, meso, and macro scales are.

      These sentences have been removed when shortening the introduction and we are not referring to micro, meso and macro scales anymore.

      Lines 134-136: Please explain in one sentence the technique here.

      We now explain in one sentence how motion capture enables the tracking of head and body orientation (L130-132):

      “Motion capture cameras track the 3D position of markers with high accuracy, which, when the markers are attached to the pigeon’s head and body, enables reconstruction of the rotations of the head and body in all directions.”

      Line 140: You presented here for the first time the word "foveation". Has this term been used before? If so, please add a reference. If not, please explain what you mean by foveation precisely.

      Thank you for noticing this omission. We now provide the following definition, “directing visual focus to the fovea to achieve the clearest vision”, where we first mention the term foveation (L149-150).

      Lines 146-148: Please explain why this shows that it is appropriate not to record eye movements, and whether this holds for all behaviours.

      We acknowledge that small eye movements might occur and reduce the accuracy of the method. This error is accounted for in the system by using a ±10 degree range around the foveas. The lines the reviewer referred to were removed when shortening the introduction, but we added an explanation in the paragraph describing pigeon vision to make this clearer (L147-150):

      “Yet, it should be noted that eye movements were not tracked in our system, although they are typically confined within a 5-degree range (Wohlschläger et al., 1993). We thus considered this estimation error of the foveation (directing visual focus to the fovea to achieve the clearest vision) in our analysis, as part of the error margin (see Methods).”

      Lines 161-163: What is the frontal and binocular field for? You would need to explain the different fields of view and what they are supposed to be for.

      Furthermore, has the visual field of the pigeon been studied? If so, you would need to add more information about it.

      This information is now given in the new paragraph describing the pigeon’s vision in the “Experimental rationales” section (see L135-150).

      Figure 1: It is not clear here which panels correspond to a, b or c. Please use some boxes to clarify it.

      Thank you for the comment, we now have made the figure’s sub-panels clearer.

      Lines 193-194: You wrote "... such as foveas (also known as the area centralis)". No, these are not the same.

      (1) Some species have two foveas, one placed centrally in the retina and one placed temporally. So the fovea is not the area centralis.

      (2) Second, some species do have an area centralis but without a fovea.

      Thank you for pointing out the inaccuracy. In this case, we were referring specifically to the pigeon’s fovea, which is sometimes referred to as the “area centralis”, but we have now changed the sentence as follows to avoid any confusion (L174-175):

      “The initial two hypotheses (Hypotheses 1 and 2) aim to examine whether foveation correlates with predator detection.”

      Lines 192-212: I did not understand the logic of the hypothesis numbering. Why do you have 2.1 but not 3.1, for instance? And if you have two sub-hypotheses within a global one (for instance, 2.1 and 2.2), what is the main hypothesis 2? You should explain more here, because the reader gets lost here and in the results section as well.

      We recognize this section might have appeared confusing to the reader. In short, we had four main hypotheses: 1) the fovea is used to evaluate predator cues, 2) the latency to foveate is related to vigilance behaviors. These first 2 hypotheses aimed to determine if the latency to foveate on the predator cue could be related to the detection. 3) foveation is related to the escape response of the pigeons and 4) there is a collective influence in the escape response. We further divided some of the hypotheses into 2 sub-hypotheses whenever 2 different tests were used to answer the same question. We have modified this section to be clearer.

      Lines 224-229: Where are the figures and statistics for these results?

      These results are presented in Table S1. We apologize for forgetting to add this reference and have now added it (L211).

      Lines 229-231: This should be in the method section.

      This model explanation (as well as all others mentioned hereafter) has been moved to the methods section as suggested.

      Lines 248-252: This should be in the method section. Furthermore, you should better explain the model selection.

      Please see earlier comment. Additionally, we are now better explaining how the model has been built.

      Figure 2: It is not clear on the figure which letters correspond to which panels. Please improve the readability of the figure.

      It was modified accordingly.

      Lines 274-278: This should be in the method section.

      Please see earlier comment.

      Line 281: The "Fig.3" should be mentioned in the previous sentence.

      It was modified accordingly.

      Figure 3: Please explain why the latency to foveate had negative values in Fig. 2 but not here, nor in Fig. 4. This again highlights that information about the data transformations and the model selection is missing from the methods.

      The variable presented in Fig. 2d is not the latency to foveate but the “Normalized frequency at which the object was observed within foveal regions” (hypothesis 1). It represents the amount of time the object was lying within one of the foveal regions of the individual (“how long the pigeons foveated on it”), further normalized to unit sum to make all objects comparable. This variable was indeed logit-transformed (hence the negative values) to improve the residual fit of the model, but this information (as well as other transformations) is always clearly stated in the axis labels of the graphs. Additionally, we have now improved the statistical analysis section to make the model used for each hypothesis test clearer. But please let us know if you have suggestions for further improving the presentation.
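
The transformation described here, normalizing foveation time to unit sum and then applying the logit, explains why plotted values can be negative: any proportion below 0.5 has a negative logit. The sketch below assumes the normalization is done across the objects within one trial and that no proportion is exactly 0 or 1 (the response does not say how such values were handled); `normalized_logit` is a hypothetical name.

```python
import math


def normalized_logit(durations):
    """Normalize durations to unit sum, then logit-transform each proportion.

    Assumes durations are foveation times for the objects in one trial
    (an assumption; the grouping is not specified in the response).
    """
    total = sum(durations)
    if total <= 0:
        raise ValueError("durations must have a positive sum")
    result = []
    for d in durations:
        p = d / total
        if not 0.0 < p < 1.0:
            # logit(p) = log(p / (1 - p)) is undefined at 0 and 1.
            raise ValueError("logit is undefined for proportions of 0 or 1")
        result.append(math.log(p / (1.0 - p)))
    return result


# Proportions 0.25, 0.25, 0.5 give two negative values and one zero.
print(normalized_logit([1.0, 1.0, 2.0]))
```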

      Lines 297-301: This should be in the method section.

      Please see earlier comment.

      Lines 301-305: Fig. 3b and c only refer to the first two factors. Please add figures for the other factors. These could be in the supplementary material.

      We added the three graphs for the proportion of time foveating on the monitor, the saccade rate and the proportion of time foveating on conspecifics to the supplementary material (Fig. S6).

      Lines 306-309: This should be in the methods, and you should have explained in the methods how you performed your model selection.

      We prefer leaving this paragraph in the result section, as it was intended to give the reader extra information on the predictive power of the different variables (by comparing the effectiveness of the models including one variable at a time, all the rest being equal) and not on the model selection per se. However, we now explain our goal better in the statistics section regarding this analysis (L635-636):

      “We further tested the relative predictive power of the different test variables by comparing the resulting models’ efficiency using AIC scores.”

      Lines 317-319: This should be in the method section.

      Please see earlier comment.

      Lines 320-322: This should be in the method section.

      Please see earlier comment.

      Lines 332-334: This should be in the method section.

      Please see earlier comment.

      Lines 334-336: Then, if this is not significant, you cannot say that.

      Thank you for noticing the inaccuracy; we have now rephrased it as (L298-299):

      “Earlier foveation of the first pigeon was not significantly related to earlier escape responses among the other flock members, although there was a trend (χ2(1) = 3.66, p = 0.0559).”

      Line 336: Please explain why you ran different models. A lot of information about your statistical strategy is missing from the methods.

      We have now added a lot more information on the models in the statistics section, according to this comment as well as the previous ones. We hope the explanations of the analyses are now clearer to the reader.

      Lines 339-349: This should be in the method section.

      Please see earlier comment.

      Results section: As you may have understood, there are too many sentences that should be moved into the method section. Furthermore, I would recommend modifying the headings so that they are more biologically meaningful, similar to what you have done in the discussion section.

      Thank you for the comments. We agree with most of them, and have modified the manuscript accordingly. Additionally, we now use the same headings in the results section as the ones used in the discussion to make the text easier to follow.

      Lines 500-501: What was the body weight of the pigeons? At what percentage of their full body weight were they kept?

      This information has now been added (492 ± 41 g; mean ± SD). We did not control the amount of food during our experiments and only ensured 24 h without food by feeding the pigeons after the experiment was completed. This information was added as follows (L454-456):

      “On experimental days, they were fed only after the experiments were completed; this ensured 24 hours without food at the time of the experiment, although we did not control the amount of food over the course of the experimental periods.”

      Line 522-523: Those screens are very good for pigeons.

      Thank you for the positive comment; we indeed tried to match bird vision as closely as possible.

      Lines 527-528: At what frequency was the moving stimulus displayed? Your screen can display up to 144Hz, which is very good. But can your laptop do it? If not, it is important to mention it, as pigeons may have a temporal resolution of vision of up to 149Hz.

      Our laptop indeed supports 144Hz display. In addition, we now mention the temporal resolution of pigeon vision (L480-482).

      “We specifically chose a monitor with high temporal resolution to match the pigeon’s Critical Flicker Fusion Frequency (threshold at which a flickering light is perceived by the eye as steady) that reaches up to 143Hz (Dodt & Wirth, 1954).”

      Lines 555-572: Did you use a control shape in your experiment? Indeed, they may escape because of a moving pattern rather than a predator shape.

      We did not use a control shape, as the aim of the experiment was not to directly test the effect of the shape itself. We designed the predator cue to resemble an approaching predator to ensure a response from the pigeons, but it might be that other shapes would have worked as well.

      Lines 588-589: Please explain why the coordinate system of the pigeon's head is considered to be the visual field.

      From what I have understood, you did not reconstruct the visual fields, but only the position of the fovea. This should be stated as such, as the visual field involves more than a sphere around the head (binocular and monocular sectors, blind sectors, vertical extent, etc.).

      Thank you for noticing the inaccuracy; we indeed did not consider other sectors of the visual field and have therefore rephrased it as (L551): “the location of the objects and conspecifics from the pigeon’s perspective”.

      Lines 601-604: How much data does this represent?

      As this was estimated by visual inspection, we do not have the exact percentage of data loss caused by grooming. However, because of the number of cameras in the SMART BARN motion capture system, the system reliably detects markers inside the space under “ideal” conditions (without occlusion). For example, a similar set-up reported marker track loss of less than 1% using a model bird (Itahara & Kano, 2022).

      Itahara, A., & Kano, F. (2022). “Corvid Tracking Studio”: A custom-built motion capture system to track head movements of corvids. Japanese Journal of Animal Psychology, 72(1), 1–16. https://doi.org/10.2502/janip.72.1.1

      Lines 610-612: You would need to cite Wood 1917 and Hodos et al. 1991 who described the presence of a fovea in this species.

      We added both citations to the manuscript.

      Line 611: Again, the fovea is not equivalent to the area centralis.

      Thank you, we changed it as well.

      Lines 625-626: you wrote "... in a few instances....". Please explain more. How many? What proportion?

      This happened in 9 observations out of 120. We now specify it in the text as well (L587-589):

      “in a few instances (9 out of 120 observations), pigeons foveated on the model predator after the looming stimulus had disappeared, but these cases were excluded from our analysis.”

      Lines 640-653: A lot of information is missing from the "statistical analysis" section. It would be much better if you moved most of the sentences from the results that describe the methods into the method section. Furthermore, you would need to explain in more detail which statistics you used, how you selected models, and what types of data transformation you applied.

      We agree that this section lacked information, and we have moved the relevant information from the results to the statistical analysis section.

      Supplementary materials: boxplots from Fig. S1 and S2 are too small and impossible to read. Please improve their readability.

      We have now enlarged these plots to make them more readable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Engineering of PAClight1P78A: A High-Performance Class-B1 GPCR-Based Sensor for PACAP1-38" by Cola et al. presents the development of a novel genetically encoded sensor, PAClight1P78A, based on the human PAC1 receptor. The authors provide a thorough in vitro and in vivo characterization of this sensor, demonstrating its potential utility across various applications in life sciences, including drug development and basic research.

      The diverse methods to validate PAClight1P78A demonstrate a comprehensive approach to sensor engineering by combining biochemical characterization with in vivo studies in rodent brains and zebrafish. This establishes the sensor's biophysical properties (e.g., sensitivity, specificity, kinetics, and spectral properties) and demonstrates its functionality in physiologically relevant settings. Importantly, the inclusion of control sensors and the testing of potential intracellular downstream effects such as G-protein activation underscore a careful consideration of specificity and biological impact.

      Strengths:

      The fundamental development of PAClight1P78A addresses a significant gap in sensors for Class-B1 GPCRs. The iterative design process, from PAClight0.1 to the final PAClight1P78A variant, demonstrates compelling optimization. The innovative engineering results in a sensor with a high apparent dynamic range and excellent ligand selectivity, representing a significant advancement in the field. The rigorous in vitro characterization, including dynamic range, ligand specificity, and activation kinetics, provides a critical understanding of the sensor's utility. The inclusion of in vivo experiments in mice and zebrafish larvae demonstrates the sensor's applicability in complex biological systems.

      Weaknesses:

      The manuscript shows that the sensor fundamentally works in vivo, albeit in a limited capacity. The titration curves show sensitivity in the nanomolar range, at which endogenous detection might be possible. However, perhaps the sensor is not sensitive enough, or there are no known robust paradigms for PACAP release. A more detailed discussion of the sensor's limitations, particularly regarding in vivo applications and the potential for detecting endogenous PACAP release, would be helpful.

      We thank the reviewer for carefully analyzing our in vivo data and highlighting the limitation of our results regarding the sensor’s applicability in detecting endogenous PACAP. We added several sections discussing future possibilities for optimization in the discussion (see paragraphs 2-4). We agree that a more specific discussion of the limitations of our study is an important addition to help design future experiments.

      There are several experiments with n=1 and other low single-digit numbers. I assume that refers to biological replicates such as mice or culture wells, but it is not well defined. An n=1 in experimental contexts, particularly in Figure 1, raises significant concerns about the exact dynamic range of the sensor, data reproducibility, and the robustness of conclusions drawn from these experiments. Also, ROIs for cell cultures, as in Figure 1, are not well defined. The methods state that ROIs were manually selected, which appears very selective and makes the values in Figure 1c unnecessarily questionable. The lack of a definition for "ROI" is confusing. Do ROIs refer to cells, specific locations on the cell membrane, or groups of cells? It would be best if the authors could use unbiased methods for image analysis that include the majority of responsive areas, or explain why certain ROIs are included or excluded.

      We thank the reviewer for the helpful suggestions. We have increased the number of replicates to n=3 for both HEK293T and neuron data depicted in Fig. 1c. Furthermore, we have added Fig. 1c’ containing the quantification of the maximum responses obtained in the dataset shown in Fig. 1c, also depicting the single values for each replicate. To clarify the definition of an ROI in our manuscript, we have detailed the process of ROI selection in the Methods section “Cell culture, imaging and quantification”. Additionally, we increased mouse numbers for in vivo PACAP infusions in mice (see Figure 4g).

      Reviewer #2 (Public Review):

      Summary:

      The PAClight1 sensor was developed using an approach that was successful for the development of other fluorescence-based GPCR sensors: the complete replacement of the third intracellular loop of the receptor with a circularly permuted green fluorescent protein. When expressed in HEK cells, this sensor showed good expression and a weak but measurable response to extracellular PACAP1-38 (a ΔF/F0 of 43%). Additional mutations near the insertion site of the fluorescent protein, at the C-terminus of the receptor, and within the second intracellular loop produced a final optimized sensor with a ΔF/F0 of >1000%. Finally, screening of mutational libraries that also included alterations in the extracellular ligand-binding domain of the receptor yielded a molecule, PAClight1P78A, that exhibited a high ligand-dependent fluorescence response combined with a high differential sensitivity to PACAP (EC50 30 nM based on cytometric sorting of stably transfected HEK293 cells) compared to its congener VIP (with which PACAP shares two highly related receptors, VPAC1 and VPAC2) and to several unrelated neuropeptides, as well as significantly slowed activation kinetics by PACAP in the presence of a 10-fold molar excess of the PAC1 antagonist PACAP6-38. A structurally highly similar control construct, PAClight1P78Actl, showed correspondingly similar basal expression in HEK293 cells, but no PACAP-dependent enhancement of its fluorescence.

      PAClight1P78A was expressed in neurons of the mouse cortex via AAV9.hSyn-mediated gene transduction. Slices taken from PAClight1P78A-transduced cortex, but not slices taken from PAClight1P78Actl-transduced cortex, exhibited a prompt and persistent elevation of ΔF/F0 after 2 minutes of perfusion with PACAP1-38; this elevation persisted for up to 14 minutes and was statistically significant after perfusion with 3000 nM, but not 300 or 30 nM, of peptide. Likewise, microinfusion of 200 nL of 300 µM PACAP1-38 into the cortex of optical fiber-implanted, freely moving mice elicited a ΔF/F0 of greater than 15%, significantly higher than that elicited by application of similar concentrations of VIP, CRF, or enkephalin, or by vehicle alone. In vivo experiments were carried out in zebrafish larvae by introducing PAClight1P78A into single-cell-stage Danio rerio embryos using a Tol2 transposase-based plasmid with a UAS promoter via injection (of plasmid and transposase mRNA), and sorting post-fertilization embryos using a marker for transgenesis carried in the UAS:PAClight1P78A construct. Expression of PAClight1P78A was directed to cells in the olfactory bulb that express the fish paralog of the human PAC1 receptor by using the Tg(GnRH3:gal4ff) line, and fluorescent signals were elicited by intracerebroventricular administration of PACAP1-38 at a single concentration (1 mM). These signals were specific to PACAP and to the presence of PAClight1P78A per se, as controlled by parallel experiments in which the transgenic plasmid contained PAClight1P78Actl instead of PAClight1P78A.

      Major strengths and weaknesses of the methods and results

      The report represents a rigorous demonstration of the elicitation of fluorescent signals upon pharmacological exposure to PACAP in nervous system tissue expressing PAClight1P78A in both mammals (mice) and fish (zebrafish larvae). Figure 4d shows a change in GFP fluorescence activation by PACAP occurring several seconds after the cessation of PACAP perfusion over a two-minute period, and its persistence for several minutes following. One wonders if one is apprehending the graphical presentation of the data incorrectly, or if the activation of fluorescence efficiency by ligand presentation is irreversible in this context, in which case the utility of the probe as a real-time indicator, in vivo, of released peptide might be diminished.

      We thank the reviewer for their careful consideration of our manuscript and agree that the activation of PAClight persisting for several minutes at micromolar concentrations could be a potential limitation for in vivo applications. We added a possible explanation for the persisting sensor activation in response to artificial application of PACAP38 in paragraph 3 of the discussion. We agree that this addition eases the interpretation of PAClight signals detected in vivo. 

      Appraisal of achievement of aims, and data support of conclusions:

      Small cavils with controls are omitted for clarity; the larger issue of appraisal of results based on the scope of the designed experiments is discussed in the section below. An interesting question related to the time dependence of the PACAP-elicited activation of PAClight1P78A is its onset and reversibility, and additional data related to this would be welcome.

      We agree that the reversibility of the sensor’s fluorescence is indeed an important feature, especially for detecting endogenous PACAP release. Our data indicate that the sensor’s fluorescence is reversible when detecting small to medium doses of PACAP38 (see Figure 4d, application of 30-300 nM), which are presumably closer to physiological concentrations than the non-reversible concentration of 3000 nM. Please see also our new discussion of peptide concentrations in paragraph 4 of our discussion. For future experiments, it is indeed advisable to adjust the interval between repeated applications to the decay of the response at the respective concentration. Considering the long-lasting downstream effects of endogenous signaling, longer intervals between ligand applications are generally preferable to match more closely the physiological range in which endogenous PAC1 is most likely active.

      Discussion of the impact of the work, and utility of the methods and data:

      Increasingly, neurotransmitter function may be observed in vivo, rather than by inferring in vivo function from in vitro, in cellular, or ex vivo experimentation. This very valuable report discloses the invention of a genetically encoded sensor for the class B1 GPCR PAC1. PAC1 is the major receptor for the neuropeptide PACAP, which in turn is a major neurotransmitter involved in brain response to psychogenic stress, or threat, in vertebrates as diverse as mammals and fishes. If this sensor possesses the sensitivity to detect endogenously released PACAP in vivo it will indeed be an impactful tool for understanding PACAP neurotransmission (and indeed PACAP action in general, in immune and endocrine compartments as well) in future experiments.

      However, the sensor has not yet been used to detect endogenously released PACAP. Until this has been done, one cannot answer the question as to whether the levels of exogenously perfused/administered PACAP used here merely to calibrate the sensor's sensitivity are indeed unphysiologically high. If endogenous PACAP levels don't get that high, then the sensor will not be useful for its intended purpose. The authors should address this issue and allude to what kind of experiments would need to be done in order to detect endogenous PACAP release in living tissue in intact animals. The authors could comment upon the success of other GPCR sensors that have been used to observe endogenous ligand release, and where along the pathway to becoming a truly useful reagent this particular sensor is.

      We thank the reviewer for highlighting a lack of clarity: the scope of this paper was not intended to cover the detection of endogenous PACAP release. We therefore expanded our discussion to encompass the intended purpose of detecting artificially infused or applied PAC1 agonists, such as conducting fundamental tests of drug specificity and developing new pharmacological ligands to selectively target PAC1. This includes a more detailed discussion of our in vivo findings and clearer phrasing that stresses the potential application for applied drugs and not endogenous PACAP (see last paragraph in the discussion).

      We also agree that little is known about endogenous concentrations of PACAP in the brain. However, we have supplemented our discussion with several references estimating lower concentrations of PACAP and other peptides in vivo, suggesting average PACAP levels below the detection threshold of the sensor. Importantly, within certain brain regions and in closer proximity to release sites, significantly higher concentrations might be reached. Additionally, our data indicate that the concentrations observed under our current conditions do not saturate the sensor in vivo.  

      We therefore acknowledge the reviewer’s comment on the sensor’s potential limitations under our current experimental conditions. Hence, we expanded our discussion and suggest the use of higher resolution imaging to potentially reveal loci of high PACAP concentrations, which should be validated by future studies (see also our added discussion in paragraph 4). 

      Reviewer #3 (Public Review):

      Summary:

      The manuscript introduces PAClight1P78A, a novel genetically encoded sensor designed to facilitate the study of class-B1 G protein-coupled receptors (GPCRs), focusing on the human PAC1 receptor. Addressing the significant challenge of investigating these clinically relevant drug targets, the sensor demonstrates a high dynamic range, excellent ligand selectivity, and rapid activation kinetics. It is validated across a variety of experimental contexts including in vitro, ex vivo, and in vivo models in mice and zebrafish, showcasing its utility for high-throughput screening, basic research, and drug development efforts related to GPCR dynamics and pharmacology.

      Strengths:

      The innovative design of PAClight1P78A successfully bridges a crucial gap in GPCR research by enabling real-time monitoring of receptor activation with high specificity and sensitivity. The extensive validation across multiple models emphasizes the sensor's reliability and versatility, promising significant contributions to both the scientific understanding of GPCR mechanisms and the development of novel therapeutics. Furthermore, by providing the research community with detailed methodologies and access to the necessary viral vectors and plasmids, the authors ensure the sensor's broad applicability and ease of adoption for a wide range of studies focused on GPCR biology and drug targeting.

      Weaknesses:

      To further strengthen the manuscript and validate the efficacy of PAClight1P78A as a selective PACAP sensor, it is crucial to demonstrate the sensor's ability to detect endogenous PACAP release in vivo under physiological conditions. While the current data from artificial PACAP application in mouse brain slices and microinfusion in behaving mice provide foundational insights into the sensor's functionality, these approaches predominantly simulate conditions with potentially higher concentrations of PACAP than naturally occurring levels.

      We thank the reviewer for their valuable comments and agree that the use of PAClight for detecting endogenous PACAP will be of great interest to the scientific community and should be a goal for future research. Considering the time, equipment and additional animal licenses necessary, we are convinced that these questions would go beyond the scope of the current paper and might rather be addressed in a follow-up publication. We therefore rephrased the discussion and added more details to further clarify the intended purpose of the current study. Additionally, we added a paragraph in the discussion suggesting experiments needed to validate PAClight for putative future in vivo applications.

      Although the sensor's specificity for the PAC1 receptor and its primary ligand is a pivotal achievement, exploring its potential application to other GPCRs within the class-B1 family or broader categories could enhance the manuscript's impact, suggesting ways to adapt this technology for a wider array of receptor studies. Additionally, while the sensor's performance is convincingly demonstrated in short-term experiments, insights into its long-term stability and reusability in more prolonged or repeated measures scenarios would be valuable for researchers interested in chronic studies or longitudinal behavioral analyses. Addressing these aspects could broaden the understanding of the sensor's practical utility over extended research timelines.

      We extend our gratitude to the reviewer for diligently assessing our results. 

      Indeed, the very high level of sensitivity that we could achieve in PAClight leads us to think that a grafting-based approach, such as the one we recently described for class-A GPCR-based sensors (PMID: 37474807), could also work for the direct generation of multiple class-B1 sensors based on the optimized fluorescent protein module present in PAClight. Unfortunately, considering the amount of work that testing this hypothesis would entail, we are not able to perform these experiments in the context of this revision, and would rather pursue them as a future project. Nevertheless, we have added a paragraph with these considerations to the discussion of the manuscript.

      While we lack comprehensive data on the long-term stability of the sensor, our preliminary findings from photometry recording optimization indicate consistent baseline expression of PAClight and PAClight ctrl over several weeks. Conducting experiments to systematically assess stability would require several months, which is currently impractical due to limitations in tools and licenses for repeated in vivo infusions. Hence, we intend to include these experiments in potential follow-up studies.

      Furthermore, the current in vivo experiments involving microinfusion of PACAP near sensor-expressing areas in behaving mice are based on a relatively small sample size (n=2), which might limit the generalizability of the findings. Increasing the number of subjects in these experimental groups would enhance the statistical power of the results and provide a more robust assessment of the sensor's in vivo functionality. Expanding the sample size will not only validate the findings but also address potential variability within the population, thereby reinforcing the conclusions drawn from these crucial experiments.

      We agree with the reviewer that a sample size of N=2 is not sufficient for in vivo recordings. We therefore increased the sample size and now present recordings with 5 PAClight1P78A and 4 PAClight-control mice. Of note, the new data validate our previous findings and conclusions and give a better idea of the in vivo variability, which we now discuss in much more detail in the discussion (see paragraph 2).

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      The lower potency of maxadilan activation might reflect broader implications for ligand-receptor dynamics. Perhaps the authors could discuss the maxadilan binding from a structural perspective, including AlphaFold models. Also, discussing how these findings might influence sensor application in diverse biological contexts would be insightful. Clear definitions and consistent use of these terms are crucial for ensuring that readers understand the methods and results.

      We would like to thank the reviewer for the comments. As part of this work, we did not obtain a dose-response curve for maxadilan peptide, and only reported the maximal response of the sensor to a high concentration of the peptide (10 µM). Thus, our findings would rather inform us on the maximal efficacy of the peptide, as opposed to its potency towards the PAC1R. Furthermore, we would like to point out that due to the lack of structural details for any GPCR-based sensor published to date, we cannot make any molecularly accurate conclusion regarding the precise reasons why a different ligand (in this case the sandfly maxadilan) induces a lower maximal efficacy of the response compared to the endogenous cognate ligand of the receptor. We do not believe that AlphaFold models can accurately replace structural information in this regard, especially given that the amino acid linker regions between the GPCR and the fluorescent protein, which are a critical determinant of allosteric chromophore modulation by ligand-induced conformational changes, typically obtain the lowest confidence scores in all AlphaFold-predicted structural models of GPCR-based sensors. Finally, we would like to refer the reviewer to a very nice recent publication (PMID: 32047270) which resolved the structures of each of these peptides bound to the PAC1 receptor-Gs protein complex, and which provides accurate molecular details on the different modalities of receptor binding and activation by PACAP1-38 versus maxadilan.

      Reviewer #2 (Recommendations For The Authors):

      The authors are congratulated on the meticulous achievement of their aim, i.e. a fluorescence-based sensor for the detection of PACAP with in vivo utility. Whether or not this sensor will have the requisite sensitivity to detect the release of endogenous PACAP within various regions of the nervous system, in response to specific environmental stimuli or changes in brain or physiological state, remains to be determined.

      We thank the reviewer for the very positive evaluation of our manuscript and for the suggested additions that will improve the strength of our arguments.

      We agree that the in vivo detection of endogenous PACAP will be an important objective for future studies. Due to time, resource and animal license constraints, we are not able to address this objective in our current study, but we now detail possible future experiments in the discussion section. Please see also our answer to the suggested discussion points previously.

      Reviewer #3 (Recommendations For The Authors):

      To comprehensively assess the sensor's sensitivity and specificity to endogenous PACAP, I recommend conducting additional in vivo experiments where PAClight1P78A is expressed in neurons that endogenously express the Pac1r receptor (using Adcyap1r1-Cre mouse line). These experiments should involve applying sensory or emotional stimuli known to evoke PACAP release or activating upstream PACAP-expressing neurons. Such studies would offer valuable data on the sensor's performance under natural physiological conditions and its potential utility for exploring PACAP's roles in vivo.

      We express our gratitude to the reviewer for providing detailed methodological approaches to examine endogenous PACAP release. These suggestions will prove invaluable for future investigations and are important additions to a follow-up publication. As mentioned earlier, we have incorporated some of these approaches into our discussion. Additionally, we have underscored the existing limitations in detecting endogenous PACAP in vivo and emphasized the relevance of PAClight for drug development purposes.

    1. Author response:

      eLife assessment

      This useful study describes an antibody-free method to map G-quadruplexes (G4s) in vertebrate cells. While the method might have potential, the current analysis is primarily descriptive and does not add substantial new insights beyond existing data (e.g., PMID:34792172). While the datasets provided might constitute a good starting point for future functional studies, additional data and analyses would be needed to fully support the major conclusions and, at the same time, clarify the advantage of this method over other methods. Specifically, the strength of the evidence for DHX9 interfering with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops is still incomplete.

      We thank the editors for their helpful comments.

      Given that antibody-based methods have been reported to leave open the possibility of recognizing partially folded G4s and promoting their folding, we have employed the peroxidase activity of the G4-hemin complex to develop a new method for capturing endogenous G4s that significantly reduces the risk of capturing partially folded G4s. We will be happy to clarify the advantage of our method.

      In Fig. 7, we applied the Dhx9 CUT&Tag assay to identify the G4s and R-loops directly bound by Dhx9, and we further characterized the Dhx9-bound G4s and R-loops that changed in the absence of Dhx9. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Furthermore, we showed that depletion of Dhx9 significantly altered the levels of G4s or R-loops around the TSSs or gene bodies of several key regulators of mESCs and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, as well as their RNA levels (Fig. 7I). The above evidence is sufficient to support transcriptional regulation of mESC cell fate by Dhx9 through direct modulation of G4s or R-loops within key mESC regulators.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Non-B DNA structures such as G4s and R-loops have the potential to impact genome stability, gene transcription, and cell differentiation. This study investigates the distribution of G4s and R-loops in human and mouse cells using some interesting technical modifications of existing Tn5-based approaches. This work confirms that the helicase DHX9 could regulate the formation and/or stability of both structures in mouse embryonic stem cells (mESCs). It also provides evidence that the lack of DHX9 in mESCs interferes with their ability to differentiate.

      Strengths:

      HepG4-seq, the new antibody-free strategy to map G4s based on the ability of Hemin to act as a peroxidase when complexed to G4s, is interesting. This study also provides more evidence that the distribution pattern of G4s and R-loops might vary substantially from one cell type to another.

      We appreciate your valuable points.

      Weaknesses:

      This study is essentially descriptive and does not provide conclusive evidence that lack of DHX9 does interfere with the ability of mESCs to differentiate by regulating directly the stability of either G4 or R-loops. In the end, it does not substantially improve our understanding of DHX9's mode of action.

      In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. Dhx9 has been reported to directly unwind R-loops and G4s or to promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites with G4s or R-loops. We found that 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in wild-type mESCs and that 4,060 of them display significantly differential signals in the absence of Dhx9, suggesting that redundant regulators also exist. We showed that depletion of Dhx9 significantly altered the RNA levels of several key regulators of mESCs and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, which coincided with significantly different levels of G4s or R-loops around the TSSs or gene bodies of these genes (Fig. 7). A comprehensive account of the molecular mechanism of Dhx9 action is indeed beyond the focus of this study, and we will address it in future studies. Thank you for the comments.

      There is no in-depth comparison of the newly generated data with existing datasets and no rigorous control was presented to test the specificity of the hemin-G4 interaction (a lot of the hemin-dependent signal seems to occur in the cytoplasm, which is unexpected).

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, whereas the absence of either hemin or Bio-An almost completely abolished the biotinylation signal, indicating specific and active biotinylation. To identify specific signals, we included a non-labeling control and used it to call high-confidence HepG4 peaks in all HepG4-seq assays.

      The hemin-RNA G4 complex has also been reported to have peroxidase-mimicking activity and to trigger self-biotinylation signals similar to those of DNA G4s (PMID: 32329781, 31257395, 27422869). Therefore, it is not surprising to observe hemin-dependent signals in the cytoplasm, generated by cytoplasmic RNA G4s.

      In the revised version, we will include a careful comparison between our data and previous datasets.

      The authors talk about co-occurrence between G4s and R-loops, but their data do not actually demonstrate co-occurrence in time. If the same loci could alternatively form either R-loops or G4s, and if DHX9 were somehow involved in determining the balance between G4s and R-loops, the authors would probably obtain the same distribution pattern. Manipulating R-loop levels in vivo and testing how this affects HepG4-seq signals would have been helpful.

      Single-molecule fluorescence studies have shown a positive feedback mechanism between G4 and R-loop formation during transcription (PMID: 32810236, 32636376), suggesting that G4s and R-loops could co-localize on the same molecule. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Although depletion of Dhx9 significantly altered the levels of G4s or R-loops at 6,171 Dhx9-bound co-localized sites, only 276 of them (~4.5%) showed changes in both G4s and R-loops, suggesting that interacting G4s and R-loops are rare in living cells. Currently, the genome-wide co-occurrence of two factors is mainly inferred by bioinformatic intersection analysis. We agree that heterogeneous distribution across cells could give rise to false-positive co-occurrence patterns, and we will discuss this point carefully in the revised version. We will also work to develop a new method to map co-localized G4s and R-loops on the same molecule in future studies.
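      Genome-wide co-occurrence of this kind is typically inferred by intersecting peak intervals from two assays (tools such as bedtools intersect are normally used in practice). The Python sketch below illustrates only that intersection step, under simplifying assumptions that are ours rather than the authors': peaks are half-open (chromosome, start, end) tuples, any overlap of at least 1 bp counts as co-localization, and `colocalized_peaks` is a hypothetical helper, not the pipeline used in the study. As discussed above, an overlap found this way is a population-level result and does not show that both structures form on the same molecule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Peak = Tuple[str, int, int]


def colocalized_peaks(a: List[Peak], b: List[Peak]) -> List[Peak]:
    """Return the peaks in `a` that overlap at least one peak in `b` by >= 1 bp.

    Hypothetical helper for illustration only; not the study's pipeline.
    """
    # Group the intervals of `b` by chromosome and sort them by start position.
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    hits: List[Peak] = []
    for chrom, start, end in a:
        for b_start, b_end in by_chrom.get(chrom, []):
            if b_start >= end:
                break  # sorted by start, so no later interval can overlap
            if b_end > start:
                hits.append((chrom, start, end))
                break
    return hits


# Toy intervals for illustration (not data from the study).
g4_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
rloop_peaks = [("chr1", 150, 300), ("chr2", 60, 90)]
print(colocalized_peaks(g4_peaks, rloop_peaks))  # [('chr1', 100, 200)]

# The fraction quoted in the response: 276 of 6,171 sites.
print(f"{276 / 6171:.1%}")  # 4.5%
```

      Sorting each chromosome's intervals lets the inner loop stop as soon as a candidate starts past the query peak; the last line simply confirms that the ~4.5% quoted above is 276/6,171.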

      This study relies exclusively on Tn5-based mapping strategies. This is a problem as global changes in DNA accessibility might strongly skew the results. It is unclear at this stage whether the lack of DHX9, BLM, or WRN has an impact on DNA accessibility, which might underlie the differences that were observed. Moreover, Tn5 cleaves DNA at a nearby accessible site, which might be at an unknown distance away from the site of interest. The spatial accuracy of Tn5-based methods is therefore debatable, which is a problem when trying to demonstrate spatial co-occurrence. Alternative mapping methods would have been helpful.

      In this study, we used a recombinant fusion protein of streptavidin monomer and anti-GP41 nanobody (mSA-scFv) to specifically recognize G4s biotinylated by the hemin-G4 complex and then recruit recombinant GP41-tagged Tn5 to these G4 sites. Similarly, the recombinant V5-tagged N-terminal hybrid-binding domain (HBD) of RNase H1 specifically recognizes R-loops and recruits recombinant protein G-Tn5 (pG-Tn5) with the help of an anti-V5 antibody. Therefore, the distance between Tn5 and the target sites is well controlled and short, and Tn5 recruitment is specifically determined by the presence of G4s in HepG4-seq and of R-loops in HBD-seq.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.

      The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.

      Strengths:

      (1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.

      (2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.

      (3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.

      We appreciate your valuable points.

      Weaknesses:

      (1) The specificity of the biotinylation process and potential off-target effects are not addressed. The authors should provide more data to validate the specificity of the G4-hemin interaction.

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, whereas the absence of either hemin or Bio-An almost completely abolished the biotinylation signal, indicating specific and active biotinylation.

      (2) Other methods using a catalytically dead RNase H or the HBD to pull down R-loops have been described before. The superior quality of the presented methods compared with existing ones is not established. A clear comparison with other methods (BG4 CUT&Tag-seq, DRIP-seq, R-ChIP, etc.) should be provided.

      Thank you for the suggestions. We will include the comparisons in the revised version.

      (3) Although the study demonstrates Dhx9's role in regulating co-localized G4s and R-loops, additional functional experiments (e.g., rescue experiments) are needed to confirm these findings.

      Dhx9 has been demonstrated in previous studies to be a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). We believe that our new datasets, together with these previous studies, are sufficient to support the capability of Dhx9 to regulate co-localized G4s and R-loops.

      (4) The manuscript would benefit from a more detailed discussion of the broader implications of co-localized G4s and R-loops.

      Thank you for the suggestions. We will include a more detailed discussion in the revised version.

      (5) The manuscript lacks appropriate statistical analyses to support the major conclusions.

      We apologize for this. Although we applied careful statistical analyses in this study, the lack of some statistical details may have made some conclusions hard to follow. We will carefully add the details of all statistical analyses.

      (6) The discussion could be expanded to address potential limitations and alternative explanations for the results.

      Thank you for the suggestions. We will include a more detailed discussion about this point in the revised version.

      Reviewer #3 (Public Review):

      Summary:

      The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.

      Strengths:

      By utilizing the peroxidase activity of the G4-hemin complex and combining it with proximity labeling technology, the authors developed HepG4-seq (high-throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing the fusion protein and tag; the optimized HBD-seq avoids the formation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.

      The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.

      Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors confirm that the helicase Dhx9 is a direct and major regulator that regulates the formation and resolution of co-localized G4s and R-loops.

      Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops-associated genes.

      In conclusion, the authors provide an approach to studying the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.

      We appreciate your valuable points.

      Weaknesses:

      As we know, at least two structures of the S9.6 antibody have been reported very recently, and the questions about the specificity of the S9.6 antibody for RNA:DNA hybrids should now be settled. The references cited by the authors (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against S9.6 antibodies also needs to be changed. However, since the authors question the specificity of the S9.6 antibody, they should compare their data in parallel with data generated using the widely used S9.6 antibody.

      Thank you for the updated information about the structural data of the S9.6 antibody. We respectfully disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show the higher specificity of S9.6 for RNA:DNA hybrids than for dsRNA and dsDNA. However, Fabian K. et al. reported that the binding affinity of S9.6 for RNA:DNA hybrids shows a clear sequence-dependent bias, ranging from no detectable binding to the nanomolar range (PMID: 28594954). We will include a comparison between S9.6-derived data and our HBD-seq data in the revised version.

      Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, and hemin's affinity for different types of G4s, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.

      Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activity than antiparallel G4s. Thus, the dynamics of G4 conformation could affect HepG4-seq signals (PMID: 32329781). In the future, it may be necessary to combine HepG4-seq and BG4-seq to carefully interpret endogenous G4s. We will carefully discuss this point in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Due to the significant difference between the infection timeline of mild (1 day post symptom onset) and severe (10 days post symptom onset) cohort at enrollment, an informative analysis to consider is to compare timepoint 2 from the mild cohort to timepoint 1 from the severe cohort.

      As the reviewer suggested, we performed the analysis comparing timepoint 2 from the mild cohort to timepoint 1 from the severe cohort, which is now included as Figure 4-figure supplement 5. The new text explaining this analysis is on pages 13-14, lines 346-355. We also added a paragraph to the Discussion on page 22, lines 595-604. We show this comparison because it reinforces the main observation of this work, namely that Natural Killer cell cytotoxicity pathways were enriched in all analyses.

      (2) Alternatively, as this information is available, the authors may group the samples based on the individual's infection timeline as opposed to the recruitment timeline.

      Patients in both groups were enrolled at the peak of their symptoms, and we grouped the patients according to this criterion to obtain statistically stronger results. Because these infections occurred naturally, we have no accurate information on when each patient was infected. Moreover, if the samples were grouped by individual infection timeline, the analysis would be too statistically weak to draw conclusions about the course of COVID-19, as disease progression would not be synchronized across patients. Our grouping approach provided a good confidence range despite the small number of patients evaluated.

      (3) The authors selected three co-regulated network modules based on the size of module membership genes, selecting the three modules containing the largest gene membership. Small co-regulated networks can also offer important biological insights into specific molecular machinery associated with disease outcomes.

      Figure 5 was updated to include two more networks (in addition to blue), for the brown and turquoise modules (Figures 5E and 5F). This new information allowed us to understand in more depth the three largest modules, which gave the most significant results owing to the number of genes they included (blue: 704, brown: 508, and turquoise: 712). The new text describing this analysis is on page 15, lines 388-396. The remaining 7 modules were also analyzed, and their Gene Ontology/pathway enrichment results were included in 2 new supplemental figures (Figure 5-figure supplement 1 and 2). The new text describing this analysis is on page 15, lines 397-401.

      (4) An alternative selection criterion that can inform biological associations between module genes and disease severity is the strength of the correlation coefficients. It seems from Figure 5B, that yellow, turquoise, and green modules have a moderate positive correlation with severe patients, while brown, blue, and gray modules show a slight positive correlation with mild outpatients. A recommendation for the authors is to consider revising Figure 5C to include the enrichment of these additional modules and include these modules in the interpretation of the results.

      The correlations between the cohorts and the blue, brown, and turquoise modules clearly associate these modules with severe or mild patients. However, for several smaller modules, correlations are heterogeneous across patients within the cohorts, making it hard to draw clear conclusions about the severity groups. Accordingly, the 7 remaining modules were analyzed as described in our response to comment #3, and the results give an idea of the different transcriptional programs present in different patients at different stages of disease. However, the small number of genes in some modules yields weak GO and pathway enrichment results that are difficult to interpret. The text describing this figure is on page 15, lines 397-401. In addition, the network analyses for the brown and turquoise modules were included in Figure 5 as panels 5E-F, and the text detailing these panels is on page 15, lines 388-396.

      (5) In Figures 3E and 3F, the authors present enrichment analyses of differentially expressed genes from day 28. However, earlier in the results (lines 226-228), the authors reported no differentially expressed genes observed between the mild and severe participant cohort at this time point. Can the authors clarify which comparison was performed to obtain the list of differentially expressed genes used in the enrichment analyses in Figures 3E and 3F?

      The apparent discrepancy stems from the different criteria used in each comparison. The DEG list from the pairwise comparison differs from that of the longitudinal comparison described afterwards, because for the latter analysis we selected only genes with different trajectories throughout the study (Figure 3). To clarify this point, we added a new paragraph on page 11, lines 278-285.

      Original:

      “We detected 828 genes that exhibited temporal and quantitative expression level differences during the progression of disease. We discovered additional biological processes and KEGG pathways that were differentially enriched during the COVID-19 progression in mild and severe patients (Figure 3) using the Enrichr platform (G. Chen et al., 2020)”

      Changed to:

      “To do so, we first identified genes that were differentially expressed between severity groups, and second, we chose only those that also showed changes in their trajectories across sampling times. In doing so, we found 828 genes that exhibited temporal differences in expression level during disease progression. Then using the Enrichr platform (G. Chen et al., 2020), we discovered additional biological processes and KEGG pathways that were differentially enriched during the COVID-19 progression in mild and severe patients (Figure 3).”

      (6) Additionally, the authors refer to specific enriched genes in Figure 3 (lines 298-302), but Figure 3 only displays the enriched terms. Can the authors include the results from the enrichment analysis that include gene membership for each enriched term in the supplement?

      Indeed, the initial version did not include a figure or table with the gene list for this analysis. We have now included Supplementary Tables 1 and 2, which detail each pathway along with its gene list.

      (7) In line 104, can the authors clarify the parameters used to define well-matched samples?

      Based on the reviewers' observations, we changed the wording to make the message of this paper clearer. The update was included on page 5, line  as follows:

      Original:

      “Here, we designed a longitudinal investigation using well-matched samples to study how changes in gene expression in distinct immune effector cells changed during the earliest time points after diagnosis and during progression of clinical disease”,

      Changed to:

      “Here, we designed a longitudinal comparison between mild and severe patients, choosing the appropriate samples according to the clinical progression and the unbiased gene expression profile”

      (8) In lines 113-116, can the authors clarify how their approach mitigates noise/potential biases and very briefly, describe what the nature of noise/biases could be?

      The main goal of this paragraph is to show that, although several pathways reached statistical significance in our analyses, we focused on NK cell cytotoxicity because this pathway formed bridges with other relevant immune responses; that is, the pathway was chosen because of its intricate transcriptional connections rather than out of a biased interest. The text was edited on page 6, lines 111-131, as follows:

      Original:

      “We used a pairwise comparison of gene expression, gene set enrichment, and weight-correlated gene network analyses to detect differential expression of genes involved with the cytotoxic signaling pathway of Natural Killer (NK) cells in mild verses severe progression of disease. We promoted a broad and integrated point of view throughout the transcriptomic analysis of functional pathways to mitigate noise and potential biases (Bastard et al., 2020; Delorey et al., 2021; Schultze & Aschenbrenner, 2021; S. Zhang et al., 2022). We found close connectivity between NK signaling pathway genes and those of cytokine-cytokine receptor signaling pathways, along with Th1/Th2 cell differentiation genes, as part of the transcriptional circuit executed preferentially among mildly ill patients. Our results detected transcriptional circuits engaging multiple regulatory checkpoints. These findings indicated that the innate NK signaling pathway (cell cytotoxic activity) is beneficial, perhaps a critically-necessary activity needed to effectively eradicate coronavirus. We interpreted that an adaptive immune response that included early cell-mediated immunity was important for reducing disease severity in mild patients. This balance between humoral- and cell-mediated immunity appeared to be less robust in patients presenting with severe COVID-19. These results detected components of the immune response that were significantly associated with the differences in symptom severity observed between mild and severely ill COVID-19 patients.”

      Changed to:

      “Briefly, to gain more insights into our findings and complement their functional context, we used a pairwise comparison of gene expression, gene set enrichment, and weight-correlated gene network analyses. By doing so, we identified pathways of genes involved with the NK cell cytotoxicity enriched in mild patients when compared to severe. Besides focusing on a particular molecular pathway, we investigated the interactions to better comprehend the underlying phenomena of a successful immune response, contributing to an integrated point of view throughout the transcriptomic analyses of functional pathways to mitigate potential biases attributed to focusing the study on a single pathway. In this regard, we revealed that the NK signaling pathway was intricately related to other transcriptional circuits, such as those governing Th1/Th2 cell differentiation and cytokine-cytokine receptor signaling pathways. These interactions highlight the importance of these pathways as bridges between the innate and adaptive immune responses throughout the disease, implying that the innate NK signaling pathway (cell cytotoxic activity) is beneficial, and possibly a critical activity required to effectively eradicate coronavirus. We also concluded that an adaptive immune response including early cell-mediated immunity was significant in lowering disease severity. The link between the primary innate NK cell activity and the transcriptional priming of adaptive Th1 and Th2 cell responses appears to be more robust in mild patients than in severe.”

      (9) In line 120, can the authors clarify which regulatory checkpoints were being referred to?

      The term “checkpoint” was changed to “bridges” (line 124), because it conveys more clearly the molecular interactions across the different enriched pathways described in our study. In this sense, the bridges reflect the connection between the innate immune response mediated by NK cells and the adaptive immune response mediated by Th1/Th2 cells.

      (10) In lines 125-126, can the authors refer to specific results to support this observation?

      Lines 111 to 129 summarize the results that support this statement. Nevertheless, the original sentence was modified for clarity on page 6, lines 129-131, as follows:

      Original:

      “This balance between humoral- and cell-mediated immunity appeared to be less robust in patients presenting with severe COVID-19”

      Changed to:

      “The link between the primary innate NK cell activity and the transcriptional priming of adaptive Th1 and Th2 cell responses appears to be more robust in mild patients than in severe.”

      (11) In lines 184-185, can the authors clarify what the term "mixed" specifically refers to?

      The original text was modified for better comprehension on page 8, lines 177-179 as follows:

      Original:

      “Interestingly, on day-28, when the majority of patients had recovered, samples from severely ill patients were still mixed compared to those with mild symptoms.”

      Changed to:

      “Interestingly, on day-28, when the majority of patients had recovered, samples from severely ill patients were pooled together with those mild patients who had already recovered”.

      (12) In line 286, can the authors clarify how quantitative expression level differences are distinct from temporal expression level differences?

      Despite the difference in enrollment time between the mild and severe cohorts, enrollment took place precisely at the peak of COVID-19 symptoms, as illustrated in Figure 1B. Also supporting this criterion, the longitudinal analysis outlined in Figure 3 took into account changes in gene expression trajectories across all sampling times. This point is significant because the results revealed several transcriptional programs that were dynamically executed during disease progression, even independently of the pairwise comparisons carried out previously.

      (13) In Figure 1C, there seemed to be two data points associated with "M1 0 days" and "M4 28 days" with distinct PC projections. Could these samples be mislabeled?

      The figure was revised and completed. The hexagon symbol for day-28 was changed to a star symbol. The “M1 0 days” and “M4 28 days” samples were labeled correctly. See the revised Figure 1C below:

      (14) In Figure 1D caption: could authors clarify if the ranking of 100 genes was based on the log2FC or adjusted p-values?

      The criteria were fold change ≥ 2 and FDR ≤ 0.05, as described in the Methods on page 23, lines 657-660.
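      To make the two thresholds concrete, here is a minimal Python sketch of filtering a gene table by fold change ≥ 2 and FDR ≤ 0.05. It assumes, since the response does not say, that FDR denotes Benjamini-Hochberg-adjusted p-values and that the fold-change cut applies in both directions (|log2 FC| ≥ 1); `benjamini_hochberg`, `filter_degs`, and the gene names are hypothetical, and in practice the adjusted p-values would come from the differential expression software itself.

```python
import math
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_degs(genes: Dict[str, Tuple[float, float]],
                min_fold_change: float = 2.0,
                max_fdr: float = 0.05) -> List[str]:
    """Keep genes with |log2 FC| >= log2(min_fold_change) and FDR <= max_fdr.

    `genes` maps gene name -> (fold_change, raw_p_value); hypothetical format.
    """
    names = list(genes)
    fdr = benjamini_hochberg([genes[n][1] for n in names])
    threshold = math.log2(min_fold_change)
    return [n for n, q in zip(names, fdr)
            if abs(math.log2(genes[n][0])) >= threshold and q <= max_fdr]


# Hypothetical genes: (fold change, raw p-value).
example_genes = {
    "GENE_A": (4.0, 0.001),    # 4-fold up, significant
    "GENE_B": (0.25, 0.002),   # 4-fold down, significant
    "GENE_C": (1.5, 0.0001),   # significant but below the fold-change cut
    "GENE_D": (3.0, 0.2),      # large change but FDR too high
}
print(filter_degs(example_genes))  # ['GENE_A', 'GENE_B']
```

      Working on log2 fold changes makes the cut symmetric, so a 4-fold decrease (0.25) passes just as a 4-fold increase does; if only up-regulated genes were intended, the `abs` would be dropped.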

      (15) In Figure 4D, can the authors include the expression z score for the healthy participants?

      We could include this information, but we believe it would not aid understanding of this figure, whose focus is on the differential trajectories between mild and severe patients. Moreover, the DEGs from the mild and severe cohorts in this and all other analyses in this work were obtained relative to healthy donors.

      (16) Related to this, can the authors clarify if the expression z scores were computed using the mean and standard deviations of all samples within the study or relative to a specific participant cohort?

      The z-scores were computed using the mild and severe patients to calculate the mean and then the standard deviation of each group. A new paragraph was added to the Materials and Methods on page 24, lines 662-664.

      (17) In Figure 5B, can the authors include column annotations for participants and sampling time points?

      Figure 5B was updated with the suggested annotations.

      (18) In Figure 1 - Figure Supplement 2, can the authors include the volcano plot from the pairwise comparison for day 28 showing no differentially expressed genes between mild and severe participants as reported in the results (lines 226-228)?

      A third volcano plot, for day 28, was added to the updated Figure 1-figure supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is generally very well-constructed and well-written. However, the following are the major concerns mostly regarding the study design and participant selection.

      (1) The authors have used the enrolment day as D0, which does not reflect the immune response timeline, especially when the designated 'D0' for the severe group is 10.0 ± 1.8 days post symptom onset (DPS) while the 'D0' for the mild group is 1.2 ± 1.3 DPS. In the context of the acute infection discussed here, this difference is critical.

      As tempting as it is to conduct longitudinal studies on COVID-19, the authors might do better focusing on specific acute time points (within 10 days post-symptom onset) and convalescent time points (beyond 28 days post-symptom). A better comparison would be D0 severe with D7 mild (aligning the DPS to be between 7-10 days in both groups).

      Despite the difference in enrolment time between the mild and severe cohorts, enrolment took place during the peak of COVID-19 symptoms in both cohorts, as illustrated in Figure 1B. Supporting this criterion, the longitudinal analysis outlined in Figure 3 took into account the changes in gene expression trajectories across all sampling times. This is significant because the results of this analysis revealed several transcriptional programs that were dynamically executed along disease progression, independently of the pairwise comparisons carried out previously. Nevertheless, we agree with the reviewer's observation because, as we mention in the article, it is difficult to properly compare disease progression between naturally infected patients. To better support our findings, we therefore complemented them with a pairwise comparison between day-7 samples from mild and day-0 samples from severely ill individuals, which identified GO terms and enriched pathways related to NK cell function in the mild cohort, as seen in Figure 4-figure supplement 5. This result reinforced the main findings of the different analyses in this work, highlighting the relevance of the innate immune response of Natural Killer cells, which correlated with a mild disease course. The new paragraph describing this analysis was included on pages 13-14, lines 346-355. We also included a paragraph in the discussion on page 22, lines 595-604.

      (2) Though there are four participants within each group, one of the participants with severe infection (S1) only has the D0 time point which probably undermines the statistical significance of the results.

      This is an accurate observation. With the available statistical power, the more pronounced alterations can be evaluated, whereas the more subtle ones will most likely be missed in this study. In our analyses, we focused on variations with high statistical significance, which led to the discovery of a distinct Natural Killer response between the mild and severe cohorts.

      (3) The authors should also account for any medications administered to the severe group in the ICU before enrolment in the study (e.g., immune-dampening drugs or steroids), which may alter neutrophil recruitment or other immune functions.

      Only one severe patient received medication both before and during COVID-19. Although several medications were administered to this patient, none of them has been found to increase the neutrophil response.

      (4) What was the viral load status at the different time points analyzed - how does this relate to the immune and clinical findings?

      Viral load was not measured in this cohort.

      (5) Was any complete blood count or basic immune phenotyping conducted on these samples? It is important to know the various cell frequencies in the PBMC mix sent for sequencing, to account for contamination of lymphocytes with RBCs/monocytes/neutrophils as well as any lymphopenia.

      These measurements were not performed for these samples. However, our PBMC purification protocol has been tested previously and yields only small amounts of red blood cell contamination. Furthermore, none of the Gene Ontology or pathway enrichment analyses returned terms related to red blood cell genes that could introduce noise into the interpretation of our results.

      (6) The neutrophil/lymphocyte ratio is already skewed during SARS-CoV-2 infection - which could be the reason for higher readings in severe participants? - speculate?

      Indeed, the proportions of several cell types change during SARS-CoV-2 infection. However, despite these shifts in immune cell proportions, several functions identified in our study are associated with less abundant cells such as Natural Killer cells. The co-expression module analysis supports the notion that, although cell types are present in different proportions, distinct transcriptional programs are executed in the two cohorts.

      (7) CD247/ZAP70 also influences the CD16-mediated NK cell ADCC activity which the authors can add to the innate-adaptive bridging section.

      CD16a is highly expressed in NK cells. The circuit involving CD247/ZAP70 and CD16 could explain the cytotoxicity of these cells and how they contribute to mounting a response against SARS-CoV-2 infection. In our study, CD16a (FcγRIIIa) expression was similar in the mild and severe cohorts. Because our methodology captures only transcriptional changes, genes whose expression did not change were excluded from our discussion. However, our group's research focuses on this node, or bridge, between innate and adaptive immune responses, with a particular emphasis on Fc-mediated antibody functions, and this is a topic of interest for future research.

      (8) Some of the figures lacked clarity making it difficult to review. (Eg. Fig 4 A, Fig 4 - supplement 1 A&B, Fig 5).

      Figure 4A was redesigned, and Figure 4-figure supplement 1 is now presented on a full page for better resolution.

      Specific Comments:

      (1) Consider changing "covid-19" in the title of the manuscript to "COVID-19"

      The journal platform probably altered the capitalization; in the original title, “COVID-19” is already in capital letters, as suggested. In the clinical table, “COVID-19” was also changed to capital letters.

      (2) Page 2: Line 24 - Consider revising this line. Not sure what the authors mean by 'early compromise'

      The paragraph was revised and rewritten.

      Original:

      “Mild COVID-19 patients presented an early compromise with NK cell function, whereas severe patients do so with neutrophil function”

      Changed to:

      ”Mild COVID-19 patients displayed an early transcriptional commitment with NK cell function, whereas severe patients do so with neutrophil function”

      (3) Page 4: Lines 57 & 58 - Verify the reference. The paper referenced was published in 2016 and is in regard to SARS-CoV, MERS-CoV, and enterovirus D68.

      Indeed, this reference was intended to draw parallels with other respiratory viruses. Given the emphasis on SARS-CoV-2, the paragraph has been strengthened with two additional references: Shen 2023 and Wauters 2022.

      (4) Page 10: Lines 229 - 234 - Consider referring to the appropriate figure (i.e., Figure Supplement 2 A or B). The figure associated with D28 DEGs (Volcano plot) is missing in the supplementary. Erroneously referred here as Figure 1C which is a PCA plot?

      The figure reference was correct but could be misread, so the original text was rewritten for clarity. The revised sentence is on page 9, lines 220-223.

      (5) Page 10: Line 224 - Change the sentence to " We found upregulated.." instead of " We found regulated..".

      The text was edited in accordance with this recommendation; the revised sentence is now at line 232.

      (6) Page 13: Line 326 - Figure 4A referenced here is not clear - unable to review.

      Figure 4A was updated at a higher resolution and included in the manuscript.

      (7) Page 15: Line 398 - Consider rewording "after diagnosis" since the days here are "after enrolment".

      This recommendation was considered and the text was rewritten on page 15, lines 404-406:

      Original:

      “We systematically analyzed transcriptomic features of PBMCs from COVID-19 patients with mild and severe symptoms at three sequential time-points (D0, D7, and D28) after diagnosis”

      Changed to:

      “We systematically analyzed transcriptomic features of PBMCs from COVID-19 patients with mild and severe symptoms at three sequential time-points (D0, D7, and D28) during the peak of the symptoms”

      (8) Page 17: Move text from the next page to eliminate blank space.

      Resolved

      (9) Page 32: Figure 1C - Consider changing the symbol for D28 since it looks very similar to the D0 symbol. Use the colors consistently instead of different shades for each group.

      In Figure 1C, the hexagon symbol for D28 was replaced with a star. In this figure, each color indicates one of the three groups, and transparency was used to distinguish symbols that lie close together.

      (10) Page 36: Figure 4A - Unable to review.

      This figure was resized for better resolution.

      (11) Page 42-49: Consider relabeling and renumbering the Supplementary figures for consistency and reference the modified numbers in the appropriate location in the main text.

      The supplementary figures were relabeled and renumbered for consistency and clarity.

      (12) Pages 44 & 48: Unable to review the figures.

      The figures indicated were resized for better resolution.

      Examples of consistency review:

      (1) Use of D0,D7 / D-0, D-7 throughout the manuscript

      The selected format for the final version of the manuscript is D0, D7, and D28.

      (2) Reporting the source of reagents consistently (Name, Place, Country, Catalog number)

      The reagent sources were reformatted for consistency in lines 626-628-632-642.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: The authors may consider moving the supplemental figures into the main body of the paper since they finally would end up with a total of eight figures.

      Because we added two more supplementary figures, we kept them separate from the main text, in the supplement. All of them describe important experimental details, but we believe the manuscript is easier to follow if it focuses on the key results.

      Reviewer #1: In general, apart from some required but important additions, the methods and techniques used here are described in sufficient detail.

      Reviewer #2: Given the identified importance of glow-discharge treatment of precoated tape to the flat deposition of sections during ATUM, a corresponding schematic or appropriate reference(s) providing more information about the custom-built tape plasma device would likely be a prerequisite for effective reproduction of this technique in other laboratories.

      Thank you for the valuable comments on the missing experimental details, which could affect the ease of establishing ATUM-Tomo in other labs. We will clearly highlight the ATUM-Tomo-specific vs. some general EM processing steps of the workflow in the proposed way. A detailed description of the custom-built tape plasma device will be added to the methods section. In addition, we will reference more explicitly our published protocols, which describe the standard electron microscopy embedding steps in great detail (Kislinger et al., STAR protocols, 2020; Kislinger et al., Meth Cell Biol, 2023).

      Reviewer #1: Concerning the results section: in my opinion, the results section is somewhat unbalanced. There is a mismatch between the detailed description of the methodology (experimental approach) and the scientific findings of the paper. The reviewer can see the enormous methodological impact of the paper, which on the other hand is also its major drawback. In my opinion, the authors should also give a more detailed description of their scientific results.

      Concerning the discussion: it would have been nice to give a perspective on the diverse biological questions that the described methodology could be used to address and answer. For example, how could this method be used to address various questions about the normal and pathologically altered brain?

      In my opinion, the paper has one major drawback: it is primarily methodological, although the authors included a scientific application of the method. The question here is how to balance the methodology against the scientific achievement of this paper, which is a difficult decision. In other words, one could recommend this paper to more methodologically oriented journals, for example, Nature Methods.

      Balancing the technological and biological parts is indeed a difficult issue. We agree that this manuscript mainly describes a technical advancement and demonstrates its power to answer previously unsolved scientific questions. We exemplify this in our model system, neuropathology of the blood-brain barrier. The biological impact of ATUM-SEM has been described in detail in Khalin et al., Small, 2022, and is referenced accordingly. Here we describe how ATUM-Tomo can be applied to reveal biological insights beyond the capabilities of ATUM-SEM and other volume electron microscopy techniques. Nevertheless, the description of the methodological development far outweighs that of the biological details. We consider eLife‘s Tools and Resources format (which, in our view, is similar in scope to Nat Methods) ideal for this technically focused manuscript, while targeting eLife’s readership, whose diverse biological fields of interest offer potential applications of the method. In the discussion, we suggest applications in connectomics (for chemical synapses), the study of endocytosis and the detection of virus particles. We hope this addresses the Reviewer’s concern that having only a single application might seem arbitrary or even suggest a very narrow utility of the technique.

      “While we demonstrate a neuropathology-related application, further biological targets that require high-resolution isotropic voxels and the spatial orientation within a larger ultrastructural context can potentially be studied by ATUM-Tomo. This includes the detection of gap junctions for connectomics or for the study of long-range projections (Holler et al., 2021) and the subcellular location of virus particles (Wu et al., 2022, Roingeard, 2008, Pelchen-Matthews and Marsh, 2007). Thus, ATUM-Tomo opens up new avenues for multimodal volume EM imaging of diverse biological research areas.”

      Reviewer #2: Is the separation of sections from permanent marker-treated tape sensitive to the time interval between deposition/SEM imaging and acetone treatment?

      Thank you for pointing out this important methodological aspect. We have not systematically investigated whether there is a critical time window between microtomy, SEM, and detachment. Using the samples generated for this study, we assessed the importance of timing retrospectively:

      “The sections could be recovered even four months after collection and nine months after SEM imaging.”

      Reviewer #2: To what extent is slice detachment from permanent marker-treated tape resin-dependent [i.e. has ATUM-Tomo been tested on resin compositions beyond LX112 (LADD)]?

      We appreciate this comment addressing the broader technical applicability of ATUM-Tomo. We tested the general workflow with tissue embedded in other commonly used resin types (Epon, Durcupan).

      Reviewer #2: Minor corrections to the text and figures.

      Line 83: ((Khalin et al., 2022) should read (Khalin et al., 2022)

      Line 86 : 30nm should read 30 nm

      Line 139: "...morphological normal tight junctions..." should read "...morphologically normal tight junctions..."

      Line 283: "....despite glutaraldehyde fixation, a prerequisite for optimal ultrastructural preservation...".

      Line 295: "In contrast, our CLEM approach provides high ultrastructural quality by optimal chemical fixation".

      The concepts of optimal preservation and optimal fixation are arguably context- and application-dependent. These statements should be toned down or their context explicitly stated.

      Thank you for the detailed corrections. We have applied them accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Faniyan and colleagues build on their recent finding that renal Glut2 knockout mice display normal fasting blood glucose levels despite massive glucosuria. Renal Glut2 knockout mice were found to exhibit increased endogenous glucose production along with decreased hepatic metabolites associated with glucose metabolism. Crh mRNA levels were higher in the hypothalamus, while circulating ACTH and corticosterone were elevated in this model. While these mice were able to maintain normal fasting glucose levels, ablating afferent renal signals to the brain caused low fasting blood glucose levels. In addition, the higher CRH and corticosterone levels of the knockout mice were lost following this denervation. Finally, acute phase proteins were altered, plasma Gpx3 was lower, and major urinary protein MUP18 and its gene expression were higher in renal Glut2 knockout mice. Overall, the main conclusion that afferent signaling from the kidney is required for renal Glut2-dependent increases in endogenous glucose production is well supported by these findings.

      Strengths:

      An important strength of the paper is the novelty of the identification of kidney to brain communication as being important for glucose homeostasis. Previous studies had focused on other functions of the kidney modulated by or modulating brain function. This work is likely to promote interest in CNS pathways that respond to afferent renal signals and the response of the HPA axis to glucosuria. Additional strengths of this paper stem from the use of incisive techniques. Specifically, the authors use isotope enabled measurement of endogenous glucose production by GC-MS/MS, capsaicin ablation of afferent renal nerves, and multifiber recording from the renal nerve. The authors also paid excellent attention to rigor in the design and performance of these studies. For example, they used appropriate surgical controls, confirmed denervation through renal pelvic CGRP measurement, and avoided the confounding effects of nerve regrowth over time. These factors strengthen confidence in their results. Finally, humans with glucose transporter mutations and those being treated with SGLT2 inhibitors show a compensatory increase in endogenous glucose production. Therefore, this study strengthens the case for using renal Glut2 knockout mice as a model for understanding the physiology of these patients.

      Weaknesses:

      A few weaknesses exist. Most concerns relate to the interpretation of this study's findings. The authors state that loss of glucose in urine is sensed as a biological threat based on the HPA axis activation seen in this mouse model. This interpretation is understandable but speculative. Importantly, whether stress hormones mediate the increase in endogenous glucose production in this model and in humans with altered glucose transporter function remains to be demonstrated conclusively. For example, the paper found several other circulating and local factors that could be causal. This model is also unable to shed light on how elevated stress hormones might interact with insulin resistance, which is known to increase endogenous glucose production. That issue is of substantial clinical relevance for patients with T2D and metabolic disease. Finally, how these findings can contribute to improving the efficiency of drugs like SGLT2 inhibitors remains to be seen.

      -  We agree with the reviewer’s overall assessment of this manuscript.

      - Confirming the contribution of each secreted protein shown in Fig. 4, whose levels differed between the two groups of mice, to the compensatory increase in glucose production in response to elevated glycosuria is beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors previously generated renal Glut2 knockout mice, which have high levels of glycosuria but normal fasting glucose. They use this as an opportunity to investigate how compensatory mechanisms are engaged in response to glycosuria. They show that renal and hepatic glucose production, but not metabolism, is elevated in renal Glut2 male mice. They show that renal Glut2 male mice have elevated Crh mRNA in the hypothalamus, and elevated plasma levels of ACTH and corticosterone. They also show that temporary denervation of renal nerves leads to a decrease in fasting and fed blood glucose levels in female renal Glut2 mice, but not control mice. Finally, they perform plasma proteomics in male mice to identify plasma proteins that are changed (up or down) between the knockouts and controls.

      Strengths:

      The question that is trying to be addressed is clinically important: enhancing glycosuria is a current treatment for diabetes, but is limited in efficacy because of compensatory increases in glucose production.

      Weaknesses:

      (1) Although I appreciate that the initial characterization of the mice in another publication showed that both males and females have glycosuria, this does not mean that both sexes have the same mechanisms giving rise to glycosuria. There are many examples of sex differences in HPA activation in response to threat, for example. There is an unfounded assumption here that males and females have the same underlying mechanisms of glycosuria that undermines the significance of the findings.

      - We agree with the reviewer that, although we did not observe sex differences in renal Glut2 KO mice in the context of glucose homeostasis, their response (or the underlying mechanisms) to elevated glycosuria in enhancing compensatory glucose production may differ between the sexes. Therefore, we have included this limitation in the discussion section.

      (2) The authors state that they induced the Glut2 knockout with tamoxifen as in their previous publication. The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout. This means that the last dose of tamoxifen was delivered 14 days prior to the experimental endpoint of each experiment. This seems like an important experimental constraint that should be discussed in this manuscript. Is the glycosuria that follows Glut2 knockout only a temporary change? If so, then the long-term change in glycosuria that follows SGLT2 inhibition in humans might not be best modelled by this knockout. Please specify in the Methods when the surgeries to implant a jugular catheter or ablate the renal nerves were performed relative to the Glut2 knockout.

      - The reviewer’s statement ‘The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout’ is incorrect. In the referred publication, we had explicitly stated in the methods, ‘All of the experiments, except those using a diet-induced obesity mouse model or noted otherwise, were completed within 14 days of inducing the Glut2 deficiency.’ Please see figures 5h-l and 6 in the cited publication, which demonstrate that not all experiments were completed within 14 days of inducing renal Glut2 deficiency. Per the reviewer’s advice, in the present manuscript we have included the timeline (which in some cases extends 4 months beyond inducing glycosuria) in all the figure legends. In addition, for a separate (unpublished) project, we have measured glycosuria up to 1 year after inducing renal Glut2 deficiency. Therefore, the glycosuria observed in the renal Glut2 KO mice is not temporary.

      (3) I am still unclear what group was used for controls. Are these wild-type mice who receive tamoxifen? Are they KspCadCreERT2;Glut2loxP/loxP mice who do not receive tamoxifen? This is important and needs to be specified.

      - In our previous response to the reviewer, we had already stated which control group was used in this study. Please see our response to the second reviewer’s point 3. As mentioned to the reviewer, we used Glut2loxP/loxP mice as the control group, which is also described multiple times in the figure legends of our previous paper that reported the phenotype of renal Glut2 KO mice. Per the reviewer’s advice, we have provided this information again in the revised version of this manuscript.

      (4) The authors should report some additional control measures for the renal denervation that could also impact blood glucose and perhaps some of their other measures. The control measures, which one would like to see unimpacted by renal denervation, include body weights, food consumption and water intake, and glycosuria itself.

      - Please also see fig. 3 in the present manuscript that demonstrates renal afferent denervation doesn’t influence baseline blood glucose or plasma insulin levels. We have now also mentioned in the text that the denervation doesn’t affect food intake or bodyweight.

      (5) The graphical abstract shows a link between the hypothalamus and the liver that is completely unsupported by any of the current findings. That arrow should be removed.

      - Because we observed an increase in hepatic glucose production in renal Glut2 KO mice (Fig. 1), which was reduced by 50% after selective afferent renal denervation (Fig. 3), in the graphical abstract we suggest a neural connection between the kidney, brain, and liver, or an endocrine factor(s), to account for these changes in blood glucose levels, as also described in the discussion section. We can include a question mark ‘?’ in the graphical abstract to show that further studies are needed to validate these proposed mechanisms; however, we cannot simply remove the arrow as advised by the reviewer.

      (6) Though the authors have toned down their language implying a causal link between the HPA measures and compensatory elevation of blood glucose in the face of glycosuria, the title still implies this causal link. It is still the case that their data do not support causation. There are many potential ways to establish a causal link but those experiments are not performed here. The renal afferents are correlated with Crh content of the PVN, but nothing has been done to show that the Crh content is important for elevating blood glucose. In light of this, the title should be toned down. Perhaps something like "Renal nerves maintain blood glucose production and elevated HPA activity in response to glycosuria". The link between HPA and glucose is not shown in this paper.

      - We refer the reviewer to figure 1, showing an increase in glucose production in renal Glut2 KO mice, and figure 3, which demonstrates that afferent renal denervation reduces blood glucose levels by 50%. Afferent renal denervation (ablation of afferent renal nerves) does reduce blood glucose levels in renal Glut2 KO mice. Therefore, the use of the word ‘promote’ in the title is accurate and appropriate to reflect the role of the afferent renal nerves in contributing to an approximately 50% increase in blood glucose levels in renal Glut2 KO mice.

      - Regarding the reviewer's comment on changes in Crh gene expression, please see figure 3. Ablation of renal afferent nerves decreases hypothalamic Crh gene expression and other mediators of the HPA axis by 50%. Therefore, the afferent renal nerves do contribute to regulating blood glucose levels, at least in part, via the HPA axis (which is widely known to change blood glucose levels). Words such as ‘required’ or ‘necessary’ in the title might have implied a causal role or been misleading here; therefore, we have purposely used ‘promote’ in the title to accurately reflect the findings of this study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have only minor text corrections to add:

      - line 223 "A list"

      - line 253 "independent"

      - line 271 "the body's"

      - line 304 "do not"

      Yes, we have corrected these errors in a revised version of this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please report the dilutions used, if any, for the ELISAs. If the samples were run neat, please report this. Many manufacturers' instructions say that the user must determine the correct dilution to use for the samples collected. Also, when small blood volumes are collected, samples sometimes must be diluted to achieve the minimum volume required for the assay. It is not sufficient to refer the reader to the manufacturer's instructions.

      - Per the reviewer’s advice, we have included the dilutions used for each assay in the methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Point 1: The authors have demonstrated that Cs9g12620 contains the EBE of PthA4 in the promoter region. To show that PthA4 controls Cs9g12620, the authors need to compare Cs9g12620 expression after inoculation with wild-type Xcc and the pthA4 mutant. The data in Figure 1 are not sufficient.

      Figure 1D and E show the pthA4 Tn5 insertion mutant Mxac126-80 and the expression level of Cs9g12620 in citrus inoculated with this pthA4 mutant.

      Point 2: The authors confirmed the interaction between PthA4 and the EBE in the promoter of Cs9g12620 using DNA electrophoretic mobility shift assay (EMSA). However, Figure 2B is not convincing. The lane without GST-PthA4 also clearly showed a mobility shift. For the EMSA assay, the authors need also to include a non-labeled probe as a competitor to verify the specificity. The description of the EMSA in this paper suggests that it was not done properly. It is suggested the authors redo this EMSA assay following the protocol: Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions PMID: 17703195.

      Thank you very much for your comments. We have repeated the EMSA analysis based on your suggestion. The DNA probe was labeled with Cy5, and a non-labeled probe was included as a competitor (Figure 3B and D; Figure 4B and E).

      Point 3: The authors also claimed that PthA4 suppresses the promoter activity of Cs9g12620. The data are not convincing and also contradict their own data showing that overexpression of Cs9g12620 causes canker and silencing of it reduces canker, considering that PthA4 is required for canker development. The authors conducted the assays using transient expression of PthA4. This should be done by inoculating citrus plants with wild-type Xcc, the pthA4 mutant, and a negative control to check the expression of Cs9g12620.

      We have measured Cs9g12620 expression in silenced citrus plants inoculated with wild-type Xcc 29-1 (Figure 7F).

      Point 4: Figure 6 AB is not convincing. There are no apparent differences. The variations shown in B are common in different wild-type samples. It is suggested that the authors conduct transgenic instead of transient overexpression.

      It has been shown that transient expression of PthA4 leads to a canker-like phenotype, suggesting that this experiment is effective. However, transgenic plants overexpressing pthA4 and Cs9g12620 would provide more convincing evidence, and we will generate such plants in our future research to confirm the phenotype.

      Point 5: Gene silencing data needs more appropriate controls. Figure D seems to suggest canker symptoms actually happen for the RNAi treated. The authors need to make sure the same amount of Xcc was used for both CTV empty vector and the RNAi. It is suggested a blink test is needed here.

      We used the same amount of Xcc to inoculate the CTV empty vector and RNAi plants. For both inoculations, the cultured Xcc cells were suspended in sterile distilled water to a final concentration of 10^8 CFU/mL (OD600 = 0.3).

      Point 6: Figure 1. Please draw a figure to clearly show the location of the EBE in the promoter of Cs9g12620, including the transcription start site, and translational start site.

      The EBE in the Cs9g12620 promoter is underlined in Figure supplement 1. We are not certain of the transcription start site, but the translation start site is labelled.

    1. Author response:

      Reviewer #1 (Public Review):

      Areas of improvement and suggestions:

      (1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"

      The post-mating circuitry has been largely characterized by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP on any of the well-known post-mating neuronal circuitry, i.e., SPSN, SAG, pC1, vpoDN or OviDN neurons. A combination of available split-Gal4 lines should be sufficient to prove this.

      Indeed, we have already tested drivers for some of these neurons and agree that this information is important to distinguish neurons that are direct SP targets from neurons that are involved in directing reproductive behaviors.

      (2) The authors must show how specific their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP is by showing images of the same constructs driving GFP.

      The expression pattern for tshGAL, which expresses in the trunk is already published (Soller et al., 2006). We will add images for “head” expression.

      (3) VT3280 is termed as a SAG driver. However, VT3280 is a SPSN specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.

      Following the reviewer's suggestion, we will clarify the specificity of VT3280.

      (4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with an SPSN-Gal80 construct. In line with this, most of their lines target the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes, especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".

      We agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.

      In principle, use of GAL80 is a valid approach to restrict expression if levels of GAL80 are higher than those of GAL4, because GAL80 binds GAL4 to inhibit its activity. Hence, if levels of GAL80 are lower, results could be difficult to interpret.

      (5) The authors separate head (brain) from trunk (VNC) responses, but they do not narrow down the neural circuits involved in each response. A detailed characterization of the involved circuits, especially in the case of the VNC, is needed to (a) show that the intersectional approach is indeed labelling distinct subtypes and (b) determine how these distinct neurons influence oviposition.

      Again, we agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.

      Reviewer #2 (Public Review):

      Strength:

      The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it has not been shown before.

      We thank reviewer 2 for recognizing the advance of our work.

      Weakness:

      Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available, which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.

      We appreciate this reviewer's recognition of our previous work showing that receptivity and oviposition are separable. As pointed out, we have now gone one step further and identified, in a tour-de-force approach, subsets of neurons in the brain and VNC.

      We agree with this reviewer that we need a higher resolution of expression to only one cell type. As pointed out by this reviewer, the neurochemical identity is an excellent suggestion and will help to further restrict expression to just one type of neuron. However, this is a major task that we will continue in follow up studies.

      Reviewer #3 (Public Review):

      Strengths:

      Besides the main results described in the summary above, the authors discovered the following:

      (1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.

      (2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.

      We thank reviewer 3 for recognizing these two important points regarding the SP response that point to a revised model for how the underlying circuitry induces the post-mating response.

      Weaknesses:

      (1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned, ppk neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.

      We will add positive controls for ppk-DBD expression and expand the discussion section.

      (2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.

      We agree with this reviewer that the interpretation of the SPR RNAi results is complicated by the fact that SP has additional receptors (Haussmann et al 2013). With respect to oviposition, the results are conclusive for all three intersections when expressing UAS mSP together with SPR RNAi, i.e., egg laying is not induced in the absence of SPR. For receptivity, the results are conclusive for dsx ∩ fru11/12 and partially for SPR8 ∩ fru 11/12.

      Potentially, SPR RNAi knock-down does not sufficiently reduce SPR levels to completely reduce receptivity in some intersection patterns, likely also because splitGal4 expression is less efficient.

      Why SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive is unclear, but we anticipate that we need a higher resolution of expression to only one cell type to resolve this unexpected result. However, this is a major task that we will continue in follow up studies.

      SPR RNAi knock-down experiments may also help clarify whether mSP acts in an autocrine or juxtacrine manner to induce the PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.

      Whether membrane-tethered SP induces the response in an autocrine manner is an important aspect in the interpretation of the results from mSP expression.

      Removing SPR by SPR RNAi and expressing mSP in the same neurons did not induce egg laying for any of the three intersections and did not reduce receptivity for dsx ∩ fru11/12 and for SPR8 ∩ fru 11/12. Accordingly, we can conclude that for these neurons the response is induced in an autocrine manner.

      We will add this aspect to the discussion section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This paper by Beath et al. identifies a potential regulatory role for proteins involved in cytoplasmic streaming and in maintaining the grouping of paternal organelles: holding sperm contents in the fertilized embryos away from the oocyte meiotic spindle so that they don't get ejected into the polar body during meiotic chromosome segregation. The authors show by time-lapse video that paternal mitochondria (used as a readout for the sperm and its genome) are excluded from yolk granules and maternal mitochondria, even when moving long distances by cytoplasmic streaming. To understand how this exclusion is accomplished, they first show that it is independent of both internal packing and the engulfment of the paternal chromosomes by maternal endoplasmic reticulum creating an impermeable barrier. They then test whether the control of cytoplasmic streaming affects this exclusion by knocking down two microtubule regulators, katanin and kinesin-13. They find that the ER ring, which is used as a proxy for paternal chromosomes, undergoes extensive displacement with these treatments during anaphase I and interacts with the meiotic spindle, supporting their hypothesis that the exclusion of paternal chromosomes is regulated by cytoplasmic streaming. Next, they test whether a regulator of maternal ER organization, ATX-2, disrupts sperm organization so that they can combine the double depletion of ATX-2 and KLP-7, presumably because klp-7 RNAi (unlike mei-1 RNAi) does not affect polar body extrusion and they can report on what happens to paternal chromosomes. They find that the knockdown of both ATX-2 and KLP-7 produces a higher incidence of what appears to be the capture of paternal chromosomes by the meiotic spindle (5/24 vs 1/25). However, this capture event appears to halt the cell cycle, preventing the authors from directly observing whether this would result in the paternal chromosomes being ejected into the polar body. 

      Strengths: 

      This is a useful, descriptive paper that highlights a potential challenge for embryos during fertilization: when fertilization results in the resumption of meiotic divisions, how are the paternal and maternal genomes kept apart so that the maternal genome can undergo chromosome segregation and polar body extrusion without endangering the paternal genome? In general, the experiments are well-executed and analyzed. In particular, the authors' use of multiple ways to knock down ATX-2 shows rigor. 

      Weaknesses: 

      The paper makes a case that this regulation may be important but the authors should do some additional work to make this case more convincing and accessible for those outside the field. In particular, some of the figures could include greater detail to support their conclusions, they could explain the rationale for some experiments better and they could perform some additional control experiments with their double depletion experiments to better support their interpretations. Also, the authors' inability to assess the functional biological consequences of the capture of the sperm genome by the oocyte spindle should be discussed, particularly in light of the cell cycle arrest that they observe. 

      These general comments are addressed in the more specific critiques below.

      Reviewer #2 (Public Review): 

      Summary 

      In this manuscript, Beath et al. use primarily C. elegans zygotes to test the overarching hypothesis that cytoplasmic mechanisms exist to prevent interaction between paternal chromosomes and the meiotic spindle, which are present in a shared zygotic cytoplasm after fertilization. Previous work, much of it by this group, had characterized cytoplasmic streaming in the zygote and the behavior of paternal components shortly after fertilization, primarily the clustering of paternal mitochondria and membranous organelles around the paternal chromosomes. This work set out to identify the molecular mechanisms responsible for that clustering and to test the specific hypothesis that the "paternal cloud" helps prevent the association of paternal chromosomes with the meiotic spindle. 

      Strengths 

      This work is a collection of technical achievements. The data are primarily 3- and 4-channel time-lapse images of zygotes shortly after fertilization, which were performed inside intact animals. There are many instances in which the experiments show extreme technical skill, such as tracking the paternal chromosomes over large displacements throughout the volume of the embryo. The authors employ a wide variety of fluorescent reporters to provide a remarkably clear picture of what is going on in the zygote. These reagents and the novel characterization of these stages that they provide will be widely beneficial to the community. 

      The data provide direct visualization of what had previously been a mostly hypothetical structure, the "paternal cloud," using simultaneous labeling of paternal DNA and mitochondria in combination with a variety of maternal proteins including maternal mitochondria, yolk granules, tubulin, and plasma membrane. Together, these images provide convincing evidence of the existence of this specified cytoplasmic domain. They go on to show that the knockdown of the ataxin-2 homolog ATX-2, a protein previously shown to affect ER dynamics, disrupted the paternal cloud, identifying a role for ER organization in this structure. 

      The authors then used the system to test the functional consequences of perturbing the cytoplasmic organization. Consistent with the paternal cloud being a stable structure, it stayed intact during large movements the authors generated using previously published knockdowns (of mei-1/katanin and kinesin-13/klp-7) that increased cytoplasmic streaming. They used these data to document instances in which the paternal chromosomes were likely to have been attached to the spindle. They concluded with direct evidence of spindle fibers connecting to the paternal chromatin upon knockdown of ATX-2 in combination with increased cytoplasmic streaming, providing strong, direct support for their overarching hypothesis. 

      Weaknesses 

      While the data is convincing, the narrative of the paper could be streamlined to highlight the novelty of the experiments and better articulate the aims. For example, the cloud of paternal mitochondria and membranous organelles was previously shown, but Figures 1-2 largely reiterate that observation. The innovation seems to be that the combination of ER, yolk, and maternal mitochondrial markers makes the existence of a specified domain more concrete. There are also some instances where more description is needed to make the conclusions from the images clear. 

      These general comments are addressed in the more specific critiques below.

      The manuscript intersperses what read like basic characterizations of fluorescent markers that, as written, can distract from the main story. The authors characterized the dynamics of ER organization throughout the substages of meiosis and the permeability of the envelope of ER that surrounds the paternal chromatin, but it could be more clearly established how the ability to visualize these structures allowed them to address their aims.

      We have added the following after the initial description of ER morphology changes: (ER morphology was used to determine cell-cycle stages during live imaging reported below in Fig. 6.)

      More background on what was previously known about ER organization in M-phase and the role of ataxin proteins specifically may help provide more continuity. 

      We have added references to transitions to ER sheets during mitotic M-phase in HeLa cells and Xenopus extracts.

      Reviewer #3 (Public Review): 

      Summary: 

      This study by Beath et al. investigated the mechanisms by which sperm DNA is excluded from the meiotic spindle after fertilization. Time-lapse imaging revealed that sperm DNA is surrounded by paternal mitochondria and maternal ER that is permeable to proteins. By increasing cytoplasmic streaming using kinesin-13 or katanin RNAi, the authors demonstrated that limiting cytoplasmic streaming in the embryo is an important step that prevents the capture of sperm DNA by the oocyte meiotic spindle. Further experiments showed that the Ataxin-2 protein is required to hold paternal mitochondria together and close to the sperm DNA. Finally, double depletion of kinesin-13 and Ataxin-2 suggested an increased risk of meiotic spindle capture of sperm DNA. 

      Overall, this is an interesting finding that could provide a new understanding of how meiotic spindle capture of sperm DNA and its accidental expulsion into the polar body is prevented. However, some conceptual gaps need to be addressed and further experiments and improved data analyses would strengthen the paper. 

      - It would be helpful if the authors could discuss in good detail how they think maternal ER surrounds the sperm DNA

      We have added 2 references to papers about nuclear envelope re-assembly from Shirin Bahmanyar’s lab and suggest the ER envelope is a halted intermediate in nuclear envelope reassembly.

      and why it is not disrupted following Ataxin disruption. 

      We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte.  None of these treatments prevent envelopment of the sperm DNA by maternal ER.  None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane.  These treatments mostly result in “large aggregates” of ER that we have not examined by EM.  Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked.  Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin.  We chose not to speculate about the reasons for this because we do not know why.

      - Since important phenotypes revealed in RNAi experiments (e.g. kinesin-13 and ataxin-2 double depletion) are not very robust, the authors should consider toning down their conclusions and revising some of their section headings. I appreciate that they are upfront about some limitations, but they do nonetheless make strong concluding sentences. 

      We have changed the discussion of the klp-7 atx-2 double depletion to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules.  However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea. “

      - The discussion section could be improved further to present the authors' findings in the larger context of current knowledge in the field. 

      We have expanded the discussion as suggested.

      - The authors previously demonstrated that F-actin prevents meiotic spindle capture of sperm DNA in this system. However, the current manuscript does not discuss how the katanin, kinesin-13 and Ataxin-2 mechanisms could work together with previously established functions of F-actin in this process. 

      We have added pfn-1(RNAi) to the discussion section.

      - How can the authors exclude off-target effects in their RNAi depletion experiments? Can kinesin-13, katanin, and Ataxin phenotypes be rescued for instance? 

      For ataxin-2 phenotypes, two completely independent controls for off-target effects are shown: (1) GFP(RNAi) on a strain with an endogenous ATX-2::GFP tag vs. GFP(RNAi) on a strain with no tag on ATX-2, and (2) ATX-2::AID with or without auxin. For kinesin-13 and katanin, we did not perform a rigorous control for off-target effects of RNAi. However, the effects of these depletions on cytoplasmic microtubules have been previously reported by others.

      - How are the authors able to determine if the paternal genome was actually captured by the spindle? Does lack of movement definitively suggest capture without using a spindle marker? 

      mKate::tubulin labels the spindle in each capture event. This can be seen in Video S3 for mei-1(RNAi) and in Figure 9 for atx-2 klp-7 double depletions.

      (1) Major issues: 

      The images provided do not convincingly show that mitochondria are entirely excluded from the regions with yolk granules. Please provide insets of magnified images of the paternal mitochondria in Figure 1E to more clearly show the exclusion even when paternal mitochondria are streaming. Providing grayscale images, individual z-sections and/or some quantification of these data might also be more convincing to this reviewer. 

      We have modified Fig. 1 by adding single wavelength magnified insets to more clearly show that paternal mitochondria are in a “black hole” in the maternal yolk granules during  cytoplasmic streaming.

      Figure 2 -This figure can be retitled to highlight that the paternal organelle cloud is impermeable to mitochondria and conserved. 

      The legend has been re-titled as suggested.

      Figure 3B, An image of the DNA within the ring of maternal ER especially since the maternal ER ring is used as a proxy for the paternal chromosomes in later figures would strengthen the authors' claims.

      We have added a panel showing DAPI-stained DNA in the center of the ER ring and paternal mitochondria cloud. 

      Why is the faster time scale imaging significant? I think this could be more clearly set up in the paper. Perhaps rapid imaging of maternal mito-labeled kca-1(RNAi) embryos would better show the difference in time scale, with the expectation that the paternal cloud forms and persists while the ER invades. 

      We are not sure what the reviewer means.  5 sec time intervals were used throughout the paper.  We are also not sure how kca-1(RNAi) would help.  Movement of the entire oocyte into and out of the spermatheca is what limits the ability to keep a fusing sperm in focus.  kca-1(RNAi) would prevent cytoplasmic streaming but not ovulation movements.

      Figure 4 - The question about the permeability of the ER envelope seems to come out of nowhere as written. It isn't clear how it contributes to the larger story about preventing sperm incorporation in the spindle.

      This section of the results is introduced with: “If the maternal ER envelope around sperm DNA was sealed and impermeable during meiosis, this could both prevent the sperm DNA from inducing ectopic spindle assembly and prevent the sperm DNA from interacting with meiotic spindle microtubules.” 

      Based on the paper title, readers would probably not expect the data in Figure 4 to be in this paper. Maybe the title needs something about ER dynamics, e.g. "ATX-2 but not an ER envelope isolates the paternal chromatin"? 

      In Figure 5, it seems that RNAi of klp-7 and Mei-1 had slightly different effects on short-axis displacement of the ER envelope (klp-7 affecting it more dramatically than mei-1) and slightly different effects on interaction with the meiotic spindle (capture vs streaming past the spindle). The authors mention in their discussion that the difference in the interaction with the meiotic spindle might reflect the effects that loss of Mei-1 may have on the spindle but could it also be a consequence of the differences in cytoplasmic streaming observed?

      With our current data, the only statistically significant difference between cytoplasmic streaming of the sperm contents in mei-1(RNAi) vs klp-7(RNAi) is that excessive streaming persists longer into metaphase II in klp-7(RNAi).  We have added a sentence describing this difference to the results.  If differences in streaming were the cause of different capture frequencies, then klp-7(RNAi) would cause more capture events than mei-1(RNAi) but the opposite was observed.  We have avoided too much discussion here because the frequency of capture events is too low to demonstrate statistically significant differences between mei-1(RNAi), klp-7(RNAi), and atx-2(degron) + klp-7(RNAi) without a very large increase in the number of time-lapse sequences.  

      Also, the authors should find a way to represent this interaction with the meiotic spindle in a quantitative or table form to allow the reader to observe some of the patterns they report more easily.

      We have added a table to Fig. 9 that summarizes capture data.

      Finally, can the authors report when they observe the closest association with the meiotic spindle: Does it correlate with the period of greatest displacement (AI) or are they unlinked? 

      The low frequency of capture events makes it difficult to test this rigorously.

      Figure 6- 'Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos without partial co-localization with ER.' How can the authors exclude co-localization with ER? 

      We have changed the wording to: “Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos (Fig. 6A; Fig. S2). ATX-2 did not uniquely co-localize with ER (Fig. S2).“

      The rationale for why the authors think that the integrity of sperm organelles is important to keep the genomes apart is not clear to this reviewer and needs to be explained better. Moving the discussion of the displacement experiments in Figure S3 from the end of the results section to the ATX-2 knockdown section would help accomplish this. 

      We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test).   Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos,  these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”

      It looks like, in the double knockdown of ATX-2 and KLP-7, the spread of paternal mitochondria is less affected than when only ATX-2 is depleted. What effect does this result have on the observation that the incidence of sperm capture appears to increase in the double depletion? What does displacement of the ER ring look like in the double depletion? Is it additive, consistent with their interpretation that both limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria is required to keep the genomes separate? 

      We cannot show a significant difference between single and double knockdowns without substantially increasing n. We did not analyze ER ring displacement in the double mutant.

      Is the increased incidence of capture in the double-depleted embryos significant? 

      We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test).   Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos,  these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”

      What do the authors make of the cell cycle arrest observed when paternal chromosomes are captured? Is there an argument to be made that this arrest supports the idea that preventing this capture is actively regulated and therefore functionally important? 

      We chose not to discuss the mechanism of this arrest because considerably more work would be required to prove that it is not caused by a combination of imaging conditions and genotype.  The low frequency of these capture + arrest events would make it very difficult to show that the arrest does not occur after depleting a checkpoint protein.

      (2) Minor concerns: 

      Top of page 4: "streaming because depletion tubulin stops cytoplasmic streaming (7)" should be "streaming because depletion of tubulin stops cytoplasmic streaming (7)" 

      The ”of” has been inserted.

      Page 6: "This result indicated that the volume of paternal mitochondria excludes maternal mitochondria and yolk granules but not maternal ER." The authors have only shown this for maternal mitochondria, not yolk granules. 

      We have deleted the mention of yolk granules here.

      Page 7: "These results suggest that all maternal membranes are initially excluded from the sperm at fusion." Should be "These results show that maternal ER are initially excluded from the sperm at fusion. Since maternal mitochondria and yolk granules are excluded later, this suggests that all maternal membranes are initially excluded from the sperm at fusion." 

      We have changed this sentence as suggested.

      It's not clear why the authors show other types of movement that might be quantified when cytoplasmic streaming is affected in Figure 5A and only quantify long-axis and short-axis displacement. 

      We have deleted the other types of movement from the schematic.  Although these parameters were quantified, we did not include this data in the results so it would be confusing for the reader to have them in the schematic.

      Bottom of page 7: Mention that the GFP::BAF-1 was maternally provided. 

      We have added “Maternally provided..”

      Missing an Arrow on Figure 1A 9:20. 

      We removed the text citation to an arrow in Fig. 1A because we moved most of the description of the ER ring to Fig. 3 to address other reviewer suggestions.

      Supplemental videos should be labeled appropriately to indicate what structures are labeled. It is currently difficult to understand what is being shown. 

      (3) Issues with the Discussion section: 

      "The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12. 

      We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.” 

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures. 

      This sentence has been rewritten in response to other comments, and the new sentence now references the revised Fig. 9.

      "ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." - Page 13 reference figure. 

      A reference to Figs 7 and 8 has been inserted.

      " In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." - Page 13, the corresponding figures need to be referenced for these sentences. 

      We have inserted figure references.

      "In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." - Pages 13-14 references figures here. 

      We have inserted figure references.

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubules." - This should be toned down since this phenotype is not robust. 

      We have changed this to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules. However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea.”

      ATX-2 depletion alters ER morphology but does not impact the maternal ER envelope - could the authors provide a potential explanation for this? 

      In the discussion, we cite papers showing that ATX-2 depletion affects many different cellular processes so the effect we see on paternal mitochondria might have nothing to do with the ER ring.   We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte.  None of these treatments prevent envelopment of the sperm DNA by maternal ER.  None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane.  These treatments mostly result in “large aggregates” of ER that we have not examined by EM.  Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked.  Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin.  We chose not to speculate about the reasons for this because we do not know why.

      It would be good to have representative images of what the altered spindle looks like in MEI-1-depleted oocytes. 

      The structure of MEI-1-depleted spindles has been described in the cited references.

      "Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)" - It is intriguing that this does not happen in the double depletion experiments of kinesin-13 and ATX-2. The authors should perhaps discuss this. 

      This does happen in KLP-7 ATX-2 double depleted embryos as shown in Fig. 9.

      (4) Missing citations: 

      "This analysis was restricted to embryos from anaphase I through anaphase II because our streaming data and that of Kimura 2020 indicate that the sperm contents have not moved significantly before anaphase I." - This needs an appropriate citation. Page 10. 

      We have inserted citations here.

      " The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12. Not referencing figures in the discussion. 

      We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.” 

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures. 

      A reference to the revised Fig. 9 has been inserted in the revised version of this sentence.

      "ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." - Page 13 reference figure.

      References to Figs. 7 and 8 have been inserted.

      " In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." Page 13, the corresponding figures need to be referenced for these sentences. 

      We have inserted citations here.

      "In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." Pages 13-14 references figures here. 

      We have inserted citations here.

      (5) Referencing wrong figures in the text: 

      Figure 5 - In the figure legend there is a 5C but there is no 5C panel in the figure. 

      A panel label “C” has been added to Fig. 5.

      Figure 6A - "Dark holes were observed suggesting exclusion from the lumens of larger membranous organelles (Fig. 6A; Fig. S2)." Page 10. 

      6A has been changed to 6C.

      Figure 6A is showing background autofluorescence in WT oocytes so I am not certain why it is cited here. 

      The Figure citation has been corrected to 6B, C.

      Figure 8 - I could not find the supplemental data file with the individual mitochondria distance measurements. 

      We are including the Excel file with the revised submission.

      The last sentence of the first paragraph should be re-worded to be more concise: "In C. elegans, the nucleus is positioned away from the site of future fertilization so that the meiosis I spindle assembles at the opposite end of the ellipsoid zygote from the site of fertilization (2-4)."

      Every word of this sentence is important.

      Last sentence second paragraph typo "These microtubules are thought to drive meiotic cytoplasmic streaming because depletion tubulin stops cytoplasmic streaming (7) and depletion of the microtubule-severing protein katanin by RNAi results in an increased mass of cortical microtubules and an increase in cytoplasmic streaming (8)." Pages 3-4. 

      “of” has been inserted.

      (6) Typos in the introduction should be corrected: 

      Neither ataxin nor kinesin-13 is mentioned in the introduction, but both are a major focus of the paper.

      Gong et al. 2024 is cited by author and year instead of by number (page 5) and is missing from the References.

      This has been corrected. 

      Supplemental videos should be labeled appropriately to indicate what structures are labeled. It is currently difficult to understand what is being shown.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      The authors used four datasets spanning 30 countries to examine funding success and research quality scores for various disciplines. They examined whether funding or research quality scores were influenced by the majority gender of the discipline and whether these affected men, women, or both within each discipline. They found that disciplines dominated by women have lower funding success and research quality scores than disciplines dominated by men. These findings are surprising because even the men in women-dominated fields experienced lower funding success and research quality scores.

      Strengths:

      - The authors utilized a comprehensive dataset covering 30 countries to explore the influence of the majority gender in academic disciplines on funding success and research quality scores.

      - Findings suggest a systemic issue where disciplines with a higher proportion of women have lower evaluations and funding success for all researchers, regardless of gender.

      - The manuscript is notable for its large sample size and the diverse international scope, enhancing the generalizability of the results.

      - The work accounts for various factors including age, number of research outputs, and bibliometric measures, strengthening the validity of the findings.

      - The manuscript raises important questions about unconscious bias in research evaluation and funding decisions, as evidenced by lower scores in women-dominated fields even for researchers who are men.

      - The study provides a nuanced view of gender bias, showing that it is not limited to individuals but extends to entire disciplines, affecting both the funding and the perceived quality or worth of research.

      - This work underscores the need to explore motivations behind gender distribution across fields, hinting at deep-rooted societal and institutional barriers.

      - The authors have opened a discussion on potential solutions to counter bias, like adjusting funding paylines or anonymizing applications, or other practical solutions.

      - While pointing out limitations such as the absence of data from major research-producing countries, the manuscript paves the way for future studies to examine whether its findings are universally applicable.

      Weaknesses:

      - The study does not provide data on the gender of grant reviewers or stakeholders, which could be critical for understanding potential unconscious bias in funding decisions. These data are likely not available; however, this could be discussed. Are grant reviewers in fields dominated by women more likely to be women?

      - There could be more exploration into whether the research quality score is influenced by inherent biases towards the disciplines themselves, rather than solely reflecting gender bias.

      - The manuscript should discuss how non-binary gender identities were addressed in the research. There is an opportunity to understand the impact on this group.

      - A significant limitation is the absence of data from other major research-producing countries like China and the United States, raising questions about the generalizability of the findings. How comparable are the findings observed to these other countries?

      - The motivations and barriers that drive gender distribution in various fields could be expanded on. Are fields striving to reach gender parity through hiring or other mechanisms?

      - The authors could consider whether the size of funding awards correlates with research scores; omitting it may overlook a significant factor in the evaluation of research quality. Presumably there are fewer data on smaller 'pilot' funds and startup funds for disciplines where these are more common. Would funding success follow the same trend for these types of funds?

      - The language used in the manuscript at times may perpetuate bias, particularly when discussing "lower quality disciplines," which could influence the reader's perception of certain fields.

      - The manuscript does not clarify how many gender identities were represented in the datasets or how gender identity was determined, potentially conflating gender identity with biological sex.

      Reviewer #3 (Public Review):

      This study seeks to investigate one aspect of disparity in academia: how gender balance in a discipline is valued in terms of evaluated research quality score and funding success. This is important in understanding disparities within academia.

      This study uses publicly available data to investigate covariation between gender balance in an academic discipline and:

      i) Individual research quality scores of New Zealand academics as evaluated by one of 14 broader subject panels.

      ii) Funding success in Australia, Canada, Europe, UK.

      The study would benefit from further discussion of its limitations, and from the clarification of some technical points (as described in the recommendations for the authors).

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      This is a very nice study as-is. In the following comments, I have mainly put my thoughts as I was reading the manuscript. If there are practical ways to answer my questions, I think they could improve the manuscript but the data required for this may not be available.

      Are there any data on the gender of grant reviewers or stakeholders who make funding decisions?

      The research quality score metrics seem to be more related to unconscious bias. The funding metrics may also, but there are potentially simple fixes (higher paylines for women or remove gender identities from applications).

      We have included some details about PBRF funding panel gender diversity. These panels are usually more gender balanced than the field they represent, but in the extreme cases (Engineering, Education, Mathematics) they are skewed as would be expected. Panel composition data for other award decision makers were not available.

      I wonder if the research score metric isn't necessarily reflecting the gender bias in the discipline but rather the discipline itself? Terms like "hard science" and "soft science" are frequently used and may perpetuate these biases. This is somewhat supported by the data - on lines 402-403 the authors state that women in male-dominated fields like Physics have the same expected score as a man. Could it be that Physics has a higher score than Education even if Physics was woman-dominated and Education was man-dominated? Are there any instances in the data where traditionally male- or female-dominated disciplines are outliers and happen to be the opposite? If so, in those cases, do the findings hold up?

      Overall we would love to answer this question! But our data is not enough. We mention these points in the Discussion (Lines 472-466). We have extended this a little to cover the questions raised here.

      How are those with non-binary gender identities handled in this article? If there is any data on the subject, I would be curious to know how this effects research score and funding success.

      These data were either unavailable or the sample size was too small to be considered anonymously (Mentioned on Lines 74-76).

      A limitation of the present article is a lack of data on major research-producing countries like China and the United States. Is there any data relevant to these or other countries? Is there reason to believe the findings outlined in this manuscript would apply or not apply to those countries also?

      We would be very excited to see if the findings held up in other countries, particularly any that were less European based. Unfortunately we could not find any data to include. Maybe one day!

      What are the motivations or other factors driving men to certain fields and women to certain fields over others? What are the active barriers preventing all fields from 50% gender parity?

      Field choice is a highly studied area and the explanations are myriad; we have included a few references on job choice in the Discussion. I usually recommend that my students read the blog post at

      https://www.scientificamerican.com/blog/hot-planet/the-people-who-could-have-done-science-didnt/

      It is very thoughtful but unfortunately not appropriate to reference here.

      The authors find very interesting data on funding rates. Have you considered funding rates and the size of funding awards as a factor in research score? Some disciplines like biomedical science receive larger grants than others like education.

      A very interesting thought for our next piece of work. We would definitely like to explore our hypothesis further.

      There are instances where the authors writing may perpetuate bias. If possible these should be avoided. One example is on line 458-459 where the authors state "...why these lower quality disciplines are more likely..." This could be re-written to emphasize that some disciplines are "perceived" as lower quality. Certainly those in these discipline would not characterize their chosen discipline as "low quality".

      Well-spotted! Now corrected as you suggest.

      Similar to the preceding comment, the authors should use care with the term "gender". In the datasets used, how many gender identities were captured? How many gender identity options were given in the surveys or data intake forms? Could individuals in these datasets have been misgendered? Do the data truly represent gender identity or biological sex?

      We know that in the PBRF dataset gender was a binary choice and transgender individuals were able to choose which group they identified with. There was no non-binary option (in our defence, the latest dataset is from 2018, and NZ has only recently started updating official forms to be more inclusive), and individuals with gender not stated (a very small number) were excluded. ARC did mention that a small number of individuals were either non-binary or gender not stated; again, these are not included here for reasons of anonymity. This is now mentioned on Lines 74-76. The effects on this group are important and understudied, likely because, as here, the numbers are too small to be included meaningfully.

      Reviewer #3 (Recommendations For The Authors):

      Major revisions:

      Could you add line numbers to the Supplementary Materials for the next submission?

      Yes! Sorry for the omission.

      (1) In the main text L146 and Figure 1, it is not clear why the expected model output line is for a 50 year old male from University of Canterbury only, but the data points are from disciplines in all eight universities in New Zealand. I think it would be more clear and informative to report the trend lines that represent the data points. At the moment it is hard to visualise how the results apply to other age groups or universities.

      As age and institution enter the model as additive terms with no interactions, they only shift this line up or down by a constant, and the shift is small in comparison to the linear trend. Unfortunately, showing them graphically does not aid understanding. We agree that including raw data with an adjusted trend line can be confusing, but after a lot of discussion among the authors this was the most informative compromise we could find (many people like raw data, so we included it).

      (2) Does your logistic regression model consider sample size weighting in pmen? Weighting according to sample sizes needs to be considered in your model. At the moment it is unclear and suggests a proportion between 0 and 1 only is used, with no weighting according to sample size. If using R, you can use glm(cbind(nFem, nMalFem).

      Yes. All data points were weighted by group size exactly as you suggest. We have updated the text on Line 317 to make this clear.
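      The reviewer's point about weighting can be illustrated with a binomial logistic regression fitted to counts rather than to bare proportions (in R, the standard idiom is glm(cbind(successes, failures) ~ x, family = binomial)). The sketch below is not the authors' Matlab analysis: it assumes a single predictor (pmen), uses hypothetical toy counts, and the function names fit_binomial_logit and expand are illustrative only. It fits the model by Newton-Raphson and shows that grouped counts give the same estimates as one 0/1 record per researcher, which is what weighting by group size amounts to here.

```python
import math


def fit_binomial_logit(x, successes, trials, iters=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson.

    Group i contributes successes[i] out of trials[i], so groups with more
    trials automatically carry more weight in the log-likelihood.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for xi, ki, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = ki - ni * p          # observed minus expected successes
            w = ni * p * (1.0 - p)   # binomial variance, scaled by group size
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def expand(x, successes, trials):
    """Turn grouped counts into one 0/1 record per individual."""
    xs, ks, ns = [], [], []
    for xi, ki, ni in zip(x, successes, trials):
        xs += [xi] * ni
        ks += [1] * ki + [0] * (ni - ki)
        ns += [1] * ni
    return xs, ks, ns


if __name__ == "__main__":
    # Hypothetical toy data: pmen per discipline, applicants, funded applicants
    pmen = [0.2, 0.5, 0.8]
    applicants = [10, 200, 40]
    funded = [3, 90, 24]
    print("grouped:   ", fit_binomial_logit(pmen, funded, applicants))
    print("individual:", fit_binomial_logit(*expand(pmen, funded, applicants)))
```

      Passing each discipline's proportion with trials = 1 instead would give every discipline equal influence regardless of how many researchers it contains, which is the unweighted analysis the reviewer cautioned against.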

      (3) For PBRF, I think it is useful to outline the 14 assessment panels and the disciplines they consider. Did you include the assessment panel as an explanatory variable in your model too to investigate whether quality is assessed in the same manner between panels? If not, then suggest reasons for not doing so.

      We have now included more detail in main text on the gender split of the panels. They were not included as an explanatory variable. In theory there was some cross-referencing of panel scores to ensure consistency as part of the PBRF quality assurance guidelines.

      (4) There are several limitations which should be discussed more openly:

      Patterns only represent the countries studied, not necessarily academia worldwide.

      Mentioned on Lines 485-487.

      Gender is described as a binary variable.

      Discussed on Lines 74-76.

      The measure of research evaluation as a reflection of academic merit.

      This is acknowledged in the data limitations paragraph at the end of the Discussion.

      Minor revisions:

      (1) L186. Why do you analyse bibliometric differences between individuals from University of Canterbury only? It would be helpful to outline your reasons.

      Although bibliometric data is publicly available it is difficult to collect for a large number of individuals. You also need some private data to match bibliometrics with PBRF data which is anonymous. We were only able to do this for our own institution with considerable internal support.

      (2) How many data records did you have to exclude in L191 because they could not be linked? This is helpful to know how efficient the process was, should anyone else like to conduct similar studies.

      We matched over 80% of available records (384 individuals). We have mentioned this on Line 194.

      (3) Check grammar in the sentence beginning in L202.

      Thank you. Corrected.

      (4) Please provide a sample size gender breakdown for "University of Canterbury (UC) bibliometric data", as you do for the preceding section. A table format is helpful.

      Included on Line 194.

      (5) L377 I think this sentence needs revision.

      Thank you, we have reworked that paragraph.

      (6) L389-392 Is it possible evaluation panels can score women worse than men and that because more women are present in female-biassed disciplines, the research score in these are worse? Women scoring worse between fields, may be a result of some scaling to the mean score.

      No.  This is not possible because women in male-dominated fields score higher.

      (7) L393 Could you discuss explanations for why men outperform women in research evaluation scores more when disciplines are female dominated?

      Unfortunately, we don’t have an explanation for this and can’t get one from our data. We hope it will be an interesting question for future work.

      (8) Could the figures be improved by having the crosses, x and + scaled, for example, in thickness corresponding to sample size? Alternatively, some description of the sample size variation? Sorting the rows by order of pmen in Table E1 would also be helpful for the reader.

      As with the previous figure, we have tried many ways of presenting it (including this one). Unfortunately, nothing helped.

      We have provided Table E1 as a spreadsheet to allow readers to do this themselves.

      (9) Please state in your methods section the software used to aid repeatability.

      This is now in Supplementary Materials (Matlab 2022b).

      (10) It is great to report your model findings into real terms for PBRF and ARC. Please can you extend this to CIHR and EIGE. i.e. describing how a gender skew increase of x associates with a y increase in funding success chance.

      We have added similar explanations for both these datasets comparing the advantage of being male with the advantage of working in a male dominated discipline.

      (11) I would apply care to using pronouns "his" and "her" in L322-L324 and avoid if at all possible, instead, replacing them with "men" and "women".

      We have updated the text to avoid these pronouns in most places.

      The article in general would benefit from a disclosure statement early on conceding that gender investigated here is only as a binary variable, discounting its spectrum.

      See Line 74-76.

      Please also report how gender balance is defined in the datasets as in the data summary in supplementary materials, within the main text.

      Our definition of gender balance (proportion of researchers who are men, pmen) is given on Line 103.

      (12) The data summary Table S1 could benefit from explaining the variables in the first column. It is currently unclear how granularity, size of dataset and quotas/pre-allocation? are defined.

      These lines have been removed as the information they contained is included elsewhere in the table with far better explanations!

      (13) There are only 4 data points for investigating covariation between gender balance and funding success in CIHR. This should be discussed as a limitation.

      The small size of the dataset is now mentioned on Line 348.

      (14) L455 "Research varies widely across disciplines" in terms of what?

      This sentence has been extended.

      (15) L456 Maybe I am missing something but I don't understand the relevance of "Physicists' search for the grand unified theory" to research quality.

      Removed.

      (16) Can you provide more discussion into the results of your bibliographic analysis and Figure 2? An explanation into the relationships seen in the figure at least would be helpful.

      Thank you, we have clarified the relationships seen in each of Figures 2A (Lines 226-235), 2B (Lines 236-252), and 2C (Lines 260-268).

      (17) It would be helpful to include in the discussion a few more sentences outlining:

      - Potential future research that would help disentangle mechanisms behind the trends you find.

      - How this research could be applied. Should there be some effort to standardise?

      We have added a short paragraph to the discussion about implications/applications, and future research (Lines 481-484).

      (18) The introduction could benefit from discussing and explaining their a priori hypotheses for how research from female-biassed disciplines may be evaluated differently.

      While not discussed in the introduction, possible explanations for why and how research in female dominated fields might be evaluated differently are explored in some detail in the Discussion.  We think once is enough, and towards the end is more effective than at the beginning.

      (19) L16 "Our work builds on others' findings that women's work is valued less, regardless of who performs that work." I find this confusing because in your model, there is a significant interaction effect between gender:pmen. This suggests that for female-biassed disciplines, there is even more of a devaluation for women, which I think your lines in figure 1 suggest.

      Correct, but men are still affected, so the sentence is correct. What is confusing is that the finding is counter to what we might expect.

    1. Author response:

      eLife assessment

      This fundamental study provides a near-comprehensive anatomical description and annotation of neurons in a male Drosophila ventral nerve cord, based on large-scale circuit reconstruction from electron microscopy. This connectome resource will be of substantial interest to neuroscientists interested in sensorimotor control, neural development, and analysis of brain connectivity. However, although the evidence is extensive and compelling, the presentation of results in this very large manuscript lacks clarity and concision.

      We thank the reviewers for their detailed and thoughtful feedback and the time that they invested to provide it. Organising this manuscript (which is clearly not a standard research article) was quite challenging as it had to fulfil a number of functions: presenting a guide to the system of annotations and the associated online resources; providing an atlas for the annotated cell types; and showcasing various analyses to illustrate the value of the dataset as well as just a few of the many questions it can be used to address. We gave careful consideration to its structure and attempted to signpost the sections that would be most useful to particular types of readers. Nevertheless we can see that this was not completely successful and we thank the reviewers for their suggestions for improvement.

      We acknowledge that the resulting manuscript was very large and will endeavour to streamline our text in the revision without compromising the accessibility of the data. We do note that there is some precedent for comprehensive and lengthy connectome papers going all the way back to White et al. 1986 which took 340 pages to describe the 302 neurons of the C. elegans connectome. More recently, we can compare the “hemibrain papers” published in eLife: Scheffer et al., 2020, Li et al., 2020, Schlegel et al., 2021, Hulse et al., 2021. These papers would also be difficult to digest at a single sitting but were game-changing for the Drosophila neuroscience field and have already been cited hundreds of times, a testament to their utility. In the same way that these papers provided the first comprehensively proofread and annotated EM connectome for (a large part of) the adult fly brain, our work now provides the first fully proofread and annotated EM connectome for the nerve cord. Given the pioneering nature of this dataset we feel that the lengthy but highly structured atlas sections of the paper are justified and will prove impactful in the long term.

      Whilst no EM dataset is perfect, we have endeavoured to make this one as comprehensive as possible. We found 74.4 million postsynapses and 15,765 neurons of VNC origin, all of which have been carefully proofread, reviewed, annotated and typed. For comparison, the female adult nerve cord dataset (FANC, Azevedo et al., Nature, 2024) contains roughly 45 million synapses and 14,600 neuronal cell bodies of which at the time of writing 5576 have received preliminary proofreading and 222 high quality proofreading. We emphasise that these are highly complementary datasets, given the difference in sex and the fact that each dataset has different artefacts (MANC has poorer preservation of neurons in the leg nerves; FANC is missing part of the abdominal ganglion and has lower synapse recovery). We reconstructed 5484 sensory neurons from the thoracic nerves, 84% of the ~6500 estimated from FANC. The overall recovery rate was ~86.5% if we include the ~1100 sensory neurons from abdominal nerves, which were in excellent condition.
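      As a quick check of the recovery figures above, assuming the ~1100 abdominal sensory neurons were all recovered and therefore enter both the numerator and the denominator:

$$
\frac{5484}{6500} \approx 0.844, \qquad
\frac{5484 + 1100}{6500 + 1100} = \frac{6584}{7600} \approx 0.866
$$

      The first ratio gives the 84% thoracic recovery; the second is about 86.6%, in line with the ~86.5% quoted given that both the 6500 and 1100 figures are approximate.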

      Reviewer #1 (Public Review):

      Summary:

      The authors present a close to complete annotation of the male Drosophila ventral nerve cord, a critical part of the fly's central nervous system.

      Strengths:

      The manuscript describes an enormous amount of work that takes the first steps towards presenting and comprehending the complexity and organization of the ventral nerve cord. The analysis is thorough and complete. It also makes the effort to connect this EM-centric view of the nervous system to more classical analyses, such as the previously defined hemilineages, that also describe the organization of the fly nervous system. There are many, many insights that come from this work that will be valuable to the field for the foreseeable future.

      We thank the reviewer for acknowledging the enormous collaborative effort represented by this manuscript. We tried to synthesise decades of light-level work by neuroscientists and developmental biologists working in Drosophila and other insects in order to create a standard, systematic nomenclature for >22,000 neurons, most of which had not been typed at light level. We hope that the MANC dataset and this guide to its contents will prove to be useful resources to Drosophila neurobiologists and the wider neuroscience field.

      Weaknesses:

      With more than 60 primary figures, the paper is overwhelming and cannot be read and digested in a single sitting. The result is more like a detailed resource rather than a typical research paper.

      In writing this paper, we had two aims: first, to describe and validate our extensive biological annotation of the connectome and second, to provide interesting illustrative examples of the many analyses that could be carried out on this dataset using the atlas we generated. The resulting paper is intended primarily as a detailed reference rather than a typical research paper. At the end of the Introduction, we outline the structure of the paper and explicitly direct non-specialist readers to focus on the initial and concluding sections for orientation to the dataset so that they would not get bogged down in the details. We will review our section organisation and headings to try to make the paper more straightforward to navigate, and we will add specific figure numbers to the outline.

      Reviewer #2 (Public Review):

      Summary and strengths:

      This massive paper describes the identity and connectivity of neurons reconstructed from a volumetric EM image volume of the ventral nerve cord (VNC) of a male fruit fly. The segmentation of the EM data was described in one companion paper; the classification of the neurons entering the VNC from the brain (descending neurons or DNs) and the motor neurons leaving the VNC was described in a second companion paper. Here, the authors describe a system for annotating the remaining neurons in the VNC, which include intrinsic neurons, ascending neurons, and sensory neurons, representing the vast majority of neurons in the dataset. Another fundamental contribution of this paper is the identification of the developmental origins (hemilineage) of each intrinsic neuron in the VNC. These comprehensive hemilineage annotations can be used to understand the relationship between development and circuit structure, provide insight into neurotransmitter identity, and facilitate comparisons across insect species. Many sensory neurons are also annotated by comparison to past literature. Overall, defining and applying this annotation system provides the field with a standard nomenclature and resource for future studies of VNC anatomy, connectivity, and development. This is a monumental effort that will fundamentally transform the field of Drosophila neuroscience and provide a roadmap for similar connectomic studies in other organisms.

      We thank the reviewer for acknowledging the enormous collaborative effort represented by this manuscript. We tried to synthesise decades of light-level work by neuroscientists and developmental biologists working in Drosophila and other insects in order to create a standard, systematic nomenclature for >22,000 neurons, most of which had not been typed at light level. We hope that the MANC dataset and this guide to its contents will prove to be useful resources to Drosophila neurobiologists and the wider neuroscience field.

      Weaknesses:

      Despite the significant merit of these contributions, the manuscript is challenging to read and comprehend. In some places, it seems to be attempting to comprehensively document everything the authors found in this immense dataset. In other places, there are gaps in scholarship and analysis. As it is currently constructed, I worry that the manuscript will intimidate general readers looking for an entry point to the system, and ostracize specialized readers who are unable to use the paper as a comprehensive reference due to its confusing organization.

      In writing this paper, we had two aims: first, to describe and validate our extensive biological annotation of the connectome and second, to provide interesting illustrative examples of the many analyses that could be carried out on this dataset using the atlas we generated. The resulting paper is intended primarily as a detailed reference rather than a typical research paper. At the end of the Introduction, we outline the structure of the paper and explicitly direct non-specialist readers to focus on the initial and concluding sections for orientation to the dataset so that they would not get bogged down in the details. We will review our section organisation and headings to try to make the paper more straightforward to navigate, and we will add specific figure numbers to the outline.

      The bulk of the 559 pages of the submitted paper is taken up by a set of dashboard figures for each of ~40 hemilineages. Formatting the paper as an eLife publication will certainly help condense these supplemental figures into a more manageable format, but 68 primary figures will remain, and many of these also lack quality and clarity. Without articulating a clear function for each plot, it is hard to know what the authors missed or chose not to show. As an example, many of the axis labels indicate the hemilineage of a group of neurons, but are ordered haphazardly and so small as to be illegible; if the hemilineage name is too small, and in a bespoke order for that data, then is the reader meant to ignore the specific hemilineage labels?

      We will contact eLife professional editing staff to determine whether the paper can be streamlined by moving more material to the supplement without making it difficult to locate the detailed catalogues of neurons that will be of interest to specialist readers. Based on the typical eLife format, we suspect that retaining the dashboard main figures for each hemilineage will be necessary to maintain the paper's utility as a reference. We will, however, shorten the associated main text by, for example, moving background material used to assign the hemilineages to the Methods section and moving specific results to the figure legends where possible.

      We articulated the function for each plot as follows: "Below we describe in more depth every hemilineage that produces more than one or two secondary neurons. For each of these 35 hemilineages, we show (A) the overall morphology of the secondary population, (B) representative individual neurons (as estimated by highest average NBLAST score to other members of the hemilineage), and (C) specific notable examples (which in some cases are primary). We then report (D) the locations of their connectors (postsynapses and presynapses), (E) their upstream and downstream partners by class, and (F) their upstream and downstream partners by finer subdivisions corresponding to their systematic types (secondary hemilineage, target, or sensory modality). We also provide supplementary figures showing the morphology and normalised up- and downstream connectivity of all systematic types for each hemilineage."

      We have plotted every secondary neuron in each hemilineage, every predicted synapse for those neurons with confidence >0.5, every connection to partner neurons by class (no threshold applied), and then the same information organised by hemilineage in a heatmap (and including partners from all birthtimes and partners of unknown hemilineage). Then the supplementary figures show all connectivity, organised in the same way, for every individual cell type assigned to the hemilineage, including both primary and early secondary neurons. We will add more detail to the figure legends to clarify these points.

      We apologise that you were unable to read some of the axis labels in the review copy of the manuscript; we did submit high resolution versions of the figures as a supplemental file, but perhaps this did not reach you; they can also be found at https://www.biorxiv.org/content/10.1101/2023.06.05.543407v2.supplementary-material. The hemilineages are in a conserved (alphanumerical) order for all hemilineage-specific plots and many others. The exceptions arise when neurons are clustered based on their connectivity to hemilineages, in which case the order of the labels necessarily follows the structure of the resulting clusters.

      The text has similar problems of emphasis. It is often meandering and repetitive. Overlapping information is found in multiple places, which causes the paper to be much longer than it needs to be. For example, the concept of hemilineages is introduced three times before the subtitle "Introduction to hemilineage-based organisation". When cell typing is introduced, it is unclear how this relates to serial motif, hemilineage, etc; "Secondary hemilineages" follow the Cell typing title. Like the overwhelming number of graphical elements, this gives the impression that little attention has been paid to curating and editing the text. It is unclear whether the authors intend for the paper to be read linearly or used as a reference. In addition, descriptions of the naming system are often followed by extensive caveats and exceptions, giving the impression that the system is not airtight and possibly fluid. At many points, the text vacillates between careful consideration of the dataset's limitations and overly grandiose claims. These presentation flaws overshadow the paper's fundamental contribution of describing a reasonable and useful cell-typing system and placing intrinsic neurons within this framework.

      Because we intended this paper to be read primarily as a reference, we tried to make each section stand on its own, which we agree resulted in some redundancy (with more details appearing where relevant). However, we will do our best to tighten the text for the version of record.

      Our description immediately under the Cell typing title includes the use of hemilineage, serial (not serial motif, which was not used), and laterality (left-right homologues) in the procedure to assign cell types. We will change this to “Cell typing of intrinsic, ascending, and efferent neurons” for clarity. The “Secondary hemilineages” title marks the start of a new section that serves as a reference for each of the secondary hemilineages; we will change this to “Secondary hemilineage catalogue” or similar for clarity.

      References to past Drosophila literature are inconsistent and references to work from other insects are generally not included; for example, the extensive past work on leg sensory neurons in locusts, cockroaches, and stick insects. Such omissions are understandable in a situation where brevity is paramount. However, this paper adopts a comprehensive and authoritative tone that gives the reader an impression of completeness that does not hold up under careful scrutiny.

      We did not attempt to review the sensory neuron literature in this manuscript but rather cited those specific papers that included the axon morphology data informing our modality, peripheral origin, and cell type assignments. Most of these came from the Drosophila literature, owing to the availability of genetic tools for sparse labelling of specific populations and the much greater likelihood of conserved morphology. However, we certainly agree that decades of sensory neuron work in larger insects were foundational for this subfield and will add a sentence to this effect in the introduction to our sensory neuron typing.

      The paper accompanies the release of the MANC dataset (EM images, segmentation, annotations) through a web browser-based tool: clio.janelia.org. The paper would be improved by distilling it down to its core elements, and then encouraging readers to explore the dataset through this interactive interface. Streamlining the paper by removing extraneous and incomplete analyses would provide the reader with a conceptual or practical framework on which to base their own queries of the connectome.

      We certainly hope that this paper will encourage readers to explore the MANC dataset. Indeed, as we state in the Discussion, "Moreover, its ultimate utility depends on how widely it is leveraged in the future experimental and computational work of the entire neuroscience community. We have only revealed the tip of the iceberg in this report, with a wealth of opportunities now available in this publicly available dataset for forthcoming connectomic analyses that will feed into testable functional hypotheses." In the first few sections of the Results, we include a visual introduction to annotated features, a glossary of annotation terms, a visual guide to our cell typing nomenclature, and two video tutorials on the use of Clio Neuroglancer to query the dataset. To further encourage exploration, we have also included illustrative examples of just a few of the many analyses that can now be performed with this comprehensive and publicly available dataset.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Q1: First of all, the term organoid must be discarded. The authors just seed the endometrial cell mixture which assembles and aggregates into a 3D structure which is then immediately used for analysis. Organoids grow from tissue stem cells and must be passage-able (see their own description in lines 69-71). So, the term organoid must be removed everywhere, to not confuse the organoid field. It is not shown that the whole 3D assembly is passageable, which would be very surprising given the fact that immune and stromal cells do not grow in Matrigel because of the unfavorable growing conditions (which are targeted to epithelial cell growth).

      We appreciate your highlighting these concerns regarding our organoid construction.

      (1) The organoids in our system originated from tissue stem cells.

      We induced adult stem cells derived from endometrial tissue to form organoids in vitro using various factors (such as Noggin, EGF, FGF2, WNT-3A and R-Spondin1), which involves a complex self-assembly process rather than mere cellular aggregation. Two days after plating, the system contained single cells and small cell clusters. By the fourth day, the glandular epithelial cells had gradually assembled into glands, while the stromal cells spontaneously organized themselves around the glands. By the eleventh day, the endometrial glands had enlarged, epithelial cells were organized in a paving-stone arrangement, and stromal cells had established an extensive network (Author response image 1; Figure 1C).

      (2) The organoids we constructed are passage-able.  

      Most organoids were used for experiments up to the fifth generation, while some were extended to the 10th generation and cryopreserved (Author response image 1B, C).

      (3) Immune and stromal cells are present in our system from the primary to the fourth generation. In our study, immune and stromal cells were identified not only from scRNA-seq data (third-generation organoids) (Figure 2A), but also morphologically, using 3D transparent (cleared) staining and light-sheet microscopy imaging (third-generation organoids), with vimentin marking stromal cells, CD45 marking immune cells, and FOXA2 marking glands. Furthermore, flow cytometric analysis confirmed the presence of immune cells within the organoids (third-generation organoids) (Author response image 1D, E, F).

      Moreover, immune cells and stromal cells can grow in Matrigel, as was also found by the organoid pioneer Hans Clevers (Hans Clevers et al., Nature Reviews Immunology 2019).

      Author response image 1.

      (A) The growth of endometrial cells observed under an inverted microscope from day 2 to day 11 after plating. Scale bar = 200 μm. (B) Endometrial organoids of different passages, from P1 to P5. Scale bar = 200 μm. (C) Stromal cells formed an extensive network (bottom). The arrowhead indicates dendritic stromal cells. Scale bar = 100 μm (left), Scale bar = 50 μm (right). (D) Stromal cells marked by vimentin. Nuclei were counterstained with DAPI. The arrow indicates stromal cells. Scale bar = 40 μm (top), Scale bar = 30 μm (bottom). (E) Immune cells marked by CD45 and endometrial glands marked by FOXA2. Nuclei were counterstained with DAPI. The arrow indicates immune cells. Scale bar = 50 μm. (F) Flow cytometric analysis of T cells and macrophages in the endometrial organoid. Gating strategy used for determining white blood cells (CD45+ cells), T cells (CD45+CD3+ cells) and macrophages (CD45+CD68+CD11b+ cells).

      Q2: Second, the study remains fully descriptive, bombing the reader with a mass of bioinformatic analyses without clear descriptions and take-home messages. The paper is very dense, meaning readers may give up. Moreover, functional validation, except for morphological and immunostaining analyses (which are posed as "functional" but actually are only again expression) is missing, such as in vivo functionality (after transplantation e.g.) and embryo interaction. Importantly, the 3D structure misses the right architecture with a lining luminal epithelium which is present in the receptive endometrium in vivo and needed as the first contact site with the embryo. So, in contrast to what the authors claim, this is not the best model to study embryo interaction, or the closest model to the in vivo state (line 318, line 326).

      Thank you.

      (1) We have made the following improvements. Firstly, we have conducted additional experiments to validate the bioinformatics analysis. Secondly, the structure of the manuscript has been refined to ensure logical coherence and clear transitions between paragraphs. Thirdly, important findings have been emphasized to ensure readers’ comprehension and inspiration. Furthermore, the manuscript was revised by both domestic and international experts to enhance the readability and clarity.

      (2) Regarding functional validation, in vivo transplantation has not been possible so far owing to ethical limitations. However, human embryos develop and grow more efficiently when combined with the receptive endometrial organoids we generated (unpublished data).

      (3) As you suggested, we have replaced “closest” with “closer”. We acknowledge that the model cannot fully simulate the in vivo implantation process, in which the luminal epithelium of the endometrium is the first to contact the embryo.

      Q3: Third, receptive endometrial organoids (assembloids; Rawlings et al., eLife 2021) and receptive organoid-derived "open-faced endometrial layer" (Kagawa et al., Nature 2022) have already been described, which is in contrast to what the authors claim in several places that "they are the first" (e.g. lines 87-88, 316-319, etc). These studies used real organoids to achieve their model (and even showed embryo interaction), while in the present study, different cell types are just seeded and assembled. Hence, logically, immune cells are present which are never found in real organoid models. The only original aspect in the present study is the use of hormones to enhance the WOI phenotype. However, crucial information on this original aspect is missing such as concentration of the hormones, refreshment schedule, all 3 hormones added together or separately, and all 3 required?

      Thank you for pointing out these researches referring to endometrial organoids.

      (1) Although we did not explicitly state "the first", we agree that we should avoid expressions implying this. We have changed them to more modest wording, as follows: “we are far from understanding how embryo implantation occurs during the WOI due to ethical limitations and fewer in vitro receptive endometrial model” and “which confirms that they are closer to the in vivo state”.

      (2) The definition of organoids and the presence of immune cells are addressed in detail in our response to Q1.

      (3) Regarding the hormone regimen, hormone concentrations are detailed in Supplementary Table S2. Estrogen was added to the basal medium for the first two days, after which a combination of MPA, cAMP, PRL, hPL, and HCG was administered for the subsequent six days. The medium was refreshed every two days.

      All three hormones (PRL, hPL and HCG) were required, as validated by multiple group comparisons: only the organoids treated with all six hormones together exhibited an endometrial receptivity-related gene expression profile (Author response image 2).

      Author response image 2.

      Heatmap showing receptivity related gene expression profile of organoids in each hormone regimen.  

      Q4: Moreover, it is not a "robust" model at all as the authors claim, given the variability of the initial cell mixture (varying from patient to patient). Actually, the reproducibility is not shown. The proportions of the different cell types seeded in the Matrigel droplet will be different with every endometrial biopsy. It would be much better to recombine epithelial (passageable) organoids with stromal and immune cells in a quantified, standardized manner to establish a "robust" model.

      Thanks for your suggestion.  

      Firstly, the endometrial organoids we constructed generally consist of epithelial, stromal, and immune cells, although the cell proportions may vary slightly among patients. Secondly, the term "robust" was intended to convey strong support for embryo development, which will be demonstrated in our next study (unpublished data); we have therefore replaced "robust" with "alternative". Thirdly, regarding reproducibility, hormone-treated organoids from different women exhibited similarity to the in vivo receptive endometrium in multi-omics analysis, ERT, and various other experiments.

      Reviewer #2 (Public Review):

      Q1: With endometrial receptivity analysis, they suggest a successful formation of the implantation window in vitro, but this result is difficult to interpret.

      Thanks for your question.  

      We agree that the most effective way to demonstrate endometrial receptivity is embryo implantation; these experiments were conducted in parallel and will be presented in our next study. In the present study, we validated receptivity using approaches established in existing research.

      (1) At the single-cell transcriptome level, the cellular composition and function of the receptive endometrial organoids were similar to those of the in vivo implantation window (Stephen R. Quake et al., 2020).

      (2) At the whole-organoid level, the receptive endometrial organoids exhibited transcriptomic and proteomic characteristics similar to those of the in vivo mid-secretory endometrium (Andres Salumets 2017, Qi Yu 2018, Triin Laisk 2018, Edson Guimarães Lo Turco 2018, Xiaoyan Chen 2020, Francisco Domínguez 2020, David W. Greening 2021, Norihiro Sugino 2023). The receptive endometrial organoids were also validated using the endometrial receptivity test (ERT), which uses high-throughput sequencing and machine learning to assess endometrial receptivity (Yanping Li et al., 2021).

      (3) At the microstructural level, under electron microscopy, the receptive endometrial organoids exhibited characteristics of the implantation window, such as pinopodes, glycogen particles, microvilli, and cilia.

      Overall, based on existing research, the receptive organoids we constructed closely resemble the in vivo implantation window at the single-cell, whole-organoid, and microstructural levels.

      Q2: Analyzing transcriptome and proteome information of WOI organoids, authors demonstrate a strong response to estrogen and progesterone, but some comparisons are made with CTRL and SEC, and others only with CTRL, which limits the power of some results. In the same way, some genes related to Cilia and pinopodes appear dominant in WOI organoids, but the comparison by electron microscopy is made only against CTRL organoids.  

      In subsequent analysis, WOI organoids showed a marked differentiation from proliferative to secretory epithelium, and from proliferative epithelium to EMT-derived stromal cells than SEC organoids. These statements are based on their upregulation of monocarboxylic acid and lipid metabolism, their enhanced peptide metabolism and mitochondrial energy metabolism, or their pseudotime trajectories. However, other analyses (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells) show only small differences between SEC and WOI organoids.

      Thank you for raising these important questions.

      (1) At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), and both are similar at the overall organoid level.  

      (2) At the single-cell level, the accumulation of secretory epithelium, decreased proliferative epithelium, increased ciliated epithelium after hormonal treatment, and the presence of EMT-derived stromal cells are fundamental features of the secretory endometrium. These features are therefore present in both WOI and SEC organoids. The most notable differences lie instead in the more comprehensive differentiation and more varied cellular functions of WOI organoids compared with SEC organoids.

      (3) Regarding electron microscopy, we have now quantitatively compared the presence of characteristic structures such as microvilli, cilia, pinopodes and glycogen in the CTRL, SEC and WOI groups. WOI organoids possessed longer microvilli and more cilia, glycogen, and pinopodes than SEC organoids (Fig. 2H).

      Reviewer #1 (Recommendations For The Authors):

      Q1: Several of the key methods are performed by companies, hence not in detail described and therefore not verifiable which is essential for reviewers and readers.

      We are grateful for the suggestion. The specific methods have now been added to the "Supporting Information" section (Lines 91–102, 107–123 and 132–139).

      Q2 - Line 49: It is not shown in the present study whether the WOI organoids are a 'robust' platform.

      - Line 76: There is a study (Dolat L., Valdivia RH., Journal of Cell Science, 2021) that developed a co-culture with endometrial organoids and immune cells (neutrophils) which should be mentioned.

      We have reconsidered the wording and replaced 'robust' with 'alternative' (Line 54). Following the reviewer's suggestion, we have also added this citation on the co-culture of immune cells with endometrial organoids (Lines 82–83); we had not cited it previously mainly because it used a mouse model.

      Q3: Figure 1: Endometrial organoids possess endometrial morphology and function. - The authors should further explain their decision to add PRL, hCG, and hPL to the organoid culture. Why these particular compounds? What is their specific role during the WOI?

      Regarding the hormone regimen, estrogen and progesterone promote the transition of endometrial organoids into the secretory phase, and pregnancy hormones can further promote their differentiation. PRL promotes immune regulation and angiogenesis during implantation, HCG improves endometrial thickness and receptivity, and HPL promotes the development and function of endometrial glands. Our WOI organoids are thus in a state conducive to embryo implantation, as we aim to develop an in vitro model for studying embryo implantation. This rationale was initially explained in the Discussion (Lines 298–313); to make the choice of hormonal regimen clearer to reviewers and readers, we have now also stated it in the Results (Lines 124–130).

      When selecting hormone formulations, we made multiple group comparisons. The number, area, and average intensity of organoids were similar across these groups over time. However, compared with the other hormone formulations, the WOI organoids showed an endometrial receptivity-related gene expression profile, with high expression of genes positively correlated with endometrial receptivity and low expression of genes negatively correlated with it (added to Figure S1E, S1F). Hormone doses were based primarily on peri-pregnancy maternal systemic or local endometrial levels (Margherita Y. Turco et al., Nature Cell Biology 2017).

      -  Line 108: "the endometrial cells" instead of "endometrial organoid"? Because the authors also refer to the stromal cells.

      We believe you are referring to this sentence: “The endometrial organoid, consisting of vesicle-like glands, fibrous stromal cells, and other surrounding cells, developed into a 3D structure with the support of Matrigel”. An organoid is a self-assembled 3D structure that consists of multiple cell types and closely resembles in vivo tissue or organs; it is highly expandable and recapitulates their phenotypic and functional properties. Here, we aim to describe the endometrial organoid as comprising epithelial cells, stromal cells, and other cellular components that assemble to form intricate 3D structures. Hence, the term "endometrial organoid" is more appropriate.

      -  Line 110: "the endometrial glands", do the authors mean the endometrial organoids? The authors also mention they enlarge, which must be quantified.

      We believe you are referring to this sentence: “As the organoids grew and differentiated, the endometrial glands enlarged, epithelial cells adopted a paving stone arrangement, and stromal cells formed an extensive network”. Here, we mean that the endometrial glands grow progressively within the organoids. Following your suggestion, we quantified the change in organoid area over time and found that it increased progressively in all three groups (Author response image 3; Fig. S1E; Lines 130–131).

      Author response image 3.

      The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.

      -  Line 112: E-cadherin is a general epithelial marker, not a glandular marker.

      We agree with your suggestion and have changed this to ‘The epithelium marker E-cadherin’ (Line 110).

      -  Line 116: Which group was used for KI67 and CC3 staining?

      The CTRL organoids were used for Ki67 and CC3 staining. We have modified this expression in the Figure 1E Legend.

      -  Line 123: Organoid size (diameter or area) needs to be quantified to claim that WOI organoids grow slower than SEC/CTRL organoids. The same goes for Ki67+ cells for proliferation. In the legend of Fig 1B, the authors in contrast state that the organoids show a similar growth pattern.

      We are extremely grateful to you for pointing out this problem. We quantitatively analyzed the size of organoids in the three groups. The area increased over time in all three groups, with growth most vigorous in the CTRL group, followed by the SEC and WOI groups, although the differences were not statistically significant. These results have been added to Figure S1E (Lines 130–131). There were also no significant differences in Ki67 expression among these organoids. The three groups of organoids therefore showed a similar growth pattern, and we have deleted the statement “Following hormonal stimulation, WOI organoids exhibited slower growth than SEC and CTRL organoids, while CTRL organoids maintained robust proliferative activity (Fig. 1B)”.

      Author response image 4.

      The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.

      -  Line 126: Fourteen days of organoid treatment is a very long time. Growing organoids may already be dying which should be checked by CC3 staining to prove that organoids are still fully viable.

      Endometrial organoids proliferate vigorously and survive for long periods owing to the presence of adult stem cells. To address this concern, we performed CC3 staining on organoids treated for 14 days and observed negligible expression (shown below).

      Author response image 5.

      Figure note: The Ki67 and CC3 immunostaining on the organoids after 14-day hormone treatment.

      -  Line 128: Changes in hormone receptors should be supported by RT-qPCR data to be more convincing

      We agree. We have added RT-PCR results for hormone receptor-related genes (Figure S1D; Lines 119-121). PAEP and PGR are associated with progesterone, and OLFM4 and EGR1 with estrogen.

      -  1A: Are authors able to see and characterize decidualized stromal cells as indicated in the illustration?

      Following the reviewer's question, we carefully examined the morphology of stromal cells in hormone-treated organoids. Unfortunately, decidualized stromal cell morphology could not be identified by light microscopy in our endometrial organoids.

      -  1C: Which treatment condition are the organoids in these images?

      This figure shows the bright-field morphology of CTRL organoids, which is now noted in the Figure 1C legend.

      -  1D: PAS staining should be quantified to support the claims.

      We agree. PAS staining has been quantified and compared across the three groups of organoids (Figure S1G; Lines 142-143).

      -  1D: Where are the stromal cells in the model? There should be vimentin-positive cells outside of the glands.

      Figure 1D shows section staining, which is limited in its ability to display stromal cells around the glands. Given the 3D structure of organoids, we performed whole-mount clearing and staining and observed stromal cells (marked by vimentin) under a light-sheet microscope (shown below). Stromal cells were also visualized with this method in the original Figure 2B.

      Author response image 6.

      Stromal cells marked by vimentin in a CTRL organoid, visualized by whole-mount clearing, immunostaining and light-sheet microscopy imaging. Nuclei were counterstained with DAPI. The arrowhead indicates stromal cells. Scale bar = 70 μm.

      Figure 2: Developing receptive endometrial organoids in vitro mimicking the implantation window endometrium.

      -  Line 142: CD44 is not an exclusive marker for immune cells. It has been shown to be expressed in glandular secretory epithelial cells (Fonseca et al., 2023). The authors also mention that CD44 is expressed in stromal cells (line 265). Staining for CD45 (or another immune-specific marker) is needed to demonstrate the presence of immune cells. 

      We appreciate this suggestion. Using CD45 as a marker, we visualized the distribution of immune cells in organoids by whole-mount clearing combined with light-sheet microscopy imaging (Figure 2C).

      -  Line 144: What are the proportions of the immune cells? What is the variation between patient samples?

      We assessed the proportion of immune cells by flow cytometry and analyzed the proportions of macrophages and T cells in organoids derived from 8 patients. WBCs accounted for about 3%~4% of cells in organoids (Figure 2D), of which macrophages accounted for less than 1% and T cells less than 2% (Figure S2E). A few patients showed large heterogeneity, but the proportion of immune cells was relatively stable in most patients.

      -  Line 161: What is the endometrial receptivity test (ERT)? Not explained at all.

      The Endometrial Receptivity Test (ERT) is a gene expression-based method for assessing endometrial receptivity. It combines high-throughput sequencing and machine learning to analyze the expression of receptivity-related genes, allowing a relatively accurate assessment of endometrial receptivity. It is currently used in clinical practice to determine endometrial receptivity and guide personalized embryo transfer (Yanping Li et al., J Transl Med 2021) (Lines 179-183).

      -  2A: The authors' dataset is compared to a published dataset. How were they combined? Were they merged, mapped on each other, or integrated? Were all cells employed from the published dataset or specific cell types? Much detail to evaluate the analysis is missing.

      We are very grateful for your comments.  

      (1) The four raw datasets (CTRL, SEC and WOI organoids, and mid-secretory endometrium) underwent batch correction and integration using Harmony. The integrated dataset then underwent dimensionality reduction via PCA. The soft k-means clustering algorithm was employed to address batch effects and clustering, with a clustering resolution parameter of 0.5. Finally, the clustering results were visualized with tSNE according to the cell subpopulation classification (“Methods”, Lines 164-175).

      (2) Figure 2A compares specific cell types: glandular and luminal epithelium, secretory epithelium, LGR5 epithelium, EMT-derived stromal cells, ciliated epithelium, and glandular secretory epithelium (shown in Figures S2C-S2D; Lines 150-154).
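      For readers less familiar with this workflow, the PCA and soft k-means steps described in (1) can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the pipeline used in the study: it omits Harmony's per-batch correction, the resolution parameter and the tSNE visualization, and the function names `pca` and `soft_kmeans`, the stiffness parameter `beta` and the farthest-point initialization are hypothetical choices made for the example.

```python
import numpy as np

def pca(x, n_components):
    """Project cells (rows) onto the top principal components via SVD."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def soft_kmeans(z, k, beta=1.0, n_iter=50):
    """Soft k-means: each cell receives a probability of belonging to each cluster.

    Hypothetical illustration only; Harmony additionally alternates this step
    with a per-batch linear correction, which is not shown here.
    """
    # Farthest-point initialization (an assumption made for determinism).
    centroids = [z[0]]
    for _ in range(1, k):
        dist2 = np.min([((z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(z[np.argmax(dist2)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every centroid: shape (n, k).
        d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Softmax over clusters; subtracting the row minimum keeps exp() stable.
        resp = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
        resp /= resp.sum(axis=1, keepdims=True)
        # Each centroid becomes the responsibility-weighted mean of the cells.
        centroids = (resp.T @ z) / resp.sum(axis=0)[:, None]
    return resp, centroids
```

      A hard cluster label for each cell can be read off as `resp.argmax(axis=1)`; the soft responsibilities are what allow Harmony-style methods to weight each cell's contribution to several clusters during correction.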

      - 2E: Please add the cell type names above the heatmaps to improve readability.

      As suggested, we have added the cell type names above the heatmaps.

      - 2G: The difference between the left and right graphs is not clear from the figure itself. Improve by adding a title and more explanation.

      Thanks for your careful review. We have added titles to the left and right graphs.

      Supplementary Figure 3 is referenced with Figure 2. Supplementary Figure 2 is referenced with Figure 3. The order needs to be changed.

      Thanks for your careful review. We have changed the order.

      - S3B: Typical markers for annotation of the different cell clusters are not included and therefore it is not convincing enough that annotations are correct. E.g. Epithelial markers (EPCAM, CDH1), Stromal cells (VIM, PDGFRA), SOX9+LGR5+ cells (SOX9, LGR5). How were the EMT-derived stromal cells designated? It is not clear from the data whether they are in fact EMT-derived or whether they show epithelial markers as well (stated in line 246).

      We deeply appreciate your suggestion and provide more details on the cell clustering below. Cell-type annotation in the single-cell transcriptomic analysis was based on CellMarker, PanglaoDB, Human Cell Atlas, Human Cell Landscape, and scRNASeqDB, as well as previous endometrium-related studies (W. Wang et al., Nat Med 2020, P. D. Harriet C. Fitzgerald et al., PNAS 2019, K. M. Thomas, M Rawlings et al., eLife 2021, L. Garcia-Alonso et al., Nat Genet 2021).

      (1) SOX9+LGR5+ cells: SOX9 and LGR5 are both proliferative markers. SOX9 is expressed sparsely across all clusters. LGR5 is mainly expressed in two clusters: one is stem-derived epithelium, and the other expresses LGR5 in a scattered manner. Referring to the markers of SOX9+LGR5+ cells, SOX9+LGR5- cells, and SOX9+ proliferative cells reported in Nature Genetics in 2021 (L. Garcia-Alonso et al., Nat Genet 2021), cells in this cluster expressed high levels of NUAK2, CNKSR3, FOS and LIF, consistent with the expression profiles of SOX9+LGR5+ cells and SOX9+ proliferative cells. However, because relatively few cells expressed LGR5, this cluster was renamed SOX9+ proliferative epithelium.

      Figure 3: Receptive endometrial organoids recapitulate WOI-associated biological characteristics.

      - Line 173-174: The WOI organoids should be compared in detail to the SEC organoids in addition to the CTRL organoids, to show that this WOI model and new hormonal treatment is providing better results compared to the SEC organoids and the results obtained in previous studies.

      Thanks for your suggestion. At the organoid level, the transcriptomic and proteomic differences between SEC and WOI organoids are not significant. This is expected because WOI organoids are induced further toward the implantation window from the secretory phase (i.e., SEC organoids), which prompted us to explore the differences at the single-cell level.

      - Line 190: Quantification of pinopodes is required to claim that they are more densely arranged in WOI organoids. 

      - Line 190-191: Again, is there a difference in pinopode presence between the WOI and SEC organoids to show that the WOI organoids are really distinct and a better model?

      We agree and have quantified the pinopodes. The number of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes under electron microscopy (Figure 2H; Lines 184-186).

      - Line 194: Also here, quantification of the glycogen particles is missing.

      We agree. We have quantified the area of glycogen particles under electron microscopy in CTRL, SEC and WOI organoids, and WOI organoids had the most glycogen particles (Figure 2H; Lines 184-186).

      - 3C: There is no difference between SEC and WOI organoids condition for OLFM4 and PRA/B. What is the purpose then of adding extra hormones if no difference is present?

      Figure 3C indicates that OLFM4 and PRA/B levels (reflecting estrogen and progesterone responsiveness) did not differ significantly between SEC and WOI organoids at the organoid level. This is expected because WOI organoids are induced further into the implantation window from the secretory phase (i.e., SEC organoids), so the two are similar at the whole-organoid level. We therefore further explored the differences between WOI and SEC organoids at the single-cell level.

      - 3G: A higher magnification is necessary to evaluate cilia staining. From these images, it seems like CTRL organoids also express acetyl-α-tubulin.

      Thanks for your suggestion. The figure has been enlarged (shown below). Acetyl-α-tubulin staining in WOI organoids differs from that in CTRL organoids in both morphology and expression level: the glands of WOI organoids have small green tips (expressing acetyl-α-tubulin) projecting toward the lumen, and WOI organoids expressed higher levels of acetyl-α-tubulin than CTRL organoids (now Figure 3G in the revised manuscript).

      Figure 4: Structural cells construct WOI with functionally dynamic changes

      - Line 211: To which figure are these claims referring to?

      We believe the reviewer is referring to the sentence “In terms of energy metabolism, the WOI organoids exhibited upregulation of monocarboxylic acid and lipid metabolism, and hypoxia response”. Upregulation of monocarboxylic acid and lipid metabolism in WOI organoids is shown in Figure 3B, and upregulation of the hypoxia response in Figure S3F.

      - In general, it should be stated in the text that CellPhoneDB is a useful tool to investigate ligand-receptor interactions, however, it only proposes potential interactions. To validate such interactions, stainings and functional assays are required.

      Thanks for your suggestion. CellPhoneDB was originally introduced briefly in the Methods section of the Supporting Information; it is now also explained in Lines 256-257 of the main text.

      We agree that staining and functional assays are required to validate the ligand-receptor interactions. We therefore used the proximity ligation assay (PLA) to verify the predicted interaction trends (Figure S2J; Lines 259-261, 277-279, and 285-288).

      - Line 243: Please describe the process of EMT in the endometrium more specifically.

      EMT is a common and crucial biological event in the endometrium during the implantation window. During EMT, epithelial cells lose their epithelial characteristics while acquiring the migratory and invasive properties of fibroblasts.

      During the attachment and adhesion phases of embryo implantation, interactions between trophoblastic factors (e.g., integrins) and maternal ECM factors (e.g., fibronectin) eventually induce EMT in the trophectoderm. During the peri-implantation period, EMT-regulating microRNAs (e.g., miR-429 and miR-126a-3p) are expressed to different degrees in the maternal luminal epithelium, mediating its transformation as the blastocyst invades the maternal decidua. Through EMT, the endometrial epithelium transforms into epithelioid stromal cells with increased migratory and invasive capacities. Having acquired increased motility, decidual stromal cells migrate away from the implantation site (Lines 265-267).

      - Lines 247-251 and 313-316: the claim that proliferative epithelium transforms into EMT derived stromal cells by pseudotime trajectory is too bold and must be underpinned by other means. Pseudotime analysis only suggests and is by definition biased since the first/originating population must be defined by the operator.

      In addition to pseudotime analysis based on Monocle, we used RNA velocity analysis based on scVelo to analyze cell evolution. If both analyses indicate a transformation from proliferative epithelium to EMT-derived stromal cells, they corroborate each other. RNA velocity analysis infers the direction of differentiation automatically, which provides independent evidence for choosing the starting point of the pseudotime analysis.

      RNA velocity analysis showed that EMT-derived stromal cells were most closely connected to the proliferative epithelium. In addition, the pseudotime point plot inferred that the proliferative epithelium was the root population. Together with the pseudotime analysis, these results support a transformation from proliferative epithelium to EMT-derived stromal cells.

      Author response image 7.

      RNA velocity junction diagram (to infer intercellular connectivity)

      Author response image 8.

      Time differentiation of cells

      Discussion

      - Line 300-302: It would be interesting to investigate ATP production and IL8 release in the WOI organoids to validate with findings from in vivo.

      To address this point, we measured ATP production and IL8 release. WOI organoids indeed produced much more ATP and IL8 than CTRL and SEC organoids (Figure S3L; Lines 323-324).

      - Line 313-316: Do the WOI organoids lose polarity and cell-to-cell junctions?

      Transcriptome sequencing revealed downregulation of cell adhesion and RHO GTPase signaling in WOI organoids (Figure 3B). Electron microscopy showed that cells in WOI organoids were arranged slightly more loosely than in CTRL organoids, but the microvilli were still oriented toward the inner (luminal) side of the glands, with no polarity reversal (shown below).

      Author response image 9.

      Electron micrograph of the CTRL (left), and WOI (right) endometrial organoid. Scale bar = 5 μm.  

      - Line 322: Where is the data that shows that 'a decreased abundance of immune cells', is observed?  

      A decreased abundance of immune cells was observed by both single-cell transcriptome sequencing and flow cytometry. In the single-cell sequencing results, WOI organoids contained fewer immune cells than CTRL organoids (Figure 4A). Flow cytometry likewise showed that the percentage of WBCs in WOI organoids was lower than that in CTRL organoids (Figure S2F).

      - Line 324: Elaborate more on how the immune cell composition differs from the endometrium.

      The differences in immune cell composition between organoids and endometrium lie mainly in three aspects: the proportion of WBCs, the proportions of immune cell subtypes, and the changes in T cells after entry into the implantation window.

      First, the proportion of WBCs in organoids was lower than that in endometrium. Flow cytometry showed that WBCs accounted for about 3%~4% of cells in organoids (Figure 2D), compared with about 8% in endometrium (W. Wang et al., Nat Med 2020). Second, the proportions of T cells and macrophages in organoids were about 2%~3% and 1%, respectively (Figure 2D), whereas the proportions of lymphocytes and macrophages in endometrium were 7%~8% and 0.6%~0.7% (W. Wang et al., Nat Med 2020). Third, after entry into the implantation window, T cells decreased in WOI organoids (Figure S2F) but increased in endometrium (W. Wang et al., Nat Med 2020). These three aspects differ between the in vivo and in vitro settings (Lines 347-353).

      Material and Methods

      -  What are the concentrations of all medium components?

      Thank you for the suggestion. The concentrations of all medium components have now been added to Table S1.

      -  Authors mention 10x while Smartseq2 is mentioned in Dataset S7?

      Thanks for your careful review. Single-cell transcriptome sequencing in this study was performed using 10X Genomics. Smart-seq2 was used to sequence the transcriptome of a single gland and its surrounding cells, which can be regarded as small-bulk RNA sequencing. Smart-seq2 constructs full-length mRNA libraries from a small number of cells with enhanced transcript coverage, making it particularly well suited to small samples such as organoids.

      The data in Dataset S7 were acquired by small-bulk RNA-seq with Smart-seq2.

      Reviewer #2 (Recommendations For The Authors):

      Q1: The theoretical choice of extra reagents added to the WOI organoids culture (PRL, hCG, and hPL) is theoretically justified, but not experimentally. On what previous studies, or performed experiments, are the choice of conditions used based?

      When selecting hormone formulations, we compared multiple groups. The number, area, and average intensity of organoids were similar across these groups over time. However, compared with the other formulations, WOI organoids showed an endometrial receptivity-related gene expression profile, with high expression of genes positively correlated with endometrial receptivity and low expression of genes negatively correlated with receptivity (added to Figures S1E and S1F). Hormone dosages were based primarily on peri-pregnancy levels in the maternal circulation or local endometrium (Margherita Y. Turco et al., Nature Cell Biology 2017).

      Q2: Text in line 111 indicates that "stromal cells formed an extensive network", but vimentin fluorescence is not present on any image surrounding organoids in that figure. This assertion could only be supported by the subsequent results in Figure 2B. In addition, it is not indicated what kind of organoids have been used for these experiments

      Stromal cells are arranged around the glands in the 3D structure, as shown in Figure 1C (high-magnification bright-field photography) and Figure 2B (organoid clearing, staining, and light-sheet microscopy imaging). However, immunostaining of sections involves many fixation, embedding, staining and elution steps, which makes it difficult to preserve the arrangement and morphology of stromal cells in the slices; stromal cells were therefore not specifically captured in the other images.

      Figure 1C and Figure 2B are both CTRL organoids, which are now noted in the corresponding figure legend section.  

      Q3: It is not clear how glycogen secretion into the lumen is assessed in Figure 1D.

      Glycogen moves from the subnuclear region of the glandular cells to the top of the cells (the supranuclear region) and is discharged into the glandular lumen by apocrine secretion. Glycogen-containing eosinophilic secretions can be seen in the glandular lumen in Figure 1D.

      Q4: Assertions about differences in proliferation between groups are purely subjective; some kind of measurement and analysis would be necessary to be sure that there is differential proliferation based on Figure 1B.

      We are extremely grateful to you for pointing out this problem. We quantified the size of organoids in the three groups. Organoid area increased over time, with growth most vigorous in the CTRL group, followed by the SEC and WOI groups, but the differences were not statistically significant. These results have been added to Figure S1E (Lines 130-131).

      Q5: For progesterone receptor expression analysis organoids are cultured for fourteen days. What is the basis for this change in culture time? 

      This time point was chosen to match the 14-day secretory phase of the menstrual cycle, during which the endometrium is maximally stimulated by estrogen and progesterone.

      Q6: "n" number of individuals analysed through single-cell transcriptomics is not indicated.

      Endometrium from one patient (n = 1) was used to construct CTRL, SEC and WOI organoids simultaneously, which were then subjected to single-cell transcriptome sequencing. This is described in the Supporting Information (Lines 141-142).

      Q7: Where does the classification of EMT-derived stromal cells come from?

      EMT is a common and crucial biological event in the endometrium during the implantation window. During EMT, epithelial cells lose their epithelial characteristics while acquiring the migratory and invasive properties of fibroblasts.

      This cluster of cells expresses both the epithelial markers CDH1 and EPCAM, and specifically expresses high levels of the EMT-related stromal cell markers AURKB, HJURP and UBE2C. During endometrial EMT, AURKB upregulates MMP2, VEGFA/Akt/mTOR and Wnt/β-catenin/Myc pathways to induce EMT (Zhen Wang et al., Cancer Manag Res 2020). HJURP also activates Wnt/β-catenin signaling to promote EMT (Y Wei et al., Eur Rev Med Pharmacol Sci 2019, Tianchi Chen et al., Int J Biol Sci 2019). UBE2C is upregulated by estrogen to promote EMT (Yan Liu et al., Mol Cancer Res 2020). Therefore, this cluster was defined as "EMT-derived stromal cells”.

      Q8: In the endometrial receptivity test (ERT), endometrium sample data matches with prereceptive endometrium and WOI organoids data matches with a receptive endometrium, but why there is no information about CTRL and SEC organoids?

      We performed ERT on these samples while our hospital had a collaborative project with Yikon Genomics (Jiangsu, China). However, because of limited quotas, only endometrium and WOI organoids were sent for testing. Because the collaboration subsequently ended, and to avoid batch effects, CTRL and SEC organoids were not tested later. Moreover, the current ERT is a machine learning model trained on sequencing data from endometrium samples, and the cellular composition of endometrial organoids still differs from that of endometrium. The results therefore need to be interpreted together with our other findings.

      Q9: When analysing the transcriptome and proteome, some comparisons are made between WOI vs CTRL and SEC, or just WOI vs CTRL. It would be interesting to have all the comparisons since the power of WOI organoids lies in their differences with SEC organoids.

      Thanks for your suggestion. At the organoid level, the transcriptomic and proteomic differences between SEC and WOI organoids are not significant. This is expected because WOI organoids are induced further toward the implantation window from the secretory phase (i.e., SEC organoids), which prompted us to explore the differences at the single-cell level.

      Q10: Electron microscopy comparisons with respect to pinopods, cilia, and microvilli are only performed between WOI and CTRL. It would be interesting to check it with SEC.

      We have now quantitatively compared characteristic structures, including microvilli, cilia, pinopodes and glycogen, among CTRL, SEC and WOI organoids. WOI organoids had longer microvilli and more cilia, glycogen, and pinopodes (Figure 2H).

      Q11: Line 190 states that pinopods are arranged more densely in WOI organoids than in CTRL organoids. Seems to be a subjective observation. Is there an objective method to quantify this?

      We agree and have quantified the pinopodes. The number of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes (Figure 2H; Lines 184-186).

      Q12: Some characteristics are very similar between WOI and SEC organoids (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells). The authors should complement the discussion by objectively justifying the use of WOI versus SEC organoids. Would they be useful in more specific cases or at a general level when studying implementation?

      Thanks for your comments. WOI organoids are differentiated from SEC organoids toward the implantation window. WOI organoids are therefore suited to studying peri-implantation physiological changes or exploring related pathological mechanisms, whereas SEC organoids can be used for studies limited to questions such as secretory-phase changes of the endometrium or hormone responsiveness (Lines 365-368).

      Q13: ExM media is described in Table S1, but it does not include the concentration of the different reagents in the culture medium, which is the most interesting data about the ExM medium.

      Thank you for the suggestion. The concentrations of all medium components have now been added to Table S1.

      Q14: It is not specified which organoid pass is used in each experiment. Is it always the same pass?

      Our experiments were conducted using endometrial organoids at passages P1~P3, as specified in the Supporting Information (Lines 54-55).

      Q15: As a protocol for freezing organoids is included in materials and methods, do the authors use freshly cultured organoids or do they cryopreserve them and thaw them for culturing?

      Thanks for your question. We used freshly cultured organoids in the manuscript. We listed the freezing protocol to illustrate that the constructed organoids can be frozen and recovered for special experimental needs and the establishment of sample banks.

      Q16: The most important point: Neither of the two studies that developed human endometrial organoids from tissue biopsies (Boretto et al. 2017 and Turco et al. 2017), observed stromal cell growth in culture. They disappeared between the first and second pass (as indicated by Turco et al. 2017). How do the authors justify the presence of stromal cells in their organoid culture if they rely on the protocols previously described by these research groups? If it is the case that they can only use the initial pass (freshly planted cells from endometrium), it does not make sense to include the freezing of the different passes in materials and methods, since the expansion capacity of the culture would be lost, which implies a major limitation of the model.

      Thanks for your question.  

      (1) We did not follow the protocols of these research groups exactly. To maximize the recovery of both epithelial and stromal cells, we optimized key steps such as tissue digestion and cell strainer filtration. We shortened the digestion time to 20 minutes to protect cells from the digestion solution and to retain some cell aggregates, which help maintain cell stemness and preserve clusters of stromal and immune cells. A 40 μm filter membrane was used to isolate the endometrial cells, which allows both epithelial and stromal cells to be captured.

      (2) Our experiments were conducted using freshly constructed organoids at passages P1~P3. However, we also used recovered organoids when fresh endometrial samples were unavailable during the COVID-19 pandemic. The organoids (e.g., P0~P5) still grew vigorously after recovery and could continue to be cultured by passaging (shown below).

      The recovered organoids can be used for special experiments and biobank establishment.

      Author response image 10.

      The endometrial organoids of different passages were observed before cryopreservation and after recovery. Scale bar = 200 μm.

      Q17: It is not clear which organoids include Figure S2F. Does it include the three types of organoids or just WOI organoids?

      This circle diagram shows the functions of genes upregulated in WOI organoids compared with CTRL organoids, based on the combined transcriptome and proteome analysis; this is now stated in the figure legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1)  Regarding the cell studies of human pediatric bone-derived osteoblast-like cells (HBO), the authors should provide a rationale for their selection of specific cell lines (15,16, 17, 19, 20, 23, 24) in this study. As for animal studies, could the authors clarify which cell lines were utilized in the murine in vivo experiments?

      We appreciate the opportunity to address this. To reduce confusion, we have numbered the patient primary cell lines used in these studies sequentially from 1 – 7. Additionally, we have added “HBO cell lines used for experiments were selected based on the ability of the primary cell line to proliferate and mineralize in culture” to the Methods section. 

      In vivo experiments: “HBO cell lines 2, 6 and 7 from separate individuals were selected for these experiments based on similar growth and passage characteristics.” This statement is included in the Methods section.

      (2)  In this study, the authors performed the murine in vivo experiments using both male and female mice. Could the author clarify if any difference was observed between male and female mice in the findings? This information would contribute to a more comprehensive understanding of the study.

      We agree and have added the following to the Results section: “There was no sex-based difference in regenerated bone volume.”

      (3)  Although the histological results showed an elevated collagen expression in mice treated with BMP2, JAG1, and JAG1 + DAPT compared to those treated with the cells alone, the differences among groups were subtle. The authors should consider the immunohistochemical (IHC) staining for collagen 1 on the samples, allowing for a quantitative assessment of collagen 1 expression.

      Thank you for this comment. The differences between BMP2, JAG1, and JAG1 + DAPT are indeed subtle. We have added Supplementary Figure 5, showing collagen staining of sections from the same FFPE blocks that were sectioned and stained with Masson Trichrome in Figure 2C. 

      Minor Comments:

      (4)  Please specify which cell lines are represented in the staining results shown in Fig.1A and Fig. 5A, respectively.

      In Fig 1A the representative images are of HBO2. Fig 5A representative images are of HBO7. We have added this information to the figure legends for these figures. 

      (5)  There appears to be a discrepancy in the specified size of the critical defect. The manuscript states that the size is 4mm, while Supplemental Figure 3 indicates 3.5mm.

      Thank you for this catch! Yes, it should be 4mm. This has been corrected in Supplementary Figure 3.

      (6)  The scale bar for Figure 2 C is missing.

      Scale bars have been added, which also gave us an opportunity to brighten the images equally, allowing better distinction between the different colors of the Masson Trichrome staining.

      (7)  In the methodological section 2.5 for JAG1 delivery, it would be helpful if the authors could review the initial dosage of JAG1 delivery to confirm if HBO cells were included or not, given that the MicroCT results indicate that all groups incorporated HBO cells. 

      We appreciate this suggestion. In response to another question, we have added Supplementary Figure 4 which includes an “Empty Defect” condition with no HBO cells, making the original method statement accurate.

      Reviewer #2 (Recommendations For The Authors):

      In the current study, using in vitro and in vivo models the authors clearly show that JAG1 can enhance osteogenesis and thus can be helpful in designing new therapeutic approaches in the field of bone regenerative research. The in vivo mouse CF model is very convincing and shows that JAG1 promotes osteogenesis via non-canonical signaling. Mechanistically it seems that JAG1 activates STAT5, AKT, P38, JNK, NF-ĸB, and p70 S6K. However, additional evidence is needed to convincingly conclude that all the non-canonical pathways activated via JAG1 converge at p70 S6K activation. The following concerns need to be addressed.

      (1) In Fig 1A: Even though the Jag1-Fc shows a very significant increase in HBO mineralization, there are no significant increases in cells in osteogenic media when compared to control growth media. Even though the different conditions were subjected to RNAseq analysis in the later figures, qPCR analysis of some osteogenic genes in Figure 1 might be helpful. 

      We appreciate the opportunity to explore this question further. We conducted mineralization experiments in triplicate and performed qRT-PCR, assessing gene expression of 5 osteogenic genes: ALPL, BGLAP (osteocalcin), COL1A1, RUNX2, and SP7. Results are shown in Figure 1C, and this text was added to Results: “Additionally, PCR analysis of HBO1 cells from a repeat experiment collected at days 7, 14, and 21 showed significantly increased expression of osteogenic genes with JAG1-bds stimulation (Figure 1C). ALPL expression was significantly increased at Day 7, with a 3.5-fold increase (p=0.0004) compared to HBO1 cells grown in growth media. In contrast, significantly increased expression of COL1A1 and BGLAP was observed at 14 days, with a 5.1-fold increase (p=0.0021) of COL1A1 and a 12.3-fold increase (p=0.0002) of BGLAP when compared to growth media conditions. Interestingly, while some mineralization is observed in the osteogenic media and Fc-bds (Figure 1A) conditions, there were no significant increases in osteogenic gene expression (Figure 1C). Expression of RUNX2 and SP7 was not significantly altered across all conditions and time points (not shown).”
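      The fold changes quoted above are relative expression values. The response does not state how they were computed; a common approach for qRT-PCR is the 2^-ΔΔCt (Livak) method, sketched below under that assumption. The function name `fold_change_ddct` and the Ct values are hypothetical, and the method assumes near-100% amplification efficiency for both the target and the reference gene.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (treated vs. control) by the 2^-ΔΔCt method.

    Assumes target and reference genes amplify with ~100% efficiency.
    """
    delta_treated = ct_target_treated - ct_ref_treated    # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control            # treated relative to control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: a target gene in stimulated vs. growth-media cells.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

      A two-cycle lower normalized Ct in the treated cells corresponds to a 4-fold increase, because each PCR cycle roughly doubles the product.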

      (2) In Fig 2: even though it is not needed with respect to the hypothesis, was there any control group without any cells or JAG1 beads? What were the differences between that group and the cells-only group?

      We have not observed differences between the “Empty Defect” group and the “Cells alone” group.

      We have addressed the reviewer’s comments by adding this comparison in Supplementary Figure 4.

      (3) Transcriptional profiling and ELISA (Fig 3 and 4) show upregulation of NF-ĸB signaling in response to JAG1. In the discussion, the authors have referenced a previous study showing NF-ĸB as prosurvival in human OB cells. However, based on many published reports, NF-ĸB activation has been shown to inhibit OB function. Does JAG1 regulate HBO cell survival via NF-ĸB activation?

      Experiments using an NF-ĸB inhibitor could help show that JAG1-mediated NF-ĸB activation is anabolic in this experimental setup.

      We thank the reviewer for this excellent suggestion. We are eager to explore this new direction for our research in a subsequent study. We have added this to our future directions. 

      (4) Fig 5: 

      (A)  Condition showing JAG1+ DAPT is needed to compare between JAG1 canonical and noncanonical signaling. 

      Thank you for pointing this out. We have added Supplementary Figure 6, which includes a dose response experiment for JAG1 + DAPT.

      (B)  S6K18 alone seems to be increasing OB mineralization. Is that statistically significant?  

      No, and we have added the statistical analysis for S6K-18 to Figure 5B.

      (C)  Fc alone condition seems to have a very significant increase in OB mineralization. Does Fc alone upregulate OB function? 

      We do see some upregulation of mineralization with Fc in vitro, which we also observed in our previous studies with mouse neural crest cells, but we have not found it to be osteogenic in vivo. We have added a statement to this effect, with references. Additionally, osteogenic gene expression was not upregulated in our in vitro mineralization experiments with Fc.  See Revised Figure 1.

      (D)  Although overall quantification shows that S6K18 partially inhibits HBO mineralization, the representative images do not represent the quantification. Transcriptional analysis (qPCR) is required to validate these findings.

      We performed qRT-PCR on cells from a repeat mineralization assay, collecting cells at 9, 14, and 21 days. We have added the following to the Results: “While inhibition of NOTCH and p70 S6K decreased mineralization in our mineralization assay, there were no statistically significant changes in gene expression for ALPL, COL1A1, or BGLAP (Supplementary Figure 7). These results suggest that the HBO cell phenotypes are maturing into osteocytes and that inhibiting p70 S6K hinders the cells' ability to mineralize but not the progression of the cell phenotype.”

      (5) Finally, to convincingly conclude the data from Fig 5, the mouse CF model can be helpful to support the authors' claim that JAG1 acts via p70 S6K.

      Thank you for this feedback. We have modified our conclusions to reflect that p70 S6K is one of the non-canonical pathways that JAG1 may be activating in bone regeneration.

      Thank you very much for your consideration of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, proteomics analysis of the plasma of human subjects that underwent an exercise training regime consisting of a combination of endurance and resistance exercise led to the identification of several proteins that were responsive to exercise training. Confirming previous studies, many exercise-responsive secreted proteins were found to be involved in the extra-cellular matrix. The protein CD300LG was singled out as a potential novel exercise biomarker and the subject of numerous follow-up analyses. The levels of CD300LG were correlated with insulin sensitivity. The analysis of various open-source datasets led to the tentative suggestion that CD300LG might be connected with angiogenesis, liver fat, and insulin sensitivity. CD300LG was found to be most highly expressed in subcutaneous adipose tissue and specifically in venular endothelial cells. In a subset of subjects from the UK Biobank, serum CD300LG levels were positively associated with several measures of physical activity - particularly vigorous activity. In addition, serum CD300LG levels were negatively associated with glucose levels and type 2 diabetes. Genetic studies hinted at these associations possibly being causal. Mice carrying alterations in the CD300LG gene displayed impaired glucose tolerance, but no change in fasting glucose and insulin. Whether the production of CD300LG is changed in the mutant mice is unclear.

      Strengths:

      The specific proteomics approach conducted to identify novel proteins impacted by exercise training is new. The authors are resourceful in the exploitation of existing datasets to gain additional information on CD300LG.

      Weaknesses:

      While the analyses of multiple open-source datasets are necessary and useful, they lead to relatively unspecific correlative data that collectively insufficiently advance our knowledge of CD300LG and merely represent the starting point for more detailed investigations. Additional, more targeted experiments on CD300LG are necessary to gain a better understanding of the role of CD300LG and the mechanism by which exercise training may influence CD300LG levels. One should also be careful about relying on external data for such delicate experiments as mouse phenotyping. Can the authors vouch for the quality of the data collected?

      Thank you for the valuable feedback on our manuscript. We recognize concerns about the specificity of correlative data from open-source datasets and the limitations they present for understanding CD300LG's role. To address this, we have expanded the discussion with a paragraph on the need for targeted experiments to confirm CD300LG’s functions and its relationship with glucose metabolism. We also emphasize caution regarding reliance on external data, and we acknowledge the need to generate primary data, including direct phenotyping of mice with CD300LG gene alterations, to better understand its regulatory mechanisms and effects on glucose tolerance. Please see lines 446-456.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript from Lee-Odegard et al reports proteomic profiling of exercise plasma in humans, leading to the discovery of CD300LG as a secreted exercise-inducible plasma protein. Correlational studies show associations of CD300LG with glycemic traits. Lastly, the authors query available public data from CD300LG-KO mice to establish a causal role for CD300LG as a potential link between exercise and glucose metabolism. However, the strengths of this manuscript were balanced by the moderate to major weaknesses. Therefore in my opinion, while this is an interesting study, the conclusions remain preliminary and are not fully supported by the experiments shown so far.

      Strengths:

      (1) Data from a well-phenotyped human cohort showing exercise-inducible increases in CD300LG.

      (2) Associations between CD300LG and glucose and other cardiometabolic traits in humans, that have not previously been reported.

      (3) Correlation to CD300LG mRNA levels in adipose provides additional evidence for exercise-inducible increases in CD300LG.

      Weaknesses:

      (1) CD300LG is by sequence a single-pass transmembrane protein that is exclusively localized to the plasma membrane. How CD300LG can be secreted remains a mystery. More evidence should be provided to understand the molecular nature of circulating CD300LG. Is it full-length? Is there a cleaved fragment? Where on CD300LG is the epitope bound by the O-link assay? Does transfection of CD300LG into cells in vitro result in secreted CD300LG?

      (2) There is a growing recognition of specificity issues with both the O-link and SomaLogic platforms. Therefore, it is critical that the authors use antibodies, targeted mass spectrometry, or some other method to validate that CD300LG really is increased, instead of relying only on the O-link data.

      (3) It is insufficient simply to query the IMPC phenotyping data for CD300LG; the authors should obtain the animals and reproduce or determine the glucose phenotypes in their own hands. In addition, this would allow the investigators to answer key questions like the phenotype of these animals after a GTT, whether glucose production or glucose uptake is affected, whether insulin secretion in response to glucose is normal, effects of high-fat diet, and other standard mouse metabolic phenotyping assays.

      (4) I was unable to find when plasma was collected at the 12-week time point. Was it immediately after the last bout of exercise (an acute response) or some time after the training protocol (trained state)?

      We acknowledge the importance of understanding the molecular form of CD300LG in circulation. We have expanded the discussion with a paragraph on the need for follow-up experiments to determine whether circulating CD300LG is full-length or a cleaved fragment, to identify the epitope for O-link binding, and to assess CD300LG secretion in vitro through transfection experiments. We also discuss the need for targeted mass spectrometry and antibody-based validation of the O-link measurements of CD300LG, and the need for further validation experiments in CD300LG-deficient mice. Please see lines 446-456.

      The plasma collected post-intervention reflects the new trained baseline condition of the subjects, 3 days after the last exercise session of the intervention. We have clarified this in our manuscript; the information is updated in lines 491-493.

      Reviewer #1 (Recommendations For The Authors):

      In the present form, the paper raises interest in the potential role of CD300LG in the response to exercise training but unfortunately does not provide clear answers. The authors should focus their efforts on firmly validating the status of CD300LG as an exercise biomarker in humans and carefully examine the function of CD300LG through mechanistic and animal-based studies.

      The authors are encouraged to acquire CD300LG-deficient mice and perform specific experiments to validate hypotheses forthcoming from the analysis of the open-source datasets. In addition, it needs to be validated that the cd300lgtm1a(KOMP)Wtsi mice are actually deficient in CD300LG. It is not uncommon that Tm1a mice have (almost) normal expression of the targeted gene.

      We have now revised the manuscript and added a new section to the discussion regarding the limitations with open-source data, cd300lgtm1a(KOMP)Wtsi mice and the need for more validation experiments on CD300LG-deficient mice. Please see lines 446-456.

      The value of the correlative data presented in Figure 5 is rather limited. The same can be argued for the data presented in Supplementary Figure 2. If CD300LG is expressed in endothelial cells, it stands to reason that its expression is correlated with angiogenesis. Hence, this observation does not really carry any additional value.

      We agree that correlations cannot imply causality. However, similar patterns were observed in several tissues and across different datasets, which at least suggests a role for CD300LG related to angiogenesis. We have included a section in the discussion where we clarify that our observations should only be regarded as indications and that follow-up studies are needed to confirm any causal role for CD300LG in angiogenesis/oxidative capacity. Please see lines 446-456.

      Figure 6 may be better accommodated in the supplement.

      Figure 6 is now moved to the supplement.

      Figure 3A and B are a bit awkward. The description "no overlap" is confusing. Isn't it more accurate to say "no enrichment" or "no over-representation"? There will always be some overlap with certain pathways. However, there may be no enrichment. Furthermore, the use of arrows to indicate No overlap is visually not very appealing. Maybe the numbers can be given a specific color?

      We have now removed the arrows and text, and instead state in the text that there were no enrichments other than for the proteins down-regulated in the overweight group.

      The description of the figure legend of figure 5E-H is incomplete.

      The description is now completed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors attempt to fully characterize the immunoglobulin (Ig) heavy (H) chain repertoire of tumor-infiltrating B cells from three different cancer types by identifying the IgH repertoire overlap between these, their corresponding draining lymph nodes (DLNs), and peripheral B cells. The authors claim that B cells from tumors and DLNs have a closer IgH profile than those in peripheral blood and that DLNs are differentially involved with tumor B cells. The claim that tumor-resident B cells are more immature and less specific is made based on the characteristics of the CDR-H3 they express.

      Strengths:

      The authors show great expertise in developing in-house bioinformatics pipelines, as well as using tools developed by others, to explore the IgH repertoire expressed by B cells as a means of better characterizing tumor-associated B cells for the future generation of tumor-reactive antibodies as a therapy.

      Weaknesses:

      This paper needs major editing, both of the text and the figures, because as it stands it is convoluted and extremely difficult to follow. The conclusions reached are often not obvious from the figures themselves. Sufficient a priori details describing the framework for their analyses are not provided, making the outcome of their results questionable and leaving the reader wondering whether the findings are on solid ground.

      The authors are encouraged to explain in more detail the premises used in their algorithms, as well as the criteria they follow to define clonotypes, clonal groups, and clonal lineages, which are currently poorly defined and are crucial elements that may influence their results and conclusions.

      In response to this comment, we significantly expanded the paragraph dedicated to the tumor and non-tumor repertoire overlap and isotype composition. The following sections were added:

      First, we characterized the relative similarity of IGH repertoires derived from tumors, DLN, and PBMC at the level of individual CDR-H3 clonotypes. We define a clonotype as an instance with an identical CDR-H3 nucleotide sequence and identical V- and J-segment attribution (isotype attribution may differ). Unlike other authors, here we do not pool together similar CDR-H3 sequences to account for hypermutation. (Hypermutation is analyzed separately, in what we define as clonal group analysis.)
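      The clonotype definition above can be expressed as a grouping key. This is a minimal sketch, not the authors' pipeline: the `Read` record, its field names, and the sequences are hypothetical. Reads sharing the CDR-H3 nucleotide sequence, V gene, and J gene collapse into one clonotype regardless of isotype, while a single-nucleotide difference yields a separate clonotype because no hypermutation tolerance is applied.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Read(NamedTuple):
    cdr3_nt: str   # CDR-H3 nucleotide sequence
    v_gene: str
    j_gene: str
    isotype: str


def clonotype_key(read: Read) -> Tuple[str, str, str]:
    # Identity = exact CDR-H3 nucleotide sequence + V + J; isotype is ignored,
    # and similar sequences are not merged (no hypermutation tolerance).
    return (read.cdr3_nt, read.v_gene, read.j_gene)


def count_clonotypes(reads: Iterable[Read]) -> Counter:
    return Counter(clonotype_key(r) for r in reads)


# Hypothetical reads: the first two differ only in isotype, the third by one nucleotide.
reads = [
    Read("TGTGCGAGAGATTGG", "IGHV3-23", "IGHJ4", "IGHG1"),
    Read("TGTGCGAGAGATTGG", "IGHV3-23", "IGHJ4", "IGHA1"),
    Read("TGTGCGAGAGATTAG", "IGHV3-23", "IGHJ4", "IGHG1"),
]
counts = count_clonotypes(reads)
print(len(counts))  # 2
```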

      As overlap metrics are dependent on overall repertoire richness, we normalized the comparison using the same number of top most frequent clonotypes of each isotype from each sample (N = 109). Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point.
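      The normalization above (the same number of top clonotypes per isotype, then one F2 value per isotype) might look like the sketch below. The response does not define F2; this assumes the VDJtools-style definition, i.e. the sum over shared clonotypes of the geometric mean of their frequencies in the two samples. Renormalizing the top-N frequencies to sum to 1 is also an assumption, and the sample data are hypothetical.

```python
import math
from typing import Dict


def top_n(freqs: Dict[str, float], n: int) -> Dict[str, float]:
    """Keep the n most frequent clonotypes and renormalize them to sum to 1."""
    top = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(f for _, f in top)
    return {k: f / total for k, f in top} if total else {}


def f2_overlap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum, over shared clonotypes, of the geometric mean of their frequencies."""
    return sum(math.sqrt(a[k] * b[k]) for k in a.keys() & b.keys())


def f2_per_isotype(sample_a, sample_b, n: int = 109) -> Dict[str, float]:
    """One F2 value per isotype present in both samples (isotype -> {clonotype: freq})."""
    return {
        iso: f2_overlap(top_n(sample_a[iso], n), top_n(sample_b[iso], n))
        for iso in sample_a.keys() & sample_b.keys()
    }


# Hypothetical per-isotype clonotype frequencies.
tumor = {"IGHG": {"c1": 0.5, "c2": 0.3, "c3": 0.2}}
lymph_node = {"IGHG": {"c1": 0.5, "c4": 0.5}}
print(f2_per_isotype(tumor, lymph_node, n=2))
```

      Because F2 weights each shared clonotype by its abundance, two samples sharing only rare clonotypes score low even if they share many of them; the frequency-free D metric complements this.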

      We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij/(d_i × d_j), where d_ij is the number of clonotypes present in both samples, while d_i and d_j are the diversities of samples i and j, respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.
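      The D metric as defined above depends only on which clonotypes are present, which is why it is uninfluenced by clonotype frequency. A minimal sketch with hypothetical clonotype sets; the function name `d_overlap` is illustrative.

```python
from typing import Set


def d_overlap(a: Set[str], b: Set[str]) -> float:
    """D_ij = d_ij / (d_i * d_j): shared clonotype count over the product of diversities."""
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b))


# Only presence/absence matters, so clonotype frequencies never enter the calculation.
print(d_overlap({"c1", "c2"}, {"c1", "c3"}))  # 0.25
```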

      Having excluded the IGHD gene segment from some of their analyses (at least those related to clonal lineage inference and phylogenetic trees), it is not well explained which region of CDR-H3 is responsible for the charge, interaction strength, and Kidera factors, since in some cases the authors mention that the central part of CDR-H3 consists of five amino acids and in others of seven amino acids.

      We considered different ways of calculating amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. The plots in Fig S6 C have now been updated for consistency, and the parameters depicted there are now calculated using the 5 central amino acids, as in other sections.
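      As an illustration of computing a property from the 5 central CDR-H3 residues, the sketch below extracts a centered window and computes a crude net charge. Both choices are assumptions for illustration: when the window cannot be exactly centered it is shifted one residue toward the N-terminus, and charge counts K/R as +1 and D/E as -1 with histidine treated as neutral. It does not reproduce the Kidera-factor calculation, which requires the published factor table.

```python
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # His treated as neutral (assumption)


def central_residues(cdr3_aa: str, k: int = 5) -> str:
    """Return the k central residues of a CDR-H3 amino acid sequence.

    If the window cannot be exactly centered, it is shifted toward the N-terminus.
    Sequences of length <= k are returned unchanged.
    """
    if len(cdr3_aa) <= k:
        return cdr3_aa
    start = (len(cdr3_aa) - k) // 2
    return cdr3_aa[start:start + k]


def net_charge(seq: str) -> int:
    """Crude net charge: +1 per K/R, -1 per D/E."""
    return sum(CHARGE.get(aa, 0) for aa in seq)


core = central_residues("CARDKEGYFDYW")  # hypothetical CDR-H3
print(core, net_charge(core))  # DKEGY -1
```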

      How can the authors justify that the threshold for CDR-H3 identity varies according to individual patient data? 

      The ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that captures 100% of the sequences of a clonal lineage whose members differ from each other by only 1 amino acid, and a lower-quality sample (or sequencing run) that captures only every other sequence. The minimal threshold required to merge these sequences into a single cluster/clonal group would clearly differ between the two cases (1 aa for the former and ~2 aa for the latter, for single-linkage clustering). In other words, the deeper the sequencing, the denser the clusters. Our method of tailoring the threshold to each individual relies on the following: https://changeo.readthedocs.io/en/latest/examples/cloning.html
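      The sampling-depth argument above can be checked with a toy single-linkage clustering. This sketch uses plain Hamming distance on equal-length hypothetical sequences and a union-find structure; it is not Change-O's `DefineClones` and does not reproduce its threshold-selection procedure. A lineage whose neighboring members differ by one residue forms a single cluster at threshold 1 when fully sampled, but when only every other member is observed, a threshold of 2 is needed to recover the same single cluster.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def count_single_linkage_clusters(seqs: List[str], threshold: int) -> int:
    """Single-linkage clustering: any pair within `threshold` mismatches is linked.

    Sequences of different lengths are never linked (a simplification).
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if len(seqs[i]) == len(seqs[j]) and hamming(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


# Hypothetical lineage: each member differs from the previous one by one residue.
lineage = ["AAAA", "AAAC", "AACC", "ACCC", "CCCC"]
sparse = lineage[::2]  # a shallower sample that catches every other member

print(count_single_linkage_clusters(lineage, 1))  # 1
print(count_single_linkage_clusters(sparse, 1))   # 3
print(count_single_linkage_clusters(sparse, 2))   # 1
```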

      Although the individual Kidera factors that are significant in the context of our analysis are described one by one in the text on their first appearance, we have now also added a sentence describing Kidera factor analysis in general (page 8):

      Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (Nakai et al. 1988). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques.

      Throughout the analyses, the reasons for choosing one type of cancer over another sometimes seem subjective and are not well justified in the text.

      Whenever possible, we pooled all patients with all cancer types together, because the number of available samples did not allow us to draw any significant conclusions comparing between individual cancer types. When analyzing and showing individual patient data, we also did not attempt to depict any cancer-type-specific findings, but it is inevitable that we name a specific cancer type when labelling a sample coming from a specific tumor.

      Overall, the narrative is fragmented. There is a lack of well-defined conclusions at the end of the results subheadings.

      In addition to the changes described above, a conclusion was added to the paragraph describing the hypermutation analysis:

      IGHG clonotypes from lung cancer samples show a higher number of hypermutations, possibly reflecting the high mutational load found in lung cancer tissue. For melanoma, another cancer known for its high mutational load, no statistically significant difference was found. This may be due to higher variance between melanoma samples, which hinders the analysis, or to the small sample size.

      The exact same paragraph is repeated twice in the results section.

      Corrected.

      The authors have also failed to synchronise the actual number of main figures with the text, and some panels included in the main figures are neither described nor mentioned in the text (Venn diagram Fig. 2A and phylogenetic tree Fig. 5D). Overall, the manuscript appears to have been rushed and not thoroughly read before submission.

      Corrected.

      Reviewers are forced to wade through, unravel, and validate poorly explained algorithms in order to understand the authors' often bold conclusions.

      We hope that the aforementioned additions to the text, as well as the addition to Figure 1, make the narrative easier to understand.

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled the B cell receptor repertoires of cancers, their draining lymph nodes, and blood. They characterized the clonal makeup of all B cells sampled and then analyzed these clones to identify clonal overlap between tissues and clonal activation as expressed by their mutation level and CDR3 amino acid characteristics and length. They conclude that B cell clones from the tumor interact more with their draining lymph node than with the blood and that there is less mutation/expansion/activation of B cell clones in tumors. These conclusions are interesting but hard to verify due to the under-sampling and short sequencing reads, as well as confusion as to when an analysis is across all individuals or of select individuals.

      Strengths:

      The main strength of their analysis is that they take into account multiple characteristics of clonal expansion and activation and their different modes of visualization, especially of clonal expansion and overlap. The triangle plots once one gets used to them are very nice.

      Weaknesses:

      The data used appears inadequate for the conclusions reached. The authors' sample size of B cells is small, and they do not address how it could be sufficient. At such low sampling rates, compounded by the plasmablast bias they mention, it is unclear whether the overlap trends they observe reflect real trends. Analyzing only the top clones by size does not solve this issue: the top 100 clones of one tissue could be much bigger than those of another, so that all overlap trends arise simply because the clones are bigger in one tissue or the other; i.e., there may be equal overlap of clones with blood, but blood is not sufficiently sampled given its greater diversity and smaller clones.

      Regarding the number of clonotypes to be taken into account, we were limited by the B cell infiltration of tumor samples and our ability to capture their repertoire. However, we use technical replicates at the level of the cell suspension to ensure that at least the top clonotypes are consistently sampled. The data should therefore be interpreted as describing the most abundant clones in the repertoire (which may also be considered the most functionally relevant in the case of tumor-infiltrating lymphocytes).
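      One way to check that the top clonotypes are consistently sampled across technical replicates is to measure how many of one replicate's top-N clonotypes are detected anywhere in the other. The metric, the function names, and the counts below are hypothetical illustrations, not the analysis reported in the manuscript.

```python
from typing import Dict, Set


def top_clonotypes(counts: Dict[str, int], n: int) -> Set[str]:
    """The n most abundant clonotypes of one replicate."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {k for k, _ in ranked[:n]}


def top_n_recovery(rep_a: Dict[str, int], rep_b: Dict[str, int], n: int) -> float:
    """Fraction of replicate A's top-n clonotypes detected anywhere in replicate B."""
    top_a = top_clonotypes(rep_a, n)
    if not top_a:
        return 0.0
    return len(top_a & set(rep_b)) / len(top_a)


# Hypothetical read counts per clonotype in two technical replicates.
rep_a = {"c1": 50, "c2": 30, "c3": 5, "c4": 1}
rep_b = {"c1": 40, "c2": 10, "c5": 3}
print(top_n_recovery(rep_a, rep_b, 2))  # 1.0
print(top_n_recovery(rep_a, rep_b, 3))  # 2 of 3 recovered
```

      A recovery close to 1 for small N that drops as N grows would be consistent with the interpretation above: the most abundant clones are reliably captured, while rarer ones are subject to sampling noise.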

      To analyze the repertoire overlap, we generally use the F2 metric, which takes clone size into account, because we think that clone size is an important functional factor. However, we have now added a description of the D metric (which does not include clone frequency as a parameter); it shows exactly the same trend as the F2 metric. Thus, both F2 and D overlap metrics support our conclusion of higher overlap between tumor and LN.

      The following text was added:

      We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij/(d_i × d_j), where d_ij is the number of clonotypes present in both samples, while d_i and d_j are the diversities of samples i and j, respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.

      All in all, of course, the deeper the better, but given the data we were able to generate from the samples, this was the best approach to normalization that could be used.

      Similarly, the read length (150bp X2) is too short, missing FWR1 and CDR1 and often parts of FWR2 if CDR3 is long. As the authors themselves note (and as was shown in Zhang 2015, PMC4811607), this makes mutation analysis difficult.

      Indeed, we are aware of this problem, and therefore only a small part of the manuscript is dedicated to the hypermutation analysis. However, as the CDR-H3 region is the most mutated part, we can still capture significant diversity of mutations. To address the applicability of our data to hypermutation phylogeny analysis, we compared the distribution of physicochemical properties along the hypermutation trees using the 150+150 and 300+300 data from the same donor and the same set of samples. The main conclusion is that no correlation between the physicochemical properties of the CDR-H3 region and the rank of the clonotype on the tree could be found in either the long or the short datasets.
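      The presence or absence of a correlation between a CDR-H3 property and a clonotype's rank on the tree can be quantified with Spearman's rank correlation. Below is a minimal, dependency-free sketch; the tree ranks and charges are hypothetical, no p-value is computed (in practice `scipy.stats.spearmanr` provides one), and how "rank on the tree" was defined in the study is not specified here.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the average ranks (Spearman's rho, tie-corrected)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5


# Hypothetical: depth of each clonotype in a lineage tree vs. net charge of its CDR-H3.
tree_rank = [0, 1, 1, 2, 3]
cdr3_charge = [-1, 0, -2, 1, -1]
print(round(spearman_rho(tree_rank, cdr3_charge), 3))
```

      Rank-based correlation is a natural fit here because tree depth is ordinal and physicochemical scores need not relate to it linearly.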

      It also makes the identification of V genes and thus clonal identification ambiguous. This issue becomes especially egregious when clones are mutated.

      Again, this would be important for clonotype phylogeny analysis. However, for the simple questions that we address with our clonal group analysis, such as clonal group overlap between tissues, we consider these data acceptable, because any mislabelling of V segments that occurs is (a) rare and (b) equally frequent in all types of samples. Therefore, any conclusions made are still valid despite this technical drawback.

      To directly address the question of V-gene mislabelling in our data, we looked at the average number of different V genes attributed to the same CDR-H3 nucleotide sequence in the short (150+150) and long (300+300) datasets from the same donor. Some ambiguity in V-gene labelling is indeed observed (see below), but we think that it is unlikely to influence any of our cautious conclusions.

      Author response image 1.
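      The V-gene ambiguity measure described above (the average number of distinct V genes assigned to the same CDR-H3 nucleotide sequence) reduces to a grouping-and-averaging step. A minimal sketch with hypothetical records; applying it separately to the 150+150 and 300+300 datasets would give the two values being compared. A value of 1.0 means every CDR-H3 sequence received a single V-gene call.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mean_v_genes_per_cdr3(records: Iterable[Tuple[str, str]]) -> float:
    """Average number of distinct V genes assigned to each CDR-H3 nucleotide sequence.

    `records` holds (cdr3_nt, v_gene) pairs, one per read or clonotype.
    """
    v_by_cdr3: Dict[str, Set[str]] = defaultdict(set)
    for cdr3_nt, v_gene in records:
        v_by_cdr3[cdr3_nt].add(v_gene)
    if not v_by_cdr3:
        return 0.0
    return sum(len(v) for v in v_by_cdr3.values()) / len(v_by_cdr3)


records = [
    ("TGTGCGAGAGAT", "IGHV3-23"),
    ("TGTGCGAGAGAT", "IGHV3-30"),  # same CDR-H3, different V call
    ("TGTGCAAAAGAT", "IGHV1-2"),
    ("TGTGCAAAAGAT", "IGHV1-2"),
]
print(mean_v_genes_per_cdr3(records))  # 1.5
```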

      Finally, it is not completely clear when the analysis is of single individuals or across all individuals. If it is the former the authors did not explain how they chose the individuals analyzed and if the latter then it is not clear from the figures which measurements belong to which individual (i.e they are mixing measurements from different people).

      We addressed this issue by adding a comment to each figure caption, describing whether a particular figure or panel describes individual or pooled data, and also whether the analysis is done on individual clonotype or clonal group level.

      Also, in case pooled data were used, we added the number of patients that was pooled for a particular type of analysis. This number differs from one type of analysis to the other, because not all the patients had a complete set of tissues, and also not all samples passed a quality check for a particular analysis.

      Here are the numbers listed:

      Fig 2A: N=6 (we were only considering those who had all three tissues)

      Fig 2C: N=14 (all)

      Fig 2D: N=14 (all)

      Fig 2E: N=7 (have both tumor and PBMC)

      Fig 2F: N=9 (have both tumor and PBMC)

      Fig 2G: N=9 (have both tumor and PBMC)

      Fig 2H: N=7 (have both tumor and LN)

      Fig 3A: N=14 (all)

      Fig 3B: N=11 (only those with tumor)

      Fig 3E: N=14

      Fig 7F: N=11 (all that have tumor)

      Reviewer #3 (Public Review):

      In multiple cancers, the key roles of B cells are emerging in the tumor microenvironment (TME). The authors of this study appropriately point out that B cells are relatively under-characterised in the TME and argue correctly that it is not known how the B cell receptor (BCR) repertoires across tumors, lymph nodes, and peripheral blood relate. The authors therefore supply a potentially useful study evaluating the tumor, lymph node, and peripheral blood BCR repertoires and site-to-site as well as intra-site relationships. The authors employ sophisticated analysis techniques, although the description of the methods is incomplete. Among other interesting observations, the authors argue that the tumor BCR repertoire is more closely related to that of the draining lymph node (dLN) than to that of the peripheral blood in terms of clonal and isotype composition. Furthermore, the authors' findings suggest that tumor-infiltrating B cells (TIL-B) exhibit a less mature and less specific BCR repertoire compared with circulating B cells. Overall, this is potentially useful work that would be of interest to both medical and computational biologists working across cancer. However, there are aspects of the work that would have benefitted from further analysis and areas of the manuscript that could be written more clearly and proofread in further detail.

      Major Strengths:

      (1) The authors provide a unique analysis of BCR repertoires across tumor, dLN, and peripheral blood. The work provides useful insights into inter- and intra-site BCR repertoire heterogeneity. While patient-to-patient variation is expected, the findings with regard to intra-tumor and intra-dLN heterogeneity with the use of fragments from the same tissue are of importance, contribute to the understanding of the TME, and will inform future study design.

      (2) A particular strength of the study is the detailed CDR3 physicochemical properties analysis which leads the authors to observations that suggest a less-specific BCR repertoire of TIL-B compared to circulating B cells.

      Major Weaknesses:

      The study would have benefitted from a deeper biological interpretation of the data. While given the low number of patients one can plausibly understand a reluctance to speculate about clinical details, there is limited discussion about what may contribute to observed heterogeneity.

      We indeed do not want to overinterpret our data, especially when it comes to differences between cancer types. On the other hand, extracting similar patterns across different cancer types allows us to pinpoint mechanisms that are more general and do not depend on cancer type. As for the potential source of the intratumoral heterogeneity that we observe, we think that it may come from the selective sampling of tertiary lymphoid structures (TLS). We include IHC data for TLS detection in Supplementary Fig. 5. Also, tumor mutation clonality may correlate with a differential antibody response (i.e., different IGH clonotypes developing to recognize different antigens), as has been previously described for TCRs by the lab of B. Chain in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6890490/.

      For example, for the analysis of three lymph nodes taken per patient which were examined for inter-LN heterogeneity, there is a lack of information regarding these lymph nodes.

      Unfortunately, no clinical information about the lymph nodes was available.

      'LN3' is deemed as exhibiting the most repertoire overlap with the tumor but there is no discussion as to why this may be the case.

      The following sentences describe this in the “LN-to-LN heterogeneity in colorectal cancer” paragraph:

      Similarly, an unequal interaction of tumors with DLNs was observed at the level of hypermutating clonal groups.

      Functionally, this may again indicate that within a group of DLNs, nodes are unequal in terms of access to tumor antigens, and this inequality shapes the BCR repertoires within these lymph nodes.

      (2) At times the manuscript is difficult to follow. In particular, the 'Intra-LN heterogeneity' section follows the 'LN-LN heterogeneity in colorectal cancer' section and compares the overlap of LN fragments (LN11, LN21, LN31) with the tumor in two separate patients (Fig 6A). In the previous section (LN-LN), LN11, LN21, LN31 are names given to separate lymph nodes from the same patient. The fragments are referred to as 'LN2' and the nodes in the previous section are referred to similarly. This conflation of naming for nodes and fragments is confusing.

      We corrected this.

      (3) There is a duplicated paragraph in 'Short vs long trees' and the following section, 'Productive involvement in hypermutation lineages depends on CDR3 characteristics'.

      Corrected.

      Reviewer #1 (Recommendations For The Authors):

      - Figures:

      Figure 1A lacks resolution

      Corrected

      Figure 2A, Venn diagram: What do the colors indicate?

      Corrected

      Figure 5D, why include this tree when there is no mention of it in the text?

      Described

      Figures 8, 9, and 10 are not to be found. One should not have to figure out that they became supplementary in the end.

      Corrected

      Regarding the physicochemical properties of CDR-H3, what do the authors mean by "the central part"? Do the authors refer to the CDR-H3 loop, and if so, how is that defined when the IGHD gene segment is excluded from the analyses? Is it 5 amino acids (Productive involvement in hypermutating lineages depends on CDR3 characteristics, Page 21/39 in merged document) and (CDR3 properties, Page 8/39 in merged document), or 7 amino acids (Short vs long trees phylogeny analysis, Page 19/39 in merged document)? Please clarify.  

      We considered different ways of calculating the amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. The plots in Fig. S6C have now been updated for consistency. The IGHD segment was not excluded from the analysis. The reviewer may have been confused by our description of phylogenetic inference, in which an artificial outgroup with the D segment deleted is added to the clonal group to facilitate the inference process. All other sequences were analyzed in their original form, with the D segment. This way, we could avoid biases in the phylogeny introduced by misassignment of the D gene germline to the outgroup.

      What was the threshold for CDR-H3 identity in their analyses? How can the authors justify that this value changes according to individual patient datasets? (Materials & methods, Clonal lineage inference Page 29/39 in merged document).

      As described earlier, the ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that captures 100% of the sequences of a clonal lineage, which differ from each other by only 1 amino acid, and a lower-quality sample (or sequencing run) that captures only every other sequence. The minimal threshold required to group these sequences together would then differ between the two cases (1 aa for the former and ~2 aa for the latter, with single-linkage clustering). The method of tailoring the threshold to each individual dataset relies on this reasoning: https://changeo.readthedocs.io/en/latest/examples/cloning.html
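      A toy illustration of the sampling argument above, assuming single-linkage clustering on amino-acid Hamming distance between equal-length CDR-H3 sequences. This is not the Change-O threshold-tuning procedure linked above; the sequences and the helper names (`hamming`, `single_linkage_clusters`) are hypothetical and only show why a sparser sample needs a larger threshold to keep one lineage together.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clusters(seqs, threshold):
    """Link every pair of equal-length sequences within `threshold`
    mismatches and return the connected components (single linkage)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if len(seqs[i]) == len(seqs[j]) and hamming(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


# Hypothetical lineage: each CDR-H3 differs from the previous one by 1 aa.
lineage = ["CARAAAAY", "CARDAAAY", "CARDDAAY", "CARDDDAY", "CARDDDDY"]
full_sample = lineage          # every lineage member was sequenced
sparse_sample = lineage[::2]   # only every other member was sequenced

print(len(single_linkage_clusters(full_sample, 1)))    # 1 group
print(len(single_linkage_clusters(sparse_sample, 1)))  # 3 groups: lineage split
print(len(single_linkage_clusters(sparse_sample, 2)))  # 1 group again
```

      With complete sampling, a 1 aa threshold keeps the lineage in one group; dropping every other member splits it into three groups at 1 aa, and a 2 aa threshold is needed to merge it again.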

      What is the difference between tumor-induced and tumor-infiltrating B cells? How can the authors discriminate between the two? Page 6/39 in the merged document.

      Corrected to “tumor-infiltrating”.

      "Added nucleotides" meaning N additions? Page 3/39 in the merged document.

      Yes.

      How many cancer patients were enrolled? 17 or 14 (Materials & methods page 27/39 in the merged document)? Please clarify.

      In the current project 14 patients were enrolled. The appropriate changes have been introduced in the final text. Supplementary table 2 has been added with the patient data.

      Abbreviations are used without full descriptions.

      According to the reviewer’s recommendation, a list of abbreviations was added to the manuscript, and full descriptions were added in the text at the first mention of each term.

      Use either CDR3 or CDR-H3

      We corrected the text to use CDR-H3 abbreviation throughout the text.

      Reviewer #2 (Recommendations For The Authors):

      I would like to start by apologizing for the time it took me to review.

      As I mentioned above, there are issues with the clonal sampling, the sequencing length, and the statistics in this paper. From reading the paper, I am not sure whether they are fixable, but there are some things that could be tried.

      (1) The authors mention the diversity of individuals in their analysis (17 individuals across 3 cancer types), but do not then systematically show us how the different things they measure track across the different individuals and cancer types. It is possible that some trends would be more convincing if we saw them happening again and again across all individuals. But, as I said above, the authors do not identify individuals clearly across all their types of analysis, nor do they explain why they sometimes show analyses of specific individuals.

      For overlap analysis (Fig. 2 except panel B), CDR3 properties analysis (Fig. 3, Fig. S7), and clonal group analysis (Fig. 4), we used pooled data on all cancers, unless indicated otherwise on the panel. For overlap analysis, we used a Cytoscape graph (Fig. 2B) for one patient, mp3, to illustrate the findings that were made on pooled data. For other types of analysis, such as overlap between individual lymph nodes or tumor fragments (Figs. 5, 6, and 7, except panel F), pooled analysis is not possible due to the individual nature of the processes in question.

      (2) The authors do not address how lacking their sampling is nor the distribution of clone sizes in different tissues/ individuals/ subsets. Without such a discussion it is not clear how tenuous or convincing their conclusions are.

      (3) The short sequencing lengths limit the ability to exactly identify V genes, and thus the germline roots of clones, which positions are mutated, and the clonal association of sequences. The authors appear to be aware of this, as they often use the most recent common ancestor as the start of their analysis... However, again there are inconsistencies that are not clearly described in the text. In creating trees with Change-O, they defined roots as the putative germline, and at least in most cases also in clone association, although in some analyses potentially similar clones were collapsed into clonotypes. Again, it is not clear when one method or the other was used, and how the choice between them was made.

      Here we can only state that we consistently used the approach described in the Methods section, which was the following:

      First, the repertoires were clustered into clonal lineages using the criteria described in “Methods: Clonal lineage inference”. Assuming that each clonotype sequence in a clonal lineage originated from the same ancestor, we then tried to recover the phylogeny. Please note that we refer to individual BCR sequences as “clonotypes”, and to a group of clonotypes that presumably share a common ancestor as a “clonal lineage” or “clonal group”.

      The phylogeny of B-cell hypermutations was inferred for each clonal lineage of size five or more using the maximum likelihood method and the GTR GAMMA nucleotide substitution model. To find the most recent common ancestor (MRCA), or “root”, of the tree, we used an artificial outgroup constructed as a concatenation of the germline V and J segments defined by MiXCR and added it to the clonal lineage. The D segment was excluded from the outgroup, as there was insufficient confidence in its germline annotation due to its short length and high level of mutation. The rest of the clonotypes were still analyzed in their original form, with the D segment in place. Deleting the D segment from the outgroup simply eliminates the risk of biasing the phylogeny by misassigning the D segment germline sequence to the outgroup. The MUSCLE tool was used for multiple sequence alignment, and RAxML software was used to build and root the phylogenetic trees.
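      The outgroup step described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the germline V and J sequences are taken as already available (e.g., from the MiXCR annotation), the helper names and the outgroup label are hypothetical, and the tree-building call assumes the RAxML 8 `raxmlHPC` command line (`-m GTRGAMMA` for the GTR GAMMA model, `-o` to root on the outgroup). The MUSCLE alignment step between the two is not shown, because its command syntax differs between MUSCLE versions.

```python
MIN_LINEAGE_SIZE = 5  # phylogeny inferred only for lineages of size five or more


def build_outgroup(v_germline: str, j_germline: str) -> str:
    """Artificial outgroup: germline V joined directly to germline J.
    The D segment is deliberately left out, as described in the text."""
    return v_germline + j_germline


def lineage_fasta(clonotypes: dict, v_germline: str, j_germline: str,
                  outgroup_id: str = "germline_outgroup"):
    """Return FASTA text for one lineage plus its outgroup, or None if the
    lineage is too small. Clonotype sequences are kept as-is (D included)."""
    if len(clonotypes) < MIN_LINEAGE_SIZE:
        return None
    records = [(outgroup_id, build_outgroup(v_germline, j_germline))]
    records += sorted(clonotypes.items())
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def raxml_command(alignment_path: str, run_name: str,
                  outgroup_id: str = "germline_outgroup", seed: int = 12345):
    """Assumed RAxML 8 call: GTR+GAMMA model, tree rooted on the outgroup.
    Returns the argument list; running it requires RAxML to be installed."""
    return ["raxmlHPC", "-s", alignment_path, "-n", run_name,
            "-m", "GTRGAMMA", "-o", outgroup_id, "-p", str(seed)]


# Hypothetical usage with toy sequences:
toy = {f"clone{i}": "ACGTACGTACGT" for i in range(5)}
print(lineage_fasta(toy, "AAAA", "TTTT"))
print(" ".join(raxml_command("lineage1.aln.fasta", "lineage1")))
```

      Because the outgroup carries no D region while every clonotype keeps its own, a misassigned D germline cannot bias where the tree is rooted.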

      (4) Beyond the statistical issues mentioned above (the unclear selection of individual examples for comparison and significance testing, the mixing of individuals and cancer types without clear identification, etc.), there is in general a lack of coherence in the statistical analysis performed. Specifically:

      (a) the authors should choose one cutoff for significance (0.01 for instance) and then just mention when things are significant and when not. There is no need and it is confusing to add the p-value for every comparison. P-values are not good measures of effect size.

      We corrected the figures and now show p-values only where they are below the significance threshold.

      (b) the Bonferroni correction used is not well characterized. For an alpha of 0.01 in Figures 3 C and D how many tests were performed?

      The number of tests used for the Bonferroni-Holm correction equals the number of comparisons in the heatmap, which is 39 for each heatmap in Fig. 3C and 13 for Fig. 3D.
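      For readers less familiar with the Holm (step-down Bonferroni) procedure, a minimal Python sketch follows. It assumes alpha = 0.01 (the cutoff suggested by the reviewer) and a family of m = 39 tests, as for one heatmap in Fig. 3C; the p-values are invented for illustration, and `holm_bonferroni` is a hypothetical helper, not the code used in the study.

```python
def holm_bonferroni(pvalues, alpha=0.01):
    """Holm step-down procedure. Returns booleans (True = reject H0) in the
    same order as `pvalues`; m, the family size, is len(pvalues)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):          # rank 0 = smallest p-value
        if pvalues[i] <= alpha / (m - rank):  # alpha/m, alpha/(m-1), ...
            reject[i] = True
        else:
            break                             # stop at the first non-rejection
    return reject


# Invented p-values for a 39-comparison heatmap.
p = [0.0001, 0.00026] + [0.5] * 37
print(sum(holm_bonferroni(p, alpha=0.01)))   # Holm rejections
print(sum(x <= 0.01 / len(p) for x in p))    # plain Bonferroni rejections
```

      With these made-up values, Holm rejects two hypotheses, whereas the plain Bonferroni threshold (0.01/39 ≈ 0.000256) rejects only the first; Holm is never less powerful than plain Bonferroni at the same alpha.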

      Finally some minor issues -

      (1) Not all acronyms are described, for instance, TME and TIL. The first time any acronym is used it should be spelled out. -> Katya B - list of abbreviations

      (2) The figure captions are not all there...

      (a) there is no caption for Figure 3E.

      Corrected.

      (b) there are Figure 7F and G panels but no Figure 7E panel, and Figure 7F is described after Figure 7G.

      Corrected.

      (3) A few problems with wording -

      (a) bottom paragraph of page 3 - instead of :

      "different lymph nodes from one draining lymph node pool may be more or less involved"

      Corrected to "different lymph nodes from one draining lymph node pool may be differentially involved"

      (b) figure caption for figure 3a: instead of:

      "CDR3 are on average significantly higher in tumor"

      Corrected to "CDR3 are on average significantly longer in tumor"

      Reviewer #3 (Recommendations For The Authors):

      - FIG1A - Suggest expanding the legend to include more information on the computational analyses.

      Added.

      - PAGE SIX: Suggest adding a table or some text on patient characteristics. Numbers of unique clonotypes per sample etc. Are there differences in age/sex that need to be considered? Some clonotype information is available in S1 but some summary and statistics would be appreciated.

      Added patient information as Supplementary table 2.

      - PAGE SIX: F2 Metric, suggestion to explain why this was used vs. other metrics.

      We expanded the following paragraph to include information about the F2 and D metrics, and the reason why we use F2:

      Repertoire data for each sample were split according to immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point. We used the repertoire overlap metric F2 (the clonotype-wise sum of the geometric mean frequencies of overlapping clonotypes), which accounts for both the number and the frequency of overlapping clonotypes (Fig. 2A). As expected, significantly lower overlaps were observed between the IGH repertoires of peripheral blood and tumors than between those of LNs and tumors. The LN/PBMC overlap also tended to be lower, but the difference was not statistically significant. We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency ($D_{ij} = d_{ij}/(d_i d_j)$, where $d_{ij}$ is the number of clonotypes present in both samples, and $d_i$ and $d_j$ are the diversities of samples $i$ and $j$, respectively). The results for the D metric are not shown, as they indicate a trend similar to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of tumor-draining LNs than to those of peripheral blood, whether or not clonotype frequency is taken into account.
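      The two overlap metrics defined above can be written out in a few lines of Python. This sketch follows the definitions given in the text (F2 as the sum of geometric-mean frequencies over shared clonotypes; D as the number of shared clonotypes divided by the product of the two diversities); it is not the VDJtools implementation, ignores the per-isotype split, and uses toy repertoires invented for illustration.

```python
from math import sqrt


def f2_overlap(freq_i: dict, freq_j: dict) -> float:
    """F2: sum over shared clonotypes of the geometric mean of their
    frequencies in the two samples (frequencies assumed to sum to 1)."""
    shared = freq_i.keys() & freq_j.keys()
    return sum(sqrt(freq_i[c] * freq_j[c]) for c in shared)


def d_overlap(clonotypes_i, clonotypes_j) -> float:
    """D: shared clonotype count over the product of diversities,
    D_ij = d_ij / (d_i * d_j); clonotype frequencies are ignored."""
    set_i, set_j = set(clonotypes_i), set(clonotypes_j)
    return len(set_i & set_j) / (len(set_i) * len(set_j))


# Toy repertoires (clonotype -> frequency), invented for illustration only.
tumor = {"CARDYW": 0.5, "CAKGGW": 0.3, "CTRSSW": 0.2}
lymph_node = {"CARDYW": 0.4, "CAKGGW": 0.1, "CVRPPW": 0.5}
blood = {"CARDYW": 0.01, "CQQHHW": 0.99}

for name, other in [("LN", lymph_node), ("blood", blood)]:
    print(name, round(f2_overlap(tumor, other), 3), round(d_overlap(tumor, other), 3))
```

      In this toy case the tumor/LN overlap exceeds the tumor/blood overlap under both metrics, but only F2 is sensitive to how abundant the shared clonotypes are.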

      - PAGE SIX: Make clear in the text that mp3 is a patient.

      Added “melanoma patient mp3”

      - PAGE EIGHT: Suggest explaining kidera factors at first use - not all readers will know what they are.

      We expanded the following paragraph to add more information about Kidera factors:

      To explore CDR-H3 physicochemical properties, we calculated the mean charge, hydropathy, predicted interaction strength, and Kidera factors 1-9 (kf1-kf9) for the five central amino acids of the CDR-H3 region for the 100 most frequent clonotypes of each sample using VDJtools. Kidera factors are a set of scores that quantify the physicochemical properties of protein sequences [61]. A total of 188 physical properties of the 20 amino acids are encoded using dimension-reduction techniques to yield 10 factors, which are used to quantitatively characterize the physicochemical properties of amino acid sequences.
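      To make "five central amino acids" concrete, the sketch below takes the middle five residues of a CDR-H3 and averages a per-residue scale over them. Assumptions: the Kyte-Doolittle hydropathy scale, a simple ±1 charge for K/R/D/E with histidine treated as neutral, and placing the extra residue on the C-terminal side when the window cannot be perfectly centered. VDJtools may use different scales or windowing, Kidera factors are omitted because they require the full factor table, and the helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified net charge near neutral pH (assumption: His counted as 0).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def central_window(cdr3: str, width: int = 5) -> str:
    """Middle `width` residues; when the leftover is odd, the extra residue
    falls on the C-terminal side (an arbitrary choice)."""
    if len(cdr3) <= width:
        return cdr3
    start = (len(cdr3) - width) // 2
    return cdr3[start:start + width]


def mean_property(cdr3: str, scale: dict, width: int = 5) -> float:
    """Average of a per-residue scale over the central window; residues
    missing from the scale contribute 0."""
    window = central_window(cdr3, width)
    if not window:
        raise ValueError("empty CDR-H3 sequence")
    return sum(scale.get(aa, 0.0) for aa in window) / len(window)


cdr3 = "CARDRGYFDYW"  # hypothetical CDR-H3
print(central_window(cdr3))                  # DRGYF
print(mean_property(cdr3, KYTE_DOOLITTLE))   # mean hydropathy
print(mean_property(cdr3, CHARGE))           # mean charge
```

      For this example the window is DRGYF, giving a mean hydropathy of about -1.38 and a mean charge of 0.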

      - Fig 5D is not referred to.

      Corrected

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      Kroeg et al. describe a novel method for culturing human induced pluripotent stem cells (hiPSCs) in 2D to form cortical tissue in a multiwell format. The authors claim that the method offers a significant advancement over existing developmental models. Their approach allows them to generate cultures with precise, reproducible dimensions and structure: a single rosette, consistent geometry, multiple neuronal and glial cell types (cellular diversity), and no necrotic core (often seen in free-floating models due to limited nutrient and oxygen diffusion). The researchers demonstrate the method's capacity for long-term culture, exceeding ten months, and show the formation of mature dendritic spines and considerable neuronal activity. The method aims to tackle multiple key problems of in vitro neural cultures: reproducibility, diversity, topological consistency, and electrophysiological activity. The authors suggest its potential for high-throughput screening and neurotoxicological studies.

      Strengths: 

      The main advances of the paper seem to be the following: the culture developed by the authors appears to provide optimal conditions for neural differentiation, lineage diversification, and long-term culture beyond 300 days. These seem to me a major strength of the paper and an important contribution to the field. The authors present solid evidence of the high cell type diversity present in their cultures. This is a major point, and therefore it could be better compared with the state of the art. I commend the authors for using three different iPS lines; this is a very important part of their proof. The staining and imaging in the manuscript are of excellent quality.

      We thank the reviewer for the positive comments on the potential of our novel platform to address key problems of in vitro neural culture, highlighting the longevity and reproducibility of the method across multiple cell lines.

      Weaknesses: 

      (1) The title is misleading: The presented cultures appear not to be organoids, but 2D neural cultures, with an insufficiently described intermediate EB stage. For nomenclature, see: doi: 10.1038/s41586-022-05219-6. Should the tissue develop considerable 3D depth, it would suffer from the same limited nutrient supply as 3D models - as the authors point out in their introduction. 

      We appreciate the opportunity to clarify this point. We respectfully disagree that the cultures do not meet the consensus definition of an organoid. In fact, a direct quote from the seminal nomenclature paper referenced by the reviewer states: “We define organoids as in vitro-generated cellular systems that emerge by self-organization, include multiple cell types, and exhibit some cytoarchitectural and functional features reminiscent of an organ or organ region. Organoids can be generated as 3D cultures or by a combination of 3D and 2D approaches (also known as 2.5D) that can develop and mature over long periods of time (months to years).” (Pasca et al., 2022, doi: 10.1038/s41586-022-05219-6). Therefore, while many organoid types indeed have a more spherical or globular 3D shape, the term organoid also applies to semi-3D or non-globular adherent organoids, such as renal (Czerniecki et al., 2018, doi.org/10.1016/j.stem.2018.04.022) and gastrointestinal organoids (Kakni et al., 2022, doi.org/10.1016/j.tibtech.2022.01.006). Accordingly, the adherent cortical organoids described in the manuscript self-organize into single radial structures consisting of multiple cell layers in the z-axis, reaching ~200 µm thickness (and therefore remaining within the limits for sufficient nutrient supply), with consistent cytoarchitectural topology and electrophysiological activity, and therefore meet the consensus definition of an organoid.

      (2) The method therefore should be compared to state-of-the-art (well-based or not) 2D cultures, which seems to be somewhat overlooked in the paper, therefore making it hard to assess what the advance is that is presented by this work. 

      It was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods. Compared to state-of-the-art 2D neural network cultures, adherent cortical organoids provide distinct advantages in:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture up to 1 year where 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks) and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures are:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole-cell patch clamping is not easily feasible in the organoids because of the restrictive dimensions of the 384-well plates.

      (3) Reproducibility is prominently claimed throughout the manuscript. However, it is challenging to assess this claim based on the data presented, which mostly contain single frames of unquantified, high-resolution images. There are almost no systematic quantifications presented. The ones present (Figure S1D, Figure 4) show very large variability. However, the authors show sets of images across wells (Figure S1B, Figure S3) which hint that in some important aspects, the culture seems reproducible and robust. 

      We made considerable efforts to establish quantitative metrics to assess reproducibility. We applied a quantitative scoring system of single radial structures at different time points for multiple batches of all three lines as indicated in Figure S1D. This figure represents a comprehensive dataset in which each dot represents the average of a different batch of organoids containing 10-40 organoids per batch. To emphasize this, we will adapt the graph to better reflect the breadth of the dataset. Additional quantifications are given in Figure S2 for progenitor and layer markers for Line 1 and in Figure S5 for interneurons across all three lines, showing relatively low variability. That being said, we acknowledge the reviewer’s concerns and will modify the text to reduce the emphasis of this point, pending more extensive data addressing reproducibility across a wide range of parameters.

      (4) What is in the middle? All images show markers in cells present around the center. The center however seems to be a dense lump of cells based on DAPI staining. What is the identity of these cells? Do these cells persist throughout the protocol? Do they divide? Until when? Addressing this prominent cell population is currently lacking. 

      A more comprehensive characterization of the cells in the center remains a significant challenge due to the high cell density, which hinders antibody penetration. However, dye-based staining methods such as DAPI and the LIVE/DEAD panel confirm a predominance of intact nuclei with very minimal cell death. The limited available data suggest that a substantial proportion of the cells in the center are proliferative neural progenitors, as indicated by immunolabeling for SOX2 and Ki67. We will add additional figures to support these findings. Furthermore, we are currently optimizing the conditions to perform single-cell/single-nucleus RNA sequencing to further characterize the cellular composition of the organoids.

      (5) This manuscript proposes a new method of 2D neural culture. However, the description and representation of the method are currently insufficient.

      (a) The results section would benefit from a clear and concise, but step-by-step, overview of the protocol. The current description refers to an earlier paper and appears to skip over some key steps. This section would benefit from being completely rewritten. This is not a replacement for a clear methods section, but a section that allows readers to clearly interpret the results presented later.

      We will revise the manuscript to include a more detailed step-by-step overview of the protocol.

      (b) Along the same lines, the graphical abstract should be much more detailed. It should contain the time frames and the media used at the different stages of the protocol, seeding numbers, etc. 

      As suggested, we will also adapt the graphical abstract to include more detail.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, van der Kroeg et al have developed a method for creating 3D cortical organoids using iPSC-derived neural progenitor cells in 384-well plates, thus scaling down the neural organoids to adherent culture and a smaller format that is amenable to high throughput cultivation. These adherent cortical organoids, measuring 3 x 3 x 0.2 mm, self-organize over eight weeks and include multiple neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths: 

      (1) The organoids can be cultured for up to 10 months, exhibiting mature dendritic spines, axonal myelination, and robust neuronal activity. 

      (2) Unlike free-floating organoids, these do not develop necrotic cores, making them ideal for high-throughput drug discovery, neurotoxicological screening, and brain disorder studies.

      (3) The method addresses the technical challenge of achieving higher-order neural complexity with reduced heterogeneity and the issue of necrosis in larger organoids. The method presents a technical advance in organoid culture.

      (4) The method has been demonstrated with multiple cell lines which is a strength. 

      (5) The manuscript provides high-quality immunostaining for multiple markers. 

      We appreciate the reviewer’s acknowledgement of the strengths of this novel platform as a technical advance in organoid culture that reduces heterogeneity and shows potential for higher throughput experiments.

      Weaknesses: 

      (1) A direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, i.e., what can be done with the new method that cannot be done with standard culture, and vice versa, i.e., in which aspects the new method could be inferior to the standard.

      In our opinion, it would be extremely difficult to directly compare methods because of substantial differences. Most notably, whole brain organoids grow to large and irregular globular shapes, while adherent cortical organoids have a highly standardized shape confined by the limits of a 384-well. Moreover, it was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods.

      (2) It would be important to further benchmark the throughput, i.e., what is the success rate in filling and successfully growing the organoids in an entire 384-well plate?

      Figure S1D shows the success rate of organoid formation and stability of the organoid structures over time. In addition, we will add the number of wells that were filled per plate.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs. 

      Figure S1C provides the relationship between proliferation rate and seeding density, allowing estimation of seeding densities based on the proliferation rate of the NPCs. However, we appreciate the reviewer's feedback and will modify the methods to provide more detail.

      Reviewer #3 (Public Review): 

      Summary: 

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells. 

      Strengths: 

      Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems. 

      We thank the reviewer for highlighting the strengths of our novel platform. We appreciate that all three reviewers agree that the adherent cortical organoids presented in this manuscript reliably demonstrate increased reproducibility and longevity. They also commend its potential for higher throughput drug discovery and neurotoxicological/phenotype screening purposes.

      Weaknesses: 

      While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Mainly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      We appreciate the feedback and will add more detail on consistency and standardization of functional outputs.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      To the Senior Editor and the Reviewing Editor:

      We sincerely appreciate the valuable comments provided by the reviewers, the reviewing editor, and the senior editor. Based on our last response and revision, we are confused by the two limitations noted in the eLife assessment. 

      (1) benchmarking against comparable methods is limited.

      In our last revision, we added the comparison experiments with TNDM, as the reviewers requested. Additionally, it is crucial to emphasize that our evaluation of decoding capabilities of behaviorally relevant signals has been benchmarked against the performance of the ANN on raw signals, which, as Reviewer #1 previously noted, nearly represents the upper limit of performance. Consequently, we believe that our benchmarking methods are sufficiently strong.

      (2) some observations may be a byproduct of their method, and may not constitute new scientific observations.

      We believe that our experimental results are sufficient to demonstrate that our conclusions are not byproducts of d-VAE, for three reasons:

      (1) The d-VAE, as a latent variable model, adheres to the population doctrine, which posits that latent variables are responsible for generating the activities of individual neurons. The goal of such models is to maximize the explanation of the raw signals. At the signal level, the only criterion we can rely on is neural reconstruction performance, in which we have achieved unparalleled results. Thus, it is inappropriate to focus on the mixing process during the model's inference stage while overlooking the crucial de-mixing process during the generation stage and dismissing the significance of our neural reconstruction results. For more details, please refer to the first point in our response to Q4 from Reviewer #4.

      (2) The criterion that irrelevant signals should contain minimal information can effectively demonstrate that our conclusions are not byproducts of d-VAE. Unfortunately, the reviewers seem to have overlooked this criterion. For more details, please refer to the third point in our response to Q4 from Reviewer #4.

      (3) Our synthetic experimental results also substantiate that our conclusions are not byproducts of d-VAE. However, it appears the reviewers did not give these results adequate consideration. For more details, please refer to the fourth point in our response to Q4 from Reviewer #4.

      Furthermore, our work presents not just "a useful method" but a comprehensive framework. Our study proposes, for the first time, a framework for defining, extracting, and validating behaviorally relevant signals. In our current revision, to clearly distinguish between d-VAE and other methods, we have formalized the extraction of behaviorally relevant signals into a mathematical optimization problem. To our knowledge, current methods have not explicitly proposed extracting behaviorally relevant signals, nor have they identified and addressed the key challenges of extracting relevant signals. Similarly, existing research has not yet defined and validated behaviorally relevant signals. For more details, please refer to our response to Q1 from Reviewer #4.

      Based on these considerations, we respectfully request that you reconsider the eLife assessment of our work. We greatly appreciate your time and attention to this matter.

      The main revisions made to the manuscript are as follows:

      (1) We have formalized the extraction of behaviorally relevant signals into a mathematical optimization problem, enabling a clearer distinction between d-VAE and other models.

      (2) We have moderated the assertion about linear readout to highlight its conjectural nature and have broadened the discussion regarding this conclusion. 

      (3) We have elaborated on the model details of d-VAE and have removed the identifiability claim.

      To Reviewer #1

      Q1: “As reviewer 3 also points out, I would, however, caution to interpret this as evidence for linear read-out of the motor system - your model performs a non-linear transformation, and while this is indeed linearly decodable, the motor system would need to do something similar first to achieve the same. In fact to me it seems to show the opposite, that behaviour-related information may not be generally accessible to linear decoders (including to down-stream brain areas).”

      Thank you for your comments. It's important to note that the conclusions we draw are speculative and not definitive. We use terms like "suggest" to reflect this uncertainty. To further emphasize the conjectural nature of our conclusions, we have deliberately moderated our tone.

      The question of whether behaviorally-relevant signals can be accessed by linear decoders or downstream brain regions hinges on the debate over whether the brain employs a strategy of filtering before decoding. If the brain employs such a strategy, the brain can probably access these signals. In our opinion, it is likely that the brain utilizes this strategy.

      Given the existence of behaviorally relevant signals, it is reasonable to assume that the brain has intrinsic mechanisms to differentiate between relevant and irrelevant signals. There is growing evidence suggesting that the brain utilizes various mechanisms, such as attention and specialized filtering, to suppress irrelevant signals and enhance relevant signals [1-3]. Therefore, it is plausible that the brain filters before decoding, thereby effectively accessing behaviorally relevant signals.

      Thank you for your valuable feedback.

      (1) Sreenivasan, Sameet, and Ila Fiete. "Grid cells generate an analog error-correcting code for singularly precise neural computation." Nature Neuroscience 14.10 (2011): 1330-1337.

      (2) Schneider, David M., Janani Sundararajan, and Richard Mooney. "A cortical filter that learns to suppress the acoustic consequences of movement." Nature 561.7723 (2018): 391-395.

      (3) Nakajima, Miho, L. Ian Schmitt, and Michael M. Halassa. "Prefrontal cortex regulates sensory filtering through a basal ganglia-to-thalamus pathway." Neuron 103.3 (2019): 445-458.

      Q2: “As in my initial review, I would also caution against making strong claims about identifiability although this work and TNDM seem to show that in practise such methods work quite well. CEBRA, in contrast, offers some theoretical guarantees, but it is not a generative model, so would not allow the type of analysis done in this paper. In your model there is a parameter \alpha to balance between neural and behaviour reconstruction. This seems very similar to TNDM and has to be optimised - if this is correct, then there is manual intervention required to identify a good model.”

      Thank you for your comments. 

      Considering your concerns about our identifiability claims and the fact that identifiability is not directly relevant to the core of our paper, we have removed content related to identifiability.

      Firstly, our model is based on the pi-VAE, which also has theoretical guarantees. However, it is important to note that all such theoretical guarantees (including pi-VAE and CEBRA) are based on certain assumptions that cannot be validated as the true distribution of latent variables remains unknown.

      Secondly, it is important to clarify that the identifiability of latent variables does not impact the conclusions of this paper, nor does this paper make specific conclusions about the model's latent variables. Identifiability means that distinct latent variables correspond to distinct observations. If multiple latent variables can generate the same observation, it becomes impossible to determine which one is correct given the observation, which leads to the issue of nonidentifiability. Notably, our analysis focuses on the generated signals, not the latent variables themselves, and thus the identifiability of these variables does not affect our findings. 
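      To make the notion of nonidentifiability concrete, the following minimal sketch uses a hypothetical linear-Gaussian generative model (an illustration only, not d-VAE). If 𝒙 = A𝒛 with 𝒛 ~ N(0, I), then rotating the latent variables by any orthogonal matrix Q and compensating in the mixing matrix yields identical generated signals (and, because the isotropic Gaussian is rotation-invariant, an identical latent distribution). The latent variables therefore cannot be recovered uniquely, even though the generated signals are fully determined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear generative model x = A @ z (illustration only, not d-VAE).
n_latent, n_neurons, n_samples = 2, 5, 1000
A = rng.normal(size=(n_neurons, n_latent))
z = rng.normal(size=(n_latent, n_samples))

# Any orthogonal Q (here a 2-D rotation) defines an alternative latent z_alt = Q @ z.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z_alt = Q @ z
A_alt = A @ Q.T  # compensate in the mixing matrix: A_alt @ z_alt = A @ z

x = A @ z
x_alt = A_alt @ z_alt

latents_differ = not np.allclose(z, z_alt)
signals_match = np.allclose(x, x_alt)
print(latents_differ, signals_match)  # True True
```

      An analysis that operates on the generated signals, such as x here, is unaffected by which of the equivalent latent solutions is chosen.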

      Our approach, dedicated to extracting these signals, differs distinctly from methods such as TNDM, which focus on extracting behaviorally relevant latent dynamics. To clearly set d-VAE apart from other models, we have framed the extraction of behaviorally relevant signals as the following mathematical optimization problem:

      $$\min_{\boldsymbol{x}_r} \; E(\boldsymbol{x}_r, \boldsymbol{x}) + R(\boldsymbol{x}_r)$$

      where 𝒙𝒓 denotes the generated behaviorally relevant signals, 𝒙 denotes the raw noisy signals, 𝐸(⋅,⋅) denotes the reconstruction loss, and 𝑅(⋅) denotes the regularization loss. It is important to note that although both d-VAE and TNDM employ a reconstruction loss, this term alone is insufficient to determine the optimal degree of similarity between the generated and raw noisy signals. The key to accurately extracting behaviorally relevant signals lies in leveraging prior knowledge about these signals to determine the optimal degree of similarity, which is encapsulated by 𝑅(𝒙𝒓). Other studies have not explicitly proposed extracting behaviorally relevant signals, nor have they identified and addressed the key challenges involved. Consequently, our approach is distinct from other methods.
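      As a toy instance of this objective (a sketch under stated assumptions, not the d-VAE implementation), take 𝐸 to be the squared error and let 𝑅 be a hypothetical penalty λ‖W𝒙𝒓 − 𝒚‖² that rewards behavioral information through a fixed linear readout W. The minimizer then has the closed form (I + λWᵀW)𝒙𝒓 = 𝒙 + λWᵀ𝒚. With λ = 0 the solution is 𝒙𝒓 = 𝒙, which illustrates the point made in the text: the reconstruction term alone simply returns the raw signals, and it is the regularizer that determines how far 𝒙𝒓 departs from 𝒙.

```python
import numpy as np


def toy_relevant_signals(x, y, W, lam):
    """Closed-form minimizer of ||x_r - x||^2 + lam * ||W @ x_r - y||^2.

    x   : raw signals, shape (n_neurons,)
    y   : behavioral variables, shape (n_behavior,)
    W   : hypothetical fixed linear readout, shape (n_behavior, n_neurons)
    lam : weight of the hypothetical regularizer R(x_r)
    """
    n = x.shape[0]
    # Normal equations obtained by setting the gradient of the objective to zero.
    lhs = np.eye(n) + lam * W.T @ W
    rhs = x + lam * W.T @ y
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(1)
n_neurons, n_behavior = 8, 2
W = rng.normal(size=(n_behavior, n_neurons))
y = rng.normal(size=n_behavior)
x = rng.normal(size=n_neurons)  # hypothetical raw signals

for lam in [0.0, 1.0, 100.0]:
    x_r = toy_relevant_signals(x, y, W, lam)
    # Larger lam: x_r moves away from x but fits the behavioral readout better.
    print(lam, round(float(np.linalg.norm(x_r - x)), 3),
          round(float(np.linalg.norm(W @ x_r - y)), 3))
```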

      Thank you for your valuable feedback.

      Q3: “Somewhat related, I also found that the now comprehensive comparison with related models shows that the using decoding performance (R2) as a metric for model comparison may be problematic: the R2 values reported in Figure 2 (e.g. the MC_RTT dataset) should be compared to the values reported in the neural latent benchmark, which represent well-tuned models (e.g. AutoLFADS). The numbers (difficult to see, a table with numbers in the appendix would be useful, see: https://eval.ai/web/challenges/challenge-page/1256/leaderboard) seem lower than what can be obtained with models without latent space disentanglement. While this does not necessarily invalidate the conclusions drawn here, it shows that decoding performance can depend on a variety of model choices, and may not be ideal to discriminate between models. I'm also surprised by the low neural R2 for LFADS (I assume this is condition-averaged) - LFADS tends to perform very well on this metric.”

      Thank you for your comments. The dataset we utilized is not from the same day as the neural latent benchmark dataset. Notably, there is considerable variation in the length of trials within the RTT paradigm, and the dataset lacks explicit trial information, rendering trial-averaging unsuitable. Furthermore, behaviorally relevant signals are not static averages devoid of variability; even behavioral data exhibits variability. We computed the neural R2 using individual trials rather than condition-averaged responses. 

      Thank you for your valuable feedback.

      Q4: “One statement I still cannot follow is how the prior of the variational distribution is modelled. You say you depart from the usual Gaussian prior, but equation 7 seems to suggest there is a normal prior. Are the parameters of this distribution learned? As I pointed out earlier, I however suspect this may not matter much as you give the prior a very low weight. I also still am not sure how you generate a sample from the variational distribution, do you just draw one for each pass?”

      Thank you for your questions.

      The conditional distribution of the prior latent variables, 𝑝ₘ(𝒛|𝒚), is Gaussian, but the distribution of the prior latent variables, 𝑝(𝒛), is a Gaussian mixture:

      $$p(\boldsymbol{z}) = \int p_m(\boldsymbol{z} \mid \boldsymbol{y})\, \hat{p}(\boldsymbol{y})\, d\boldsymbol{y} = \frac{1}{N} \sum_{i=1}^{N} p_m(\boldsymbol{z} \mid \boldsymbol{y}^{(i)}), \qquad \hat{p}(\boldsymbol{y}) = \frac{1}{N} \sum_{i=1}^{N} \delta(\boldsymbol{y} - \boldsymbol{y}^{(i)})$$

      where 𝑝̂(𝒚) denotes the empirical distribution of the behavioral variables 𝒚, 𝑁 denotes the number of samples, 𝒚⁽ⁱ⁾ denotes the 𝑖th sample, δ(⋅) denotes the Dirac delta function, and 𝑝ₘ(𝒛|𝒚) denotes the conditional distribution of the prior latent variables given the behavioral variables, parameterized by network 𝑚. The equation shows that 𝑝(𝒛) is not a Gaussian distribution but a Gaussian mixture model with 𝑁 components, which is theoretically a universal approximator of continuous probability densities.
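      Because 𝑝(𝒛) is an equally weighted mixture over the observed behaviors, sampling from it amounts to picking a training sample 𝒚⁽ⁱ⁾ uniformly at random and then sampling from the Gaussian 𝑝ₘ(𝒛|𝒚⁽ⁱ⁾). The sketch below assumes a hypothetical stand-in for network 𝑚 (a fixed linear mean and a constant standard deviation), whereas the actual network is learned. With bimodal behavior, the resulting prior is clearly bimodal even though every component is Gaussian: very few samples fall near the overall mean, unlike a single Gaussian with the same mean and variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical behavioral samples y^(i) with two clusters (e.g., two reach directions).
N = 2000
y_train = np.concatenate([rng.normal(-3.0, 0.3, N // 2), rng.normal(3.0, 0.3, N // 2)])


def prior_net(y):
    """Hypothetical stand-in for network m: mean and std of p_m(z | y)."""
    mean = 0.8 * y
    std = np.full_like(y, 0.5)
    return mean, std


def sample_prior(y_train, n_draws, rng):
    """Draw from p(z) = (1/N) * sum_i p_m(z | y^(i))."""
    idx = rng.integers(0, len(y_train), size=n_draws)  # choose components uniformly
    mean, std = prior_net(y_train[idx])
    return mean + std * rng.standard_normal(n_draws)   # sample within each chosen Gaussian


z = sample_prior(y_train, 5000, rng)
near_center = np.mean(np.abs(z) < 0.5)
# A single Gaussian with the same mean and variance would put roughly 16% of samples here.
print(round(float(z.mean()), 2), round(float(near_center), 4))
```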

      Learning this prior is important: our latent variable visualizations show that the latent variables are not Gaussian distributed. Hypothesis tests (the Lilliefors test and the Kolmogorov-Smirnov test) indicate that neither the latent variables nor the behavioral variables follow a Gaussian distribution. Consequently, constraining the latent variables toward N(0, 1) would be expected to degrade performance.

      Regarding sampling, during training we draw only one sample from the approximate posterior distribution per pass; drawing one sample or multiple samples per pass does not affect the experimental results. After training, we can generate a sample from the prior by providing behavioral data 𝒚⁽ⁱ⁾ as input to the prior network 𝑝ₘ(𝒛|𝒚) and passing the resulting latent sample through the decoder. To extract behaviorally relevant signals from raw signals, we use the approximate posterior (encoder) and the decoder.
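      In VAEs, such a single draw is typically made with the reparameterization trick, 𝒛 = μ + σ ⊙ ε with ε ~ N(0, I), which keeps the sample differentiable with respect to the encoder outputs. The sketch below assumes a diagonal-Gaussian approximate posterior with hypothetical encoder outputs and uses NumPy only to show the arithmetic (in training this would run inside an autodiff framework). It also shows why one sample per pass is usually sufficient: averaged over many passes, the single-sample draws are unbiased for the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(3)


def reparameterized_sample(mu, log_var, rng):
    """Draw one sample from N(mu, diag(exp(log_var))) as z = mu + sigma * eps."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


# Hypothetical encoder outputs for one input (diagonal Gaussian posterior).
mu = np.array([1.0, -2.0, 0.5])
log_var = np.array([-1.0, 0.0, -2.0])

z_one = reparameterized_sample(mu, log_var, rng)  # the single draw used in one pass

# Many single-sample passes, averaged, recover the posterior mean and variance.
draws = np.stack([reparameterized_sample(mu, log_var, rng) for _ in range(20000)])
print(z_one.shape, draws.mean(axis=0).round(2), draws.var(axis=0).round(2))
```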

      Thank you for your valuable feedback.

      Q5: “(1) I found the figures good and useful, but the text is, in places, not easy to follow. I think the manuscript could be shortened somewhat, and in some places more concise focussed explanations would improve readability.

      (2) I would not call the encoding "complex non-linear" - non-linear is a clear term, but complex can mean many things (e.g. is a quadratic function complex?) ”

      Thank you for your recommendation. We have revised the manuscript for clarity. We call the encoding “complex nonlinear” because neurons encode information with varying degrees of nonlinearity, as illustrated in Fig. 3b, f, and Fig. S3b.

      Thank you for your valuable feedback.

      To Reviewer #2

      Q1: “I still remain unconvinced that the core findings of the paper are "unexpected". In the response to my previous Specific Comment #1, they say "We use the term 'unexpected' due to the disparity between our findings and the prior understanding concerning neural encoding and decoding." However, they provide no citations or grounding for why they make those claims. What prior understanding makes it unexpected that encoding is more complex than decoding given the entropy, sparseness, and high dimensionality of neural signals (the "encoding") compared to the smoothness and low dimensionality of typical behavioural signals (the "decoding")?” 

      Thank you for your comments. We believe that both the complexity of neural encoding and the simplicity of neural decoding in motor cortex are unexpected.

      The Complexity of Neural Encoding: As noted in the Introduction, neurons with small R2 values were traditionally considered noise and consequently disregarded [1-3]. However, after filtering out irrelevant signals, we discovered that these neurons actually contain substantial, previously unrecognized amounts of behavioral information. Similarly, in population-level analyses, neural signals composed of small principal components (PCs) are often dismissed as noise, and analyses typically use only 6 to 18 PCs [4-10]. Yet the discarded PC signals nonlinearly encode significant amounts of information, and the number of practically useful dimensions ranges from 30 to 40, far exceeding the number usually analyzed. These findings underscore the complexity of neural encoding and are unexpected.

      The Simplicity of Neural Decoding: In the motor cortex, nonlinear decoding of raw signals has been shown to significantly outperform linear decoding [11, 12]. Interestingly, after separating behaviorally relevant and irrelevant signals, we observed that the linear decoding performance of behaviorally relevant signals is nearly equivalent to that of nonlinear decoding, a phenomenon not previously documented in the motor cortex. This discovery is also unexpected.
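      The comparison of linear and nonlinear decoding can be illustrated on synthetic data with a minimal sketch, assuming ridge regression as the linear decoder and RBF kernel ridge regression as the nonlinear decoder (stand-ins chosen for brevity; they are not necessarily the decoders used in the manuscript). The signals, mappings, kernel width, and regularization strengths are all hypothetical. When behavior is a linear function of the signals, the nonlinear decoder offers no real advantage on held-out data; when behavior is a nonlinear function, only the nonlinear decoder performs well.

```python
import numpy as np

rng = np.random.default_rng(4)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ridge(X, y, lam=1e-3):
    """Linear decoder: ridge regression with an intercept column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ w


def fit_kernel_ridge(X, y, gamma=0.2, lam=1e-2):
    """Nonlinear decoder: kernel ridge regression with an RBF kernel."""
    def kernel(A, B):
        sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq_dist)
    alpha = np.linalg.solve(kernel(X, X) + lam * np.eye(len(X)), y)
    return lambda Xn: kernel(Xn, X) @ alpha


# Hypothetical signals: 3 channels x 600 time bins; first 400 bins train, last 200 test.
X = rng.normal(size=(600, 3))
w_true = np.array([1.0, -0.5, 0.8])
targets = {
    "linear": X @ w_true + 0.1 * rng.normal(size=600),
    "nonlinear": np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=600),
}
train, test = slice(0, 400), slice(400, 600)

results = {}
for name, y in targets.items():
    linear_dec = fit_ridge(X[train], y[train])
    kernel_dec = fit_kernel_ridge(X[train], y[train])
    results[name] = (r2(y[test], linear_dec(X[test])), r2(y[test], kernel_dec(X[test])))
    print(name, "linear R2:", round(results[name][0], 3),
          "nonlinear R2:", round(results[name][1], 3))
```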

      Thank you for your valuable feedback.

      (1) Georgopoulos, Apostolos P., Andrew B. Schwartz, and Ronald E. Kettner. "Neuronal population coding of movement direction." Science 233.4771 (1986): 1416-1419.

      (2) Hochberg, Leigh R., et al. "Reach and grasp by people with tetraplegia using a neurally controlled robotic arm." Nature 485.7398 (2012): 372-375.

      (3) Inoue, Yoh, et al. "Decoding arm speed during reaching." Nature Communications 9.1 (2018): 5243.

      (4) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (5) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature Neuroscience 17.3 (2014): 440-448.

      (6) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature Communications 7.1 (2016): 13239.

      (7) Sadtler, Patrick T., et al. "Neural constraints on learning." Nature 512.7515 (2014): 423-426.

      (8) Golub, Matthew D., et al. "Learning by neural reassociation." Nature Neuroscience 21.4 (2018): 607-616.

      (9) Gallego, Juan A., et al. "Cortical population activity within a preserved neural manifold underlies multiple motor behaviors." Nature Communications 9.1 (2018): 4233.

      (10) Gallego, Juan A., et al. "Long-term stability of cortical population dynamics underlying consistent behavior." Nature Neuroscience 23.2 (2020): 260-270.

      (11) Glaser, Joshua I., et al. "Machine learning for neural decoding." eNeuro 7.4 (2020).

      (12) Willsey, Matthew S., et al. "Real-time brain-machine interface in non-human primates achieves high-velocity prosthetic finger movements using a shallow feedforward neural network decoder." Nature Communications 13.1 (2022): 6899.

      Q2: “I still take issue with the premise that signals in the brain are "irrelevant" simply because they do not correlate with a fixed temporal lag with a particular behavioural feature hand-chosen by the experimenter. In the response to my previous review, the authors say "we employ terms like 'behaviorally-relevant' and 'behaviorally-irrelevant' only regarding behavioral variables of interest measured within a given task, such as arm kinematics during a motor control task.". This is just a restatement of their definition, not a response to my concern, and does not address my concern that the method requires a fixed temporal lag and continual decoding/encoding. My example of reward signals remains. There is a huge body of literature dating back to the 70s on the linear relationships between neural and activity and arm kinematics; in a sense, the authors have chosen the "variable of interest" that proves their point. This all ties back to the previous comment: this is mostly expected, not unexpected, when relating apparently-stochastic, discrete action potential events to smoothly varying limb kinematics.”

      Thank you for your comments. 

      Regarding the experimenter's specification of behavioral variables of interest, we followed common practice in existing studies [1, 2]. Regarding the use of fixed temporal lags, we followed the same practice as papers related to the dataset we use, which assume fixed temporal lags [3-5]. Furthermore, many studies in the motor cortex similarly use fixed temporal lags [6-8].

      Concerning the issue of rewards, in the paper you mentioned [9], the impact of rewards occurs after the reaching phase. It's important to note that in our experiments, we analyze only the reaching phase, without any post-movement phase. 

      If the impact of rewards is stably reflected in the signals during the reaching phase of the subsequent trial, and if these reward-induced signals do not interfere with decoding (such signals are harmless for decoding and beneficial for reconstruction), our model is likely to capture them. If the reward-induced signals during the reaching phase are unstable from trial to trial, our model will likely be unable to capture them.

      If the goal is to extract post-movement neural activity from both rewarded and unrewarded trials, and if the neural patterns differ between these conditions, one could replace the d-VAE's regression loss, used for continuous kinematics decoding, with a classification loss tailored to distinguish between rewarded and unrewarded conditions.
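      A minimal sketch of that swap, assuming the behavioral head of the model outputs real-valued predictions: for continuous kinematics the behavior term is a mean-squared-error regression loss, while for a binary rewarded/unrewarded label the same head would output a logit scored with binary cross-entropy. The function names and the two-case selector are hypothetical and do not reproduce the d-VAE code.

```python
import numpy as np


def regression_loss(pred, kinematics):
    """Mean squared error for continuous behavioral variables (e.g., velocity)."""
    return np.mean((pred - kinematics) ** 2)


def classification_loss(logit, rewarded):
    """Binary cross-entropy for rewarded (1) vs unrewarded (0) trials.

    Numerically stable form: max(l, 0) - l * t + log(1 + exp(-|l|)).
    """
    return np.mean(np.maximum(logit, 0) - logit * rewarded + np.log1p(np.exp(-np.abs(logit))))


def behavior_loss(pred, target, kind):
    """Select the behavior term according to the type of the variable of interest."""
    if kind == "continuous":
        return regression_loss(pred, target)
    if kind == "binary":
        return classification_loss(pred, target)
    raise ValueError(f"unknown behavior type: {kind}")
```

      The rest of the objective (the reconstruction term and any regularization) would be left unchanged in this sketch; only the behavior term depends on whether the variable of interest is continuous or categorical.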

      To clarify the definition, we have revised it in the manuscript. Specifically, before giving the formal definition, we briefly introduce relevant and irrelevant signals. Behaviorally irrelevant signals are those not directly associated with the behavioral variables of interest; they may include noise or signals related to variables of no interest. In contrast, behaviorally relevant signals are those directly related to the behavioral variables of interest. For instance, rewards in the post-movement phase are not directly related to the behavioral variables (kinematics) in the reaching phase.

      It is important to note that our definition of behaviorally relevant signals includes not only decoding capability but also specific requirements at the signal level, based on two key criteria:

      (1) they should closely resemble the raw signals, preserving the underlying neuronal properties, without becoming so similar that they include irrelevant signals (encoding requirement); and (2) they should contain as much behavioral information as possible (decoding requirement). Signals that meet both requirements are considered effective behaviorally relevant signals. In our study, we assume that raw signals are the sum of behaviorally relevant and irrelevant signals, and we define irrelevant signals as what remains after subtracting the relevant signals from the raw signals. Therefore, we believe our definition is clearly articulated.
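      Under the additive assumption 𝒙 = 𝒙𝒓 + 𝒙𝒊, the irrelevant part is simply the residual 𝒙 − 𝒙𝒓. The sketch below uses hypothetical synthetic data in which the true relevant component is known, computes the residual, and scores the two requirements with R²: how closely 𝒙𝒓 resembles 𝒙 (encoding), and how much behavior a linear decoder recovers from 𝒙𝒓 versus from the residual (decoding, together with the criterion that irrelevant signals should carry minimal behavioral information). The choice of R² and of a least-squares decoder are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)


def r2(y_true, y_pred):
    """Pooled coefficient of determination over all samples and channels."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot


def linear_decoding_r2(signals, behavior):
    """In-sample R^2 of a least-squares linear decoder with an intercept."""
    X = np.hstack([signals, np.ones((len(signals), 1))])
    w, *_ = np.linalg.lstsq(X, behavior, rcond=None)
    return r2(behavior, X @ w)


# Hypothetical data: 2-D behavior, 20 neurons, 1000 time bins.
T, n_neurons = 1000, 20
behavior = rng.normal(size=(T, 2))
x_r = behavior @ rng.normal(size=(2, n_neurons))  # behaviorally relevant part
x_i = 0.5 * rng.normal(size=(T, n_neurons))       # behaviorally irrelevant part
x = x_r + x_i                                     # raw signals under the additive model

residual = x - x_r                                # irrelevant = raw - relevant
print("encoding R2 (x_r vs x):", round(r2(x, x_r), 3))
print("decoding R2 from x_r:", round(linear_decoding_r2(x_r, behavior), 3))
print("decoding R2 from residual:", round(linear_decoding_r2(residual, behavior), 3))
```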

      Thank you for your valuable feedback.

      (1) Sani, Omid G., et al. "Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification." Nature Neuroscience 24.1 (2021): 140-149.

      (2) Buetfering, Christina, et al. "Behaviorally relevant decision coding in primary somatosensory cortex neurons." Nature Neuroscience 25.9 (2022): 1225-1236.

      (3) Wang, Fang, et al. "Quantized attention-gated kernel reinforcement learning for brain–machine interface decoding." IEEE Transactions on Neural Networks and Learning Systems 28.4 (2015): 873-886.

      (4) Dyer, Eva L., et al. "A cryptography-based approach for movement decoding." Nature Biomedical Engineering 1.12 (2017): 967-976.

      (5) Ahmadi, Nur, Timothy G. Constandinou, and Christos-Savvas Bouganis. "Robust and accurate decoding of hand kinematics from entire spiking activity using deep learning." Journal of Neural Engineering 18.2 (2021): 026011.

      (6) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (7) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature Neuroscience 17.3 (2014): 440-448.

      (8) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature Communications 7.1 (2016): 13239.

      (9) Ramkumar, Pavan, et al. "Premotor and motor cortices encode reward." PLoS ONE 11.8 (2016): e0160851.

      Q3: “The authors seem to have missed the spirit of my critique: to say "linear readout is performed in motor cortex" is an over-interpretation of what their model can show.”

      Thank you for your comments. It's important to note that the conclusions we draw are speculative and not definitive. We use terms like "suggest" to reflect this uncertainty. To further emphasize the conjectural nature of our conclusions, we have deliberately moderated our tone.

      The question of whether behaviorally-relevant signals can be accessed by downstream brain regions hinges on the debate over whether the brain employs a strategy of filtering before decoding. If the brain employs such a strategy, the brain can probably access these signals. In our view, it is likely that the brain utilizes this strategy.

      Given the existence of behaviorally relevant signals, it is reasonable to assume that the brain has intrinsic mechanisms to differentiate between relevant and irrelevant signals. There is growing evidence suggesting that the brain utilizes various mechanisms, such as attention and specialized filtering, to suppress irrelevant signals and enhance relevant signals [1-3]. Therefore, it is plausible that the brain filters before decoding, thereby effectively accessing behaviorally relevant signals.

      Regarding whether the brain employs linear readout, given the limitations of current observational methods and our incomplete understanding of brain mechanisms, it is difficult to ascertain whether it does. In many cortical areas, linear decoders have proven sufficiently accurate. Consequently, numerous studies [4, 5, 6], including the one you referenced [4], directly employ linear decoders to extract information and draw conclusions from the decoding results. In contrast, our research compared the performance of linear and nonlinear decoders on behaviorally relevant signals and found that their decoding performance is comparable. Considering both decoding accuracy and model complexity, our results suggest that the motor cortex may use linear readout to decode information from relevant signals. Given current technological limitations, we consider it reasonable to analyze collected data to speculate on the potential workings of the brain, an approach that many studies have also taken [7-10]. For instance, one study [7] infers strategies the brain might employ to overcome noise by analyzing the structure of recorded data and the decoding outcomes for new stimuli.

      Thank you for your valuable feedback.

      (1) Sreenivasan, Sameet, and Ila Fiete. "Grid cells generate an analog error-correcting code for singularly precise neural computation." Nature Neuroscience 14.10 (2011): 1330-1337.

      (2) Schneider, David M., Janani Sundararajan, and Richard Mooney. "A cortical filter that learns to suppress the acoustic consequences of movement." Nature 561.7723 (2018): 391-395.

      (3) Nakajima, Miho, L. Ian Schmitt, and Michael M. Halassa. "Prefrontal cortex regulates sensory filtering through a basal ganglia-to-thalamus pathway." Neuron 103.3 (2019): 445-458.

      (4) Jurewicz, Katarzyna, et al. "Irrational choices via a curvilinear representational geometry for value." bioRxiv (2022): 2022-03.

      (5) Hong, Ha, et al. "Explicit information for category-orthogonal object properties increases along the ventral stream." Nature Neuroscience 19.4 (2016): 613-622.

      (6) Chang, Le, and Doris Y. Tsao. "The code for facial identity in the primate brain." Cell 169.6 (2017): 1013-1028.

      (7) Ganmor, Elad, Ronen Segev, and Elad Schneidman. "A thesaurus for a neural population code." eLife 4 (2015): e06134.

      (8) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (9) Gallego, Juan A., et al. "Cortical population activity within a preserved neural manifold underlies multiple motor behaviors." Nature Communications 9.1 (2018): 4233.

      (10) Gallego, Juan A., et al. "Long-term stability of cortical population dynamics underlying consistent behavior." Nature Neuroscience 23.2 (2020): 260-270.

      Q4: “Agreeing with my critique is not sufficient; please provide the data or simulations that provides the context for the reference in the fano factor. I believe my critique is still valid.”

      Thank you for your comments. As we previously replied, Churchland's research examines the variability of neural signals across different stages, including the preparation and execution phases, as well as before and after the target appears. Our study, however, focuses exclusively on the movement execution phase. Consequently, we are unable to produce comparative displays similar to those in his research. Intuitively, one might expect the variability of behaviorally relevant signals to be lower; however, since no prior study has accurately extracted such signals, the specific FF values of behaviorally relevant signals remain unknown. Presenting these values is therefore meaningful and can provide a reference for future research. Although we cannot compare FF across different stages, we can compare the values numerically with a Poisson count process. An FF of 1 is consistent with Poisson firing, and our experimental data reveal that most neurons have an FF below 1, indicating that the variance of their firing counts is below the mean.

      Thank you for your valuable feedback.
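      For reference, the Fano factor of a neuron is the across-trial variance of its spike count divided by the mean count in the same window. The sketch below uses hypothetical simulated counts to show the computation and the Poisson benchmark (FF ≈ 1), alongside a less variable, binomial process with the same mean counts that yields FF < 1; the rates, window, and trial count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)


def fano_factor(counts):
    """Fano factor per neuron; counts has shape (n_trials, n_neurons)."""
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)  # unbiased across-trial variance
    return var / mean


n_trials, n_neurons = 2000, 4
rates = np.array([5.0, 10.0, 20.0, 40.0])  # hypothetical mean counts per window

poisson_counts = rng.poisson(rates, size=(n_trials, n_neurons))
# Binomial counts with the same means; variance = mean * (1 - p), so FF = 1 - p.
p = 0.5
binomial_counts = rng.binomial((rates / p).astype(int), p, size=(n_trials, n_neurons))

print(fano_factor(poisson_counts).round(2))   # close to 1
print(fano_factor(binomial_counts).round(2))  # close to 1 - p = 0.5
```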

      To Reviewer #4

      Q1: “Overall, studying neural computations that are behaviorally relevant or not is an important problem, which several previous studies have explored (for example PSID in (Sani et al. 2021), TNDM in (Hurwitz et al. 2021), TAME-GP in (Balzani et al. 2023), pi-VAE in (Zhou and Wei 2020), and dPCA in (Kobak et al. 2016), etc). However, this manuscript does not properly put their work in the context of such prior works. For example, the abstract states "One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive", which is not the case given that these prior works have done that. The same is true for various claims in the main text, for example "Furthermore, we found that the dimensionality of primary subspace of raw signals (26, 64, and 45 for datasets A, B, and C) is significantly higher than that of behaviorally-relevant signals (7, 13, and 9), indicating that using raw signals to estimate the neural dimensionality of behaviors leads to an overestimation" (line 321). This finding was presented in (Sani et al. 2021) and (Hurwitz et al. 2021), which is not clarified here. This issue of putting the work in context has been brought up by other reviewers previously but seems to remain largely unaddressed. The introduction is inaccurate also in that it mixes up methods that were designed for separation of behaviorally relevant information with those that are unsupervised and do not aim to do so (e.g., LFADS). The introduction should be significantly revised to explicitly discuss prior models/works that specifically formulated this behavior separation and what these prior studies found, and how this study differs.”  

      Thank you for your comments. Our statement that “One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive” is accurate. To the best of our knowledge, no prior work has achieved this, namely separating accurate behaviorally relevant neural signals at both single-neuron and single-trial resolution. The works you mentioned have not explicitly proposed extracting behaviorally relevant signals, nor have they identified and addressed the key challenge of doing so: determining the optimal degree of similarity between the generated relevant signals and the raw signals. Those works focus on latent neural dynamics rather than on the signal level.

      To clearly set d-VAE apart from other models, we have framed the extraction of behaviorally relevant signals as the following mathematical optimization problem:

      $$\min_{\boldsymbol{x}_r} \; E(\boldsymbol{x}_r, \boldsymbol{x}) + R(\boldsymbol{x}_r)$$

      where 𝒙𝒓 denotes the generated behaviorally relevant signals, 𝒙 denotes the raw noisy signals, 𝐸(⋅,⋅) denotes the reconstruction loss, and 𝑅(⋅) denotes the regularization loss. It is important to note that although both d-VAE and TNDM employ a reconstruction loss, this term alone is insufficient to determine the optimal degree of similarity between the generated and raw noisy signals. The key to accurately extracting behaviorally relevant signals lies in leveraging prior knowledge about these signals to determine the optimal degree of similarity, which is encapsulated by 𝑅(𝒙𝒓). None of the works you mentioned includes this key term, 𝑅(𝒙𝒓).

      Regarding the dimensionality estimation, the dimensionality of neural manifolds quantifies the degrees of freedom required to describe population activity without significant information loss.

      There are two differences between our work and PSID and TNDM. 

      First, the dimensions they refer to are fundamentally different from ours. The dimensionality we describe pertains to a linear subspace, in which a neural dimension (neural mode, or principal component basis vector) is an N-dimensional vector, with N representing the number of neurons. However, the vector length of a neural mode differs between PSID and our approach: PSID requires concatenating multiple time steps T, essentially making each mode an NT-dimensional vector. TNDM, on the other hand, involves nonlinear dimensionality reduction, which differs from linear dimensionality reduction.

      Second, we estimate neural dimensionality by explaining the variance of neural signals, whereas PSID and TNDM determine dimensionality through decoding performance saturation. It is important to note that the dimensionality at which decoding performance saturates may not accurately reflect the true dimensionality of neural manifolds, as some dimensions may contain redundant information that does not enhance decoding performance.
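      Both differences can be illustrated with a small sketch on hypothetical data. A PCA neural mode computed from per-bin activity has one entry per neuron (length N), whereas concatenating T consecutive time bins before the decomposition, as in the PSID-style concatenation described above, yields modes of length N·T. The variance-explained dimensionality is then the number of leading PCs needed to reach a chosen fraction of the variance; the 90% threshold, the lag count T = 4, and the simulated population are assumptions, not the settings used in the manuscript.

```python
import numpy as np

rng = np.random.default_rng(7)


def pca_modes(data):
    """Principal axes (rows) and explained-variance ratios of data (samples x features)."""
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s**2
    return vt, var / var.sum()


def n_dims_for_variance(ratios, threshold=0.9):
    """Smallest number of leading PCs whose cumulative variance reaches the threshold."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)


def stack_lags(data, T):
    """Concatenate T consecutive time bins into one feature vector per sample."""
    n_bins = data.shape[0] - T + 1
    return np.hstack([data[t:t + n_bins] for t in range(T)])


# Hypothetical population: N = 30 neurons driven by 5 latent factors plus noise.
N, n_bins, k = 30, 2000, 5
activity = rng.normal(size=(n_bins, k)) @ rng.normal(size=(k, N))
activity += 0.3 * rng.normal(size=(n_bins, N))

modes, ratios = pca_modes(activity)
lagged_modes, _ = pca_modes(stack_lags(activity, T=4))

print(modes.shape[1], lagged_modes.shape[1])  # 30 120
print(n_dims_for_variance(ratios))
```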

      We acknowledge that while LFADS can generate signals that contain some behavioral information, it was not specifically designed to do so. Following your suggestion, we have removed this reference from the Introduction.

      Thank you for your valuable feedback.

      Q2: “Claims about linearity of "motor cortex" readout are not supported by results yet stated even in the abstract. Instead, what the results support is that for decoding behavior from the output of the dVAE model -- that is trained specifically to have a linear behavior readout from its embedding -- a nonlinear readout does not help. This result can be biased by the very construction of the dVAE's loss that encourages a linear readout/decoding from embeddings, and thus does not imply a finding about motor cortex.”

      Thank you for your comments. We respectfully disagree with the notion that the linear decodability of relevant signals results from constraints that make the embedding linearly decodable. Embedding involves reorganizing or transforming the structure of the original signals, so the fact that an embedding can be linearly decoded does not mean that the corresponding signals can be linearly decoded.

      Let's clarify this with three intuitive examples:

      Example 1: Image denoising is a well-established field. Both supervised and blind denoising methods [1, 2] can effectively recover the original image. This denoising process closely resembles the extraction of behaviorally relevant signals from raw signals. Suppose noisy images are not amenable to linear decoding (classification); would removing the noise make them linearly decodable? The answer is no. Typically, the noise in images captured under normal conditions is minimal, yet even clean images remain difficult to decode linearly.

      Example 2: Consider the task of face recognition, where face images are set against various backgrounds. In this context, the pixels representing the face correspond to relevant signals, while the background pixels are considered irrelevant. Suppose a network can extract the face pixels and the resulting embedding can be linearly decoded. Can the face pixels themselves be linearly decoded? The answer is no. If linear decoding of face pixels were feasible, the challenging task of face recognition could be solved simply by extracting the face from the background and training a linear classifier.

      Example 3: In the MNIST dataset, the background is uniformly black and its impact is minimal. Nevertheless, linear SVM classifiers applied directly to the raw pixels substantially underperform nonlinear SVMs.

      In summary, embedding reorganizes the structure of the original signals through a feature transformation function, and the reconstruction process recovers the structure of the original signals from the embedding. That the embedding can be linearly decoded therefore does not imply that the original signals can be linearly decoded in the same way. It is inappropriate to focus on the compression process without equally considering the reconstruction process.
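      The distinction can be illustrated with a toy sketch (a hypothetical example, not part of the d-VAE code). The signals below are noise-free binary features and the target is their XOR-like product. A linear least-squares readout from the clean signals performs at chance, whereas the same linear readout applied to a nonlinear embedding of those signals decodes the target perfectly. Linear decodability of an embedding therefore says nothing about the linear decodability of the signals it was computed from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noise-free "signals": two binary features in {-1, +1}.
X = rng.choice([-1.0, 1.0], size=(1000, 2))
# The target is the XOR-like product, which is not a linear function of X.
y = X[:, 0] * X[:, 1]

def linear_readout_r2(features, target):
    """Fit an affine least-squares readout and return its in-sample R^2."""
    A = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    return 1.0 - resid.var() / target.var()

# Linear readout directly from the clean signals: close to chance.
r2_signals = linear_readout_r2(X, y)

# A hypothetical nonlinear embedding of the same signals is linearly decodable.
embedding = np.column_stack([X[:, 0] * X[:, 1], X[:, 0] + X[:, 1]])
r2_embedding = linear_readout_r2(embedding, y)

print(f"linear R2 from signals:   {r2_signals:.3f}")
print(f"linear R2 from embedding: {r2_embedding:.3f}")
```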

      Thank you for your valuable feedback.

      (1) Mao, Xiao-Jiao, Chunhua Shen, and Yu-Bin Yang. "Image restoration using convolutional auto-encoders with symmetric skip connections." arXiv preprint arXiv:1606.08921 (2016).

      (2) Lehtinen, Jaakko, et al. "Noise2Noise: Learning image restoration without clean data." International Conference on Machine Learning. International Machine Learning Society, 2018.

      Q3: “Related to the above, it is unclear what the manuscript means by readout from motor cortex. A clearer definition of "readout" (a mapping from what to what?) in general is needed. The mapping that the linearity/nonlinearity claims refer to is from the *inferred* behaviorally relevant neural signals, which themselves are inferred nonlinearly using the VAE. This should be explicitly clarified in all claims, i.e., that only the mapping from distilled signals to behavior is linear, not the whole mapping from neural data to behavior. Again, to say the readout from motor cortex is linear is not supported, including in the abstract.” 

      Thank you for your comments. We have revised the manuscript to make this point clearer. Thank you for your valuable feedback.

      Q4: “Claims about individual neurons are also confounded. The d-VAE distilling processing is a population level embedding so the individual distilled neurons are not obtainable on their own without using the population data. This population level approach also raises the possibility that information can leak from one neuron to another during distillation, which is indeed what the authors hope would recover true information about individual neurons that wasn't there in the recording (the pixel denoising example). The authors acknowledge the possibility that information could leak to a neuron that didn't truly have that information and try to rule it out to some extent with some simulations and by comparing the distilled behaviorally relevant signals to the original neural signals. But ultimately, the distilled signals are different enough from the original signals to substantially improve decoding of low information neurons, and one cannot be sure if all of the information in distilled signals from any individual neuron truly belongs to that neuron. It is still quite likely that some of the improved behavior prediction of the distilled version of low-information neurons is due to leakage of behaviorally relevant information from other neurons, not the former's inherent behavioral information. This should be explicitly acknowledged in the manuscript.”

      Thank you for your comments. We value your insights regarding the mixing process. However, we are confident in the robustness of our conclusions. We respectfully disagree with the notion that the substantial information recovered from neurons with small R2 values is primarily due to leakage, for four key reasons.

      (1) Neural reconstruction performance is a reliable and valid criterion.

      The purpose of latent variable models is to explain neuronal activity as fully as possible. Because the ground truth of the behaviorally relevant signals, the latent variables, and the generative model is unknown, the only reliable reference at the signal level is the raw signals. A crucial criterion for evaluating the reliability of latent variable models (including the latent variables and the generated relevant signals) is their capability to explain the raw signals [1]. Consequently, we maintain that if the generated signals resemble the raw signals as closely as possible, then, by an equivalence principle, the obtained signals faithfully retain the inherent properties of single neurons.

      Reviewer #4 appears to focus on the compression (mixing) process without giving equal consideration to the reconstruction (de-mixing) process. Numerous studies have demonstrated that deep autoencoders can reconstruct the original signal very effectively; in image denoising, for example, autoencoders can accurately restore the original image [2, 3]. If one focuses exclusively on the mixing and ignores the reconstruction (de-mixing) process, then even a high score on the only criterion available at the signal level would not be accepted as evidence. If this standard were applied generally, many problems would become unsolvable. For instance, a fundamental criterion for latent variable models is their ability to explain the original data. If the ground truth of the latent variables is unknown and the reconstruction criterion is disregarded, how could we validate the effectiveness of the model or the validity of the latent variables, or ensure that findings related to the latent variables are not merely by-products of the model? We therefore disagree with this notion. We believe that, as long as the reconstruction performance is satisfactory, the extracted signals have successfully retained the characteristics of individual neurons.

      In our paper, we have shown in various ways that our generated signals sufficiently resemble the raw signals, including visualizing neuronal activity (Fig. 2m, Fig. 3i, and Fig. S5), achieving the highest performance among competitors (Fig. 2d, h, l), and conducting control analyses. Therefore, we believe our results are reliable. 

      (1) Cunningham, John P., and Byron M. Yu. "Dimensionality reduction for large-scale neural recordings." Nature Neuroscience 17.11 (2014): 1500-1509.

      (2) Mao, Xiao-Jiao, Chunhua Shen, and Yu-Bin Yang. "Image restoration using convolutional auto-encoders with symmetric skip connections." arXiv preprint arXiv:1606.08921 (2016).

      (3) Lehtinen, Jaakko, et al. "Noise2Noise: Learning image restoration without clean data." International Conference on Machine Learning. International Machine Learning Society, 2018.

      (2) There is no reason for d-VAE to add signals that do not exist in the original signals.

      (1) Adding signals that do not exist in the small-R2 neurons would decrease the reconstruction performance. This is because, if the added signals contained significant information, they could not resemble the irrelevant signals, which contain no information; consequently, the generated signals would not resemble the raw signals. The model is optimized to reduce the reconstruction loss, and this scenario runs counter to that optimization direction. Notably, when the model has only a reconstruction loss, without interference from the decoding loss, we believe information leakage does not occur, because the model can only be optimized toward signals similar to the raw signals; adding non-existent signals to the generated signals would increase the reconstruction loss, which is contrary to the optimization objective.

      (2) The information carried by these additional signals would be redundant with that already carried by the larger-R2 neurons, so it would introduce no new information that could enhance the decoding performance of the neural population, and it would not reduce the decoding loss.

      Based on these two points, we believe the model would not perform such counterproductive and harmful operations.

      (3) The criterion that irrelevant signals should contain minimal information can effectively rule out the leakage scenario.

      The criterion that irrelevant signals should contain minimal information is very important, but Reviewer #4 seems to have repeatedly overlooked its significance. If the model's reconstruction were insufficient, or if additional information were added (which we do not believe happens), a large amount of behavioral information could be decoded from the residuals, and this criterion would exclude such signals from selection. To clarify, let x, y, and z denote the raw, relevant, and irrelevant signals of a smaller-R2 neuron, with x = y + z. If the extracted relevant signals become y + m, the irrelevant signals become z - m, and consequently the irrelevant signals contain a significant amount of information.
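      The x = y + z argument can be checked numerically. The sketch below uses synthetic data and assumes, for illustration only, a one-dimensional behavior b, a weakly tuned relevant component y, an independent irrelevant component z, and a hypothetical leaked component m proportional to b. Without leakage, the residual x - y carries essentially no decodable behavior; with leakage, the residual z - m does.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
b = rng.standard_normal(T)   # hypothetical 1-D behavior
y = 0.2 * b                  # weak behaviorally relevant component (small-R2 neuron)
z = rng.standard_normal(T)   # behaviorally irrelevant component, independent of b
x = y + z                    # raw signal: x = y + z

def decode_r2(signal, behavior):
    """In-sample R^2 of an affine readout of behavior from a single signal."""
    A = np.column_stack([signal, np.ones_like(signal)])
    coef, *_ = np.linalg.lstsq(A, behavior, rcond=None)
    resid = behavior - A @ coef
    return 1.0 - resid.var() / behavior.var()

# Faithful extraction: relevant = y, so the irrelevant residual is x - y = z.
r2_irrel_faithful = decode_r2(x - y, b)

# Leakage: relevant = y + m, where m is behavioral information absent from x,
# so the irrelevant residual becomes z - m.
m = 0.8 * b
r2_irrel_leaky = decode_r2(x - (y + m), b)

print(f"residual R2 without leakage: {r2_irrel_faithful:.3f}")
print(f"residual R2 with leakage:    {r2_irrel_leaky:.3f}")
```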

      We present the decoding R2 of irrelevant signals in real datasets under three distillation scenarios: a bias towards reconstruction (alpha=0, an extreme case in which the model has only the reconstruction loss and no decoding loss), a balanced trade-off, and a bias towards decoding (alpha=0.9), as detailed in Author response table 1. If substantial information leaked from large-R2 neurons into small-R2 neurons, the irrelevant signals would contain a large amount of information. However, our results indicate that the irrelevant signals contain only minimal information, and their performance closely resembles that of the model trained solely with the reconstruction loss, with no significant differences (P > 0.05, Wilcoxon rank-sum test). When the model leans towards decoding, some useful information is left in the residuals, and the irrelevant signals contain a substantial amount of information, as observed in Author response table 1 for alpha=0.9. We therefore do not select such signals for analysis.

      In conclusion, the criterion that irrelevant signals should contain minimal information is a very effective measure to exclude undesirable signals.

      Author response table 1.

      Decoding R2 of irrelevant signals

      (4) Synthetic experiments can effectively rule out the leakage scenario.

      In the absence of ground truth data, synthetic experiments serve as an effective method for validating models and are commonly employed [1-3]. 

      Our experimental results demonstrate that d-VAE can extract neural signals that more closely resemble the actual behaviorally relevant signals (Fig. S2g). Information leakage would decrease the similarity to the ground-truth signals, so these results rule out this possibility. Moreover, in synthetic experiments with small-R2 neurons (Fig. S10), the results also demonstrate that our model makes these neurons more closely resemble the ground-truth relevant signals and recovers their information.

      In summary, synthetic experiments strongly demonstrate that our model can recover obscured neuronal information, rather than adding signals that do not exist.

      (1) Pnevmatikakis, Eftychios A., et al. "Simultaneous denoising, deconvolution, and demixing of calcium imaging data." Neuron 89.2 (2016): 285-299.

      (2) Schneider, Steffen, Jin Hwa Lee, and Mackenzie Weygandt Mathis. "Learnable latent embeddings for joint behavioural and neural analysis." Nature 617.7960 (2023): 360-368.

      (3) Zhou, Ding, and Xue-Xin Wei. "Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE." Advances in Neural Information Processing Systems 33 (2020): 7234-7247.

      Based on these four points, we are confident in the reliability of our results. If Reviewer #4 considers these points insufficient, we would highly appreciate it if specific concerns regarding any of these aspects could be detailed.

      Thank you for your valuable feedback.

      Q5: “Given the nuances involved in appropriate comparisons across methods and since two of the datasets are public, the authors should provide their complete code (not just the dVAE method code), including the code for data loading, data preprocessing, model fitting and model evaluation for all methods and public datasets. This will alleviate concerns and allow readers to confirm conclusions (e.g., figure 2) for themselves down the line.”

      Thanks for your suggestion.

      Our codes are now available on GitHub at https://github.com/eric0li/d-VAE. Thank you for your valuable feedback.

      Q6: “Related to 1) above, the authors should explore the results if the affine network h(.) (from embedding to behavior) was replaced with a nonlinear ANN. Perhaps linear decoders would no longer be as close to nonlinear decoders. Regardless, the claim of linearity should be revised as described in 1) and 2) above, and all caveats should be discussed.”

      Thank you for your suggestion. We appreciate this concrete proposal, which can be tested empirically. Following your suggestion, we replaced the decoder from the latent variable z to behavior y with a nonlinear neural network, specifically a network with a single hidden layer; we term the modified model d-VAE2. We applied d-VAE2 to the real data and selected the optimal alpha using the validation set. As shown in Author response table 2, the performance of KF and ANN remains comparable. Therefore, the capacity to linearly decode behaviorally relevant signals does not stem from the linear decoding of embeddings.

      Author response table 2.

      Decoding R2 of behaviorally relevant signals obtained by d-VAE2
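      The architectural change can be sketched as two readout functions from the latent variable z to behavior. This is a minimal NumPy illustration under assumed layer sizes and an assumed tanh hidden activation; the weights are random and untrained, and it is not the d-VAE or d-VAE2 implementation from the repository.

```python
import numpy as np

rng = np.random.default_rng(4)
latent_dim, hidden_dim, behavior_dim = 8, 32, 2   # hypothetical sizes

# d-VAE style readout: affine map h(z) = W z + b from latent z to behavior.
W = rng.standard_normal((behavior_dim, latent_dim)) * 0.1
b = np.zeros(behavior_dim)

def affine_head(z):
    return z @ W.T + b

# d-VAE2 style readout: one hidden layer between z and behavior
# (tanh activation is an assumption for illustration).
W1 = rng.standard_normal((hidden_dim, latent_dim)) * 0.1
b1 = np.zeros(hidden_dim)
W2 = rng.standard_normal((behavior_dim, hidden_dim)) * 0.1
b2 = np.zeros(behavior_dim)

def mlp_head(z):
    return np.tanh(z @ W1.T + b1) @ W2.T + b2

z = rng.standard_normal((5, latent_dim))   # a batch of 5 latent vectors
print(affine_head(z).shape, mlp_head(z).shape)
```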

      Additionally, this approach is uncommon and is considered somewhat inappropriate according to the Information Bottleneck theory [1]. According to this theory, information is progressively compressed in multilayer neural networks, discarding what is irrelevant to the output and retaining what is relevant. As the number of layers increases, the mutual information between each layer's embedding and the model input gradually decreases, while the mutual information between each layer's embedding and the model output gradually increases. In the decoding part, if embeddings that are not closest to the output (behavior) are used, they may still contain behaviorally irrelevant signals, and using them to generate behaviorally relevant signals could introduce irrelevant signals into the behaviorally relevant signals.

      To demonstrate this, we conducted experiments on synthetic data. Author response table 3 presents the performance (neural R2 between the generated signals and the ground-truth signals) of both models at several alpha values around the optimal alpha of d-VAE (alpha=0.9) selected using the validation set. At the same alpha value, d-VAE2 consistently performs worse than d-VAE; d-VAE2 requires a higher alpha value to reach performance comparable to d-VAE; and the best performance of d-VAE2 is inferior to that of d-VAE.

      Author response table 3.

      Neural R2 between generated signals and real behaviorally relevant signals
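      For reference, a common definition of the neural R2 between generated and ground-truth signals is the per-neuron coefficient of determination averaged over neurons. The sketch below assumes that definition and arrays of shape (time, neurons); the exact averaging used for Author response table 3 is not specified here, and the data are synthetic.

```python
import numpy as np

def neural_r2(true, pred):
    """Per-neuron R^2 between ground-truth and generated signals, averaged over neurons.

    true, pred: arrays of shape (time, neurons).
    """
    ss_res = ((true - pred) ** 2).sum(axis=0)
    ss_tot = ((true - true.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

rng = np.random.default_rng(2)
ground_truth = rng.standard_normal((500, 10))   # synthetic "true" relevant signals
generated = ground_truth + 0.5 * rng.standard_normal((500, 10))
print(round(neural_r2(ground_truth, generated), 2))
```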

      Thank you for your valuable feedback.

      (1) Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).

      Q7: “The beginning of the section on the "smaller R2 neurons" should clearly define what R2 is being discussed. Based on the response to previous reviewers, this R2 "signifies the proportion of neuronal activity variance explained by the linear encoding model, calculated using raw signals". This should be mentioned and made clear in the main text whenever this R2 is referred to.”

      Thank you for your suggestion. We have made the modifications in the main text. Thank you for your valuable feedback.
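      A minimal sketch of this R2, assuming an ordinary least-squares linear encoding model from behavioral variables (here a hypothetical two-dimensional velocity) to each neuron's raw activity. For brevity it is computed in-sample on synthetic data; a held-out evaluation would fit on training data and score on separate test data.

```python
import numpy as np

def encoding_r2(behavior, neural):
    """Per-neuron R^2 of a linear encoding model (behavior -> raw neural activity).

    behavior: (time, n_behavior); neural: (time, n_neurons).
    """
    A = np.column_stack([behavior, np.ones(len(behavior))])
    W, *_ = np.linalg.lstsq(A, neural, rcond=None)
    resid = neural - A @ W
    return 1.0 - resid.var(axis=0) / neural.var(axis=0)

rng = np.random.default_rng(3)
velocity = rng.standard_normal((3000, 2))   # hypothetical 2-D hand velocity
noise = rng.standard_normal((3000, 2))
strong = velocity @ np.array([1.0, 0.5]) + 0.3 * noise[:, 0]   # well-tuned neuron
weak = velocity @ np.array([0.1, 0.0]) + 1.0 * noise[:, 1]     # small-R2 neuron
r2 = encoding_r2(velocity, np.column_stack([strong, weak]))
print(np.round(r2, 2))
```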

      Q8: “Various terms require clear definitions. The authors sometimes use vague terminology (e.g., "useless") without a clear definition. Similarly, discussions regarding dimensionality could benefit from more precise definitions. How is neural dimensionality defined? For example, how is "neural dimensionality of specific behaviors" (line 590) defined? Related to this, I agree with Reviewer 2 that a clear definition of irrelevant should be mentioned that clarifies that relevance is roughly taken as "correlated or predictive with a fixed time lag". The analyses do not explore relevance with arbitrary time lags between neural and behavior data.”

      Thanks for your suggestion. We have removed the “useless” statements and revised the statement about “the neural dimensionality of specific behaviors” in our revised manuscript.

      Regarding the use of fixed temporal lags, we followed the same practice as previous papers using the datasets we analyze, which assume fixed temporal lags [1-3]. Furthermore, many studies of the motor cortex similarly use fixed temporal lags [4-6]. To clarify the definition, we have revised it in our manuscript; for details, please refer to our response to Q2 of Reviewer #2 and to our revised manuscript. We believe our definition is now clearly articulated.
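      As an illustration of what a fixed temporal lag means in practice, the hypothetical helper below pairs neural activity in bin t with behavior in bin t + lag, using the same lag for every sample. The lag value itself is dataset-specific and is not specified here.

```python
import numpy as np

def align_with_lag(neural, behavior, lag):
    """Pair neural[t] with behavior[t + lag] for a single fixed, non-negative lag (in bins)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    T = min(len(neural), len(behavior))
    return neural[: T - lag], behavior[lag:T]

neural = np.arange(10)[:, None]          # toy neural time series (10 bins, 1 neuron)
behavior = 10 * np.arange(10)[:, None]   # toy behavior time series
n_aligned, b_aligned = align_with_lag(neural, behavior, lag=2)
print(n_aligned.ravel(), b_aligned.ravel())
```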

      Thank you for your valuable feedback.

      (1) Wang, Fang, et al. "Quantized attention-gated kernel reinforcement learning for brain–machine interface decoding." IEEE Transactions on Neural Networks and Learning Systems 28.4 (2015): 873-886.

      (2) Dyer, Eva L., et al. "A cryptography-based approach for movement decoding." Nature Biomedical Engineering 1.12 (2017): 967-976.

      (3) Ahmadi, Nur, Timothy G. Constandinou, and Christos-Savvas Bouganis. "Robust and accurate decoding of hand kinematics from entire spiking activity using deep learning." Journal of Neural Engineering 18.2 (2021): 026011.

      (4) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (5) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature Neuroscience 17.3 (2014): 440-448.

      (6) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature Communications 7.1 (2016): 13239.

      Q9: “CEBRA itself doesn't provide a neural reconstruction from its embeddings, but one could obtain one via a regression from extracted CEBRA embeddings to neural data. In addition to decoding results of CEBRA (figure S3), the neural reconstruction of CEBRA should be computed and CEBRA should be added to Figure 2 to see how the behaviorally relevant and irrelevant signals from CEBRA compare to other methods.”

      Thank you for your question. Modifying CEBRA is beyond the scope of our work. Because CEBRA is not a generative model, it cannot produce behaviorally relevant and irrelevant signals, and therefore it has no results corresponding to Fig. 2. To spare our readers the confusion encountered by Reviewers #3 and #4, we have opted to exclude the comparison with CEBRA. As previously stated, our assessment of decoding capabilities is benchmarked against the performance of the ANN on raw signals, which approximately represents the upper limit of performance. Consequently, omitting CEBRA does not affect our conclusions.

      Thank you for your valuable feedback.

      Q10: “Line 923: "The optimal hyperparameter is selected based on the lowest averaged loss of five-fold training data." => why is this explained specifically under CEBRA? Isn't the same criteria used for hyperparameters of other methods? If so, clarify.”

      Thank you for your question. The hyperparameter selection for CEBRA follows the practice of the original CEBRA paper. The hyperparameter selection for generative models is detailed in the Section “The strategy for selecting effective behaviorally-relevant signals”. Thank you for your valuable feedback.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Although the manuscript is well organized and written, it could be substantially improved to make it more plausible and easier to read. See my point-by-point comments below:

      (1) The Introduction section is somewhat overloaded with unnecessary information. For example, the authors discussed the relationship between neurotransmitters in the prefrontal cortex and striatum and substance use/sustained attention. However, the results relate to neither the neurotransmitters nor the striatum. In addition, the description of neurotransmitters there is contradictory: nicotine/THC leads to increased neurotransmitter levels, whereas decreased neurotransmitter levels are related to poor sustained attention. Does that mean that the use of nicotine/THC could improve sustained attention?

      Thanks for this insightful question. We understand your concern regarding the seemingly contradictory statements about neurotransmitters and sustained attention. Previous studies have shown that acute administration of nicotine can improve sustained attention (Lawrence et al., 2002; Potter and Newhouse, 2008; Valentine and Sofuoglu, 2018; Young et al., 2004). On the other hand, the acute effects of smoking cannabis on sustained attention are mixed and depend on factors such as dosage and individual differences (Crean et al., 2011). For instance, a previous study (Hart et al., 2001) found that performance on a tracking task, which requires sustained attention, improved significantly after smoking cannabis with a high dose of THC, albeit in experienced cannabis users. However, chronic substance use, including nicotine and cannabis, has been associated with impaired sustained attention (Chamberlain et al., 2012; Dougherty et al., 2013).

      To address your concerns and improve clarity and succinctness of the Introduction, we have removed the description of neurotransmitters from the Introduction. This revision should make the introduction more concise and focus on the direct relationships pertinent to our study.

      (2) It is a bit hard for readers to follow the story because the Results section goes straight into detail. For example, the authors directly state that they used the ICV from the Go trials to index sustained attention without introducing the task. Why use the ICV of Go trials instead of other trials (e.g., successful stop trials) as an index of sustained attention? I suggest presenting the subjects and task details before the detailed behavioral results. The Results section should include enough information for readers to understand the presented results, rather than forcing them to find the answer in the later Methods section.

      We appreciate your suggestion to provide more context about the task and ICV before diving into the detailed behavioural results.

      We used the ICV derived from Go trials, rather than Successful stop trials, as an index of sustained attention because of the nature of the stop-signal task and the specific data it generates. Previous studies have indicated that reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poorer ability to sustain attention (Esterman and Rothlein, 2019). RT variability is quantified as the ICV, calculated as the standard deviation of Go RT divided by the mean Go RT from Go trials (O'Halloran et al., 2018). The stop-signal task includes both Go trials and Stop trials. During Go trials, participants are required to respond as quickly and accurately as possible to a Go signal, so RTs can be recorded for calculating the ICV. In contrast, Stop trials are designed to measure inhibitory control: successful response inhibition produces no response, so no RT is recorded. Therefore, Go trials are specifically used to assess sustained attention, while Stop trials primarily assess inhibitory control (Verbruggen et al., 2019).

      We acknowledge the importance of providing this contextual information within the Results section to enhance reader understanding. We have added this information before presenting the behavioural results on Page 6.

      Results

      (1) Behavioural changes over time

      Reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poor sustained attention. RT variability is quantified as the intra-individual coefficient of variation (ICV), calculated as the standard deviation of Go RT divided by the mean Go RT from Go trials in the stop signal task. Lower ICV indicates better sustained attention.
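      Expressed as code, the ICV is the ratio of the standard deviation to the mean of a participant's Go-trial RTs. The sketch below assumes the sample standard deviation (ddof=1) and that Go trials without a recorded response are coded as NaN and excluded; neither choice is specified in the text.

```python
import numpy as np

def icv(go_rts):
    """Intra-individual coefficient of variation: SD of Go RTs / mean Go RT."""
    rts = np.asarray(go_rts, dtype=float)
    rts = rts[~np.isnan(rts)]   # assumption: drop Go trials with no recorded RT
    return rts.std(ddof=1) / rts.mean()

print(icv([400, 500, 600]))   # RTs in ms; lower ICV = more stable responding
```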

      (3) The same problem applies to section 2 of the Results. What are the predictive networks? Are they the same as the networks constructed based on the correlation with ICV? My intuition is that these are circular analyses: the positive/negative/combined networks are calculated based on the correlation between the edges and ICV, and the authors then used the networks to predict ICV again. The steps from the raw networks (which I think are based on PPI) to the predictive networks, and the calculation of the predicted ICV, are all missing. Presenting the results to readers without sufficient methodological detail makes everything hard to digest.

      We thank the Reviewer for the insightful comment. We agree with the need for more clarity regarding the predictive networks and the CPM analysis before presenting results. CPM, a data-driven neuroscience approach, is applied to predict individual behaviour from brain functional connectivity (Rosenberg et al., 2016; Shen et al., 2017). The CPM analysis used the strength of the predictive network to predict the individual difference in traits and behaviours. CPM includes several steps: feature selection, feature summarization, model building, and assessment of prediction significance (see Fig. S1).

      During feature selection, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix (derived from generalized psychophysiological interaction analysis) were positively or negatively correlated with ICV, using a significance threshold of P < 0.01. The positively and negatively correlated connections are regarded as the positive and negative networks, respectively. The strength of the positive (or negative) network was determined for each individual by summing the connection strengths of all positively (or negatively) correlated edges. The strength of the combined network was determined by subtracting the strength of the negative network from that of the positive network. Next, CPM built a linear model relating the strength of the predictive network to ICV. This model was developed using the training set. The predictive networks were then applied to the test set, where network strength was calculated again and the linear model was used to predict ICV, using k-fold cross-validation. Following your advice, we have added these details to the Results section on Page 7.

      Results

      (2) Cross-sectional brain connectivity

      This study employed CPM, a data-driven neuroscience approach, to identify three predictive networks (positive, negative, and combined) that predict ICV from brain functional connectivity. CPM typically uses the strength of the predictive networks to predict individual differences in traits and behaviors. The predictive networks were obtained from connectivity analyses of the whole brain. Specifically, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix derived from generalized psychophysiological interaction analysis were positively or negatively correlated with ICV, using a significance threshold of P < 0.01. These positively and negatively correlated connections were regarded as the positive and negative networks, respectively. The strength of the positive (or negative) network was determined for each individual by summing the connection strengths of all positively (or negatively) correlated edges. The strength of the combined network was determined by subtracting the strength of the negative network from that of the positive network. We then fitted a linear model relating network strength to ICV in the training set, and applied the predictive networks and the fitted model to the test set to calculate predicted ICV, using k-fold cross-validation.
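      The CPM steps described above (edge selection at P < 0.01, network strength by summation, combined = positive minus negative, and a linear model fitted on training folds and applied to held-out folds) can be sketched as follows. This is an illustrative NumPy/SciPy sketch on synthetic data, not the study's analysis code; the random fold assignment and the synthetic edges are assumptions, and the permutation-based assessment of prediction significance is omitted. Because edges are selected using only the training subjects in each fold, the held-out predictions are not circular.

```python
import numpy as np
from scipy import stats

def edge_correlations(edges, behavior):
    """Pearson r and two-sided p-value between each edge (column) and behavior."""
    n = len(behavior)
    ez = (edges - edges.mean(axis=0)) / edges.std(axis=0)
    bz = (behavior - behavior.mean()) / behavior.std()
    r = ez.T @ bz / n
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)
    return r, p

def cpm_predict(edges, behavior, k=10, p_thresh=0.01, seed=0):
    """Cross-validated CPM predictions for the positive, negative and combined networks."""
    n = len(behavior)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    pred = {key: np.zeros(n) for key in ("pos", "neg", "comb")}
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # 1) Feature selection on training subjects only.
        r, p = edge_correlations(edges[train_idx], behavior[train_idx])
        pos_mask = (p < p_thresh) & (r > 0)
        neg_mask = (p < p_thresh) & (r < 0)
        # 2) Feature summarization: network strength = sum of selected edges.
        strength = {"pos": edges[:, pos_mask].sum(axis=1),
                    "neg": edges[:, neg_mask].sum(axis=1)}
        strength["comb"] = strength["pos"] - strength["neg"]
        # 3) Model building on the training fold, prediction on the test fold.
        for key, s in strength.items():
            slope, intercept = np.polyfit(s[train_idx], behavior[train_idx], 1)
            pred[key][test_idx] = slope * s[test_idx] + intercept
    return pred

# Synthetic example (not study data): 200 subjects, 300 edges.
rng = np.random.default_rng(5)
n_subj, n_edges = 200, 300
icv_values = rng.standard_normal(n_subj)
edges = rng.standard_normal((n_subj, n_edges))
edges[:, :10] += 0.5 * icv_values[:, None]     # edges positively related to ICV
edges[:, 10:20] -= 0.5 * icv_values[:, None]   # edges negatively related to ICV
pred = cpm_predict(edges, icv_values)
for key, values in pred.items():
    print(key, round(float(np.corrcoef(values, icv_values)[0, 1]), 2))
```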

      (4) The authors showed that the positive/negative/combined networks from both Go trials and successful stop trials can predict ICV. I am wondering how the authors could validate the specificity of the prediction of these networks. For example, what about networks from the failed stop trials?

      We appreciate the opportunity to clarify the specificity of the predictive networks identified in our study. Here is a more detailed explanation of our findings and their implications.

      To validate the specificity of the sustained attention network identified by the CPM analysis, we calculated correlations between the strength of the positive and negative networks and performance on a neuropsychological battery (CANTAB) at each timepoint separately. CANTAB includes several tasks that measure various cognitive functions, such as sustained attention, inhibitory control, impulsivity, and working memory. We found that all positive and negative networks derived from Go and Successful stop trials significantly correlated with a behavioural assay of sustained attention, the rapid visual information processing (RVP) task, at ages 14 and 19 (all P values < 0.028). No RVP task data were available at age 23 in the IMAGEN study. There were sporadic significant correlations between other constructs, such as delay aversion/impulsivity, and negative network strength, but the correlations with the RVP were always significant. This indicates that the strength of the sustained attention brain network was specifically and robustly correlated with a typical sustained attention task rather than with other cognitive measures. The results are described in the main text on Page 8 and shown in the Supplementary materials (Pages 1 and 3) and Table S12.

      In addition, we conducted a CPM analysis to predict ICV using gPPI under Failed stop trials. Our findings showed that positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV: at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). Similar results were obtained using a 5-fold CV and leave-site-out CV.

      Our analysis further showed that task-related functional connectivity derived from Go trials, Successful Stop trials, and Failed Stop trials could predict sustained attention across three timepoints. However, the predictive performances of networks derived from Go trials were higher than those from Successful Stop and Failed Stop trials. This suggests that sustained attention is particularly crucial during Go trials when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these tasks primarily involve inhibitory control along with sustained attention.

      Taken together, these findings underscore the specificity of the predictive networks of sustained attention. We have updated these results in the Supplementary Materials (Pages 3-5 and 7):

      Method

      CPM analysis using Failed stop trials

      We performed an additional CPM analysis on Failed stop trials, using the gPPI matrix obtained from the second GLM described in the main text. The CPM analysis was conducted using 10-fold CV, 5-fold CV, and leave-site-out CV.
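      The cross-validation schemes differ only in how subjects are partitioned. A minimal sketch of the partitioning, with hypothetical site labels (the actual acquisition sites are not listed here): k-fold CV splits subjects randomly into k folds, whereas leave-site-out CV holds out all subjects from one site at a time, so the model is always tested on a site it was not trained on.

```python
import numpy as np

def k_fold_folds(n_subjects, k, seed=0):
    """Yield (train_idx, test_idx) for a random k-fold split."""
    perm = np.random.default_rng(seed).permutation(n_subjects)
    for test_idx in np.array_split(perm, k):
        yield np.setdiff1d(np.arange(n_subjects), test_idx), np.sort(test_idx)

def leave_site_out_folds(sites):
    """Yield (site, train_idx, test_idx), holding out one site per fold."""
    sites = np.asarray(sites)
    for site in np.unique(sites):
        yield site, np.flatnonzero(sites != site), np.flatnonzero(sites == site)

sites = np.array(["A", "A", "B", "C", "B", "C", "A"])   # hypothetical site labels
for site, train_idx, test_idx in leave_site_out_folds(sites):
    print(site, train_idx, test_idx)
```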

      Results

      CPM predictive performance under Failed stop trials

      Positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). We obtained similar results using 5-fold and leave-site-out CV (Table S6).

      Discussion

      Specificity of the predictive networks

      We found that task-related functional connectivity derived from Go trials, Successful stop trials, and Failed stop trials successfully predicted sustained attention across the three timepoints. However, the predictive performance of networks derived from Go trials was higher than that of networks derived from Successful stop trials and Failed stop trials. These results suggest that sustained attention is particularly crucial during Go trials, when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these trials primarily involve inhibitory control along with sustained attention.

      (5) The authors used PPI to define the connectivity of the network. I am not sure why the authors used two separate GLMs for the PPI analysis. In the second GLM, Go trials were treated as an implicit baseline. What exactly does this mean? In addition, the gPPI analysis across the entire brain using the Shen atlas is not clear. Normally, as I understand it, PPI/gPPI is conducted to test the task-modulated connectivity between one seed region and the voxels in the rest of the brain. Did the authors perform the PPI for each ROI from the Shen atlas? More details about how PPI was used to construct the network are required.

      Thank you for your insightful questions. Here, we’d like to clarify how we applied generalized PPI across the whole brain using the Shen atlas and why we used two separate GLMs for the gPPI analysis.

      Yes, PPI is conducted to test the task-modulated connectivity between one seed region and other brain areas, and it can be either voxel-based or ROI-based. In our study, we performed an ROI-based gPPI analysis using the Shen atlas with 268 regions. Specifically, we performed PPI with each region of interest (ROI) as the seed to estimate the task-related FC between this ROI and the remaining 267 ROIs under a specific task condition. Repeating this analysis for every ROI in the Shen atlas generated a 268 × 268 gPPI matrix for each task condition. Each matrix was then averaged with its transpose, yielding a symmetric matrix that was subsequently used for the CPM analysis.
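The symmetrization step can be sketched as follows. This is a minimal Python/NumPy illustration, assuming the directed 268 × 268 gPPI estimates have already been computed and that entry [i, j] comes from the model with ROI i as seed and ROI j as target (an assumption about storage order); the helper names are hypothetical. It also shows how the unique edges used as CPM features can be vectorized from the upper triangle.

```python
import numpy as np


def symmetrize(gppi):
    """Average a directed ROI-by-ROI gPPI matrix with its transpose."""
    gppi = np.asarray(gppi, dtype=float)
    if gppi.ndim != 2 or gppi.shape[0] != gppi.shape[1]:
        raise ValueError("expected a square ROI-by-ROI matrix")
    return (gppi + gppi.T) / 2.0  # entry [i, j] becomes the mean of [i, j] and [j, i]


def upper_triangle_edges(matrix):
    """Vectorize the unique edges (i < j); the diagonal is excluded."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]
```

For 268 ROIs this yields 268 × 267 / 2 = 35,778 unique edges per task condition and subject.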

      Regarding the use of two separate GLMs for the gPPI analysis, our study aimed to define task-related FC under two conditions: Go trials and Successful stop trials. The first GLM, which included Go trials, was built to estimate gPPI during Go trials. However, because Go trials occur with high frequency in the stop signal task, they are commonly treated as an implicit baseline, as in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012); that is, Go trials are not given their own regressor, so effects for stop trials are estimated relative to Go trials. Therefore, to obtain a more accurate estimate of FC during Successful stop trials, we built a second GLM specifically for these trials. We have updated the Method section in the main text accordingly (Page 16).

      Method

      2.5 Generalized psychophysiological interaction (gPPI) analysis

      In this study, we adopted gPPI analysis to generate task-related FC matrices and applied CPM analysis to investigate predictive brain networks from adolescence to young adulthood. PPI analysis describes task-dependent FC between brain regions, traditionally examining connectivity between a seed region of interest (ROI) and the voxels in the rest of the brain. In contrast, this study conducted a generalized PPI analysis on an ROI-to-ROI basis (Di et al., 2021), yielding a gPPI matrix across the whole brain rather than for a single seed region.

      Given the high frequency of Go trials in the SST, Go trials have commonly been treated as an implicit baseline in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012). Hence, we built a separate GLM for Successful stop trials, which included two task regressors (Failed and Successful stop trials) and 36 nuisance regressors.

      (6) Why did the author use PPI to construct the network, rather than the other similar methods, for example, beta series correlation (BSC)?

      Thank you for your question. PPI is an approach used to calculate functional connectivity (FC) under a specific task condition (i.e., task-related FC). Although most brain connectomic research has utilized resting-state FC, FC during task performance, which can be estimated with approaches such as PPI or beta series correlation, has demonstrated superiority in predicting individual behaviours and traits, owing to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Therefore, we chose to use task-related FC to predict sustained attention over time. We have updated the Introduction accordingly (Page 5).

      Introduction

      Although most brain connectomic research has utilized resting-state fMRI data, functional connectivity (FC) during task performance has demonstrated superiority in predicting individual behaviours and traits, due to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Hence, we applied task-related FC to predict sustained attention over time.

      (7) In the section 'Correlation analysis between the network strength and substance use', the authors only stated that 'the correlations between xx and xx are shown in Fig5X', repeating this three times for three correlation results. What exactly are the results? The authors should describe the results in detail. I also wonder whether there are scatter plots for these correlation analyses.

      We’d like to clarify the results in Fig. 5. Fig. 5 illustrates the significant correlations (after FDR correction) of the behaviour and brain activity associated with sustained attention with cigarette and cannabis use (Cig+CB). Panel A shows the significant correlations between the behavioural level of sustained attention and Cig+CB. Panels B and C show the correlations between brain activity associated with sustained attention and Cig+CB: Panel B presents brain activity derived from Go trials, and Panel C presents brain activity derived from Successful stop trials. In response to your suggestion, we have described these results in detail on Page 9. We have also included scatter plots of the significant correlations shown in Fig. 5 in the Supplementary Materials (Fig. S10).

      Results

      (6) Correlation of behaviour and brain activity with cannabis and cigarette use

      Figs. 5A-C summarize the correlations between ICV/brain activity and Cig+CB at each timepoint and across timepoints. Fig. 5A shows correlations between ICV and Cig+CB (Tables S14-15). ICV was correlated with Cig+CB at ages 19 (Rho = 0.13, P < 0.001) and 23 (Rho = 0.17, P < 0.001). ICV at ages 14 (Rho = 0.13, P = 0.007) and 19 (Rho = 0.13, P = 0.0003) was correlated with Cig+CB at age 23. Cig+CB at age 19 was correlated with ICV at age 23 (Rho = 0.13, P = 9.38E-05). Fig. 5B shows correlations between brain activity derived from Go trials and Cig+CB (Tables S18-19). Brain activity of the positive and negative networks derived from Go trials was correlated with Cig+CB at age 23 (positive network: Rhop = 0.12, P < 0.001; negative network: Rhon = -0.11, P < 0.001). Brain activity of the negative network derived from Go trials at age 14 was correlated with Cig+CB at age 23 (Rhon = -0.16, P = 0.001). Cig+CB at age 19 was correlated with brain activity of the positive network derived from Go trials at age 23 (Rhop = 0.10, P = 0.002). Fig. 5C shows correlations between brain activity derived from Successful stop trials and Cig+CB (Tables S18-19). Brain activity of the positive and negative networks derived from Successful stop trials was correlated with Cig+CB at ages 19 (positive network: Rhop = 0.10, P = 0.001; negative network: Rhon = -0.08, P = 0.013) and 23 (positive network: Rhop = 0.13, P < 0.001; negative network: Rhon = -0.11, P = 0.001).

      (8) Lastly, the labels of (A), (B) ... in the figure captions are unclear. The authors should find a better way to place the labels in the caption and keep them consistent throughout all figures.

      Thank you for this valuable comment. We have revised the figure captions in the main text to ensure the labels (A), (B), etc., are placed more clearly and consistently across all figures.

      Reviewer #2 (Public Review):

      While the study largely achieves its aims, several points merit further clarification:

      (1) Regarding connectome-based predictive modeling, an assumption is that connections associated with sustained attention remain consistent across age groups. However, this assumption might be challenged by the observed differences in the sustained attention network profile (i.e., connections and related connection strength) across age groups (Figs. 2G-I and 3G-I). It is unclear how such differences might impact the prediction results.

      Thank you for your insightful comment. We’d like to clarify that we did not assume that connections associated with sustained attention remain completely consistent across age groups. Indeed, we expected that connections would change across age groups, due to the developmental changes in brain function and structure from adolescence to adulthood. Our focus was on the consistency of individual differences in sustained attention networks over time, recognising that the actual connections within those networks may change. However, we did show that there is some consistency in the specific connections associated with sustained attention over time. Notably, this consistency markedly increases when comparing ages 19 and 23, when developmental factors are less relevant. We support our reasoning above with the following analyses:

      (1) Comparison of predictive performance across timepoints (Supplementary Materials, Pages 2 and 5); the relevant sections are reproduced here.

      Method

      Comparison of predictive networks identified at one timepoint versus another

      Steiger’s Z test was used to compare the predictive performance of networks identified at different timepoints. This analysis compared the r values obtained when networks defined at different ages were used to predict ICV at the same age. For example, for Go trials, we compared the r values of brain networks defined at age 14 when predicting ICV at age 19 (i.e., positive network: r = 0.25, negative network: r = 0.25, combined network: r = 0.28) with the r values of brain networks defined at age 19 itself (i.e., positive network: r = 0.16, negative network: r = 0.14, combined network: r = 0.16) using Steiger's Z test (age 14 → age 19 vs. age 19 → age 19). Similarly, comparisons were made between networks defined at age 14 predicting ICV at age 23 and those defined at age 23 predicting ICV at age 23 (age 14 → age 23 vs. age 23 → age 23), as well as between networks defined at age 19 predicting ICV at age 23 and those defined at age 23 predicting ICV at age 23 (age 19 → age 23 vs. age 23 → age 23). These comparisons were performed separately for Go trials and Successful Stop trials.

      Results

      Comparison of predictive performance at different timepoints

      For positive, negative, and combined networks derived from Go trials, the r values for predicting ICV at age 19 were higher when using predictive networks defined at age 19 than when using those defined at age 14 (Z = 3.79, Z = 3.39, Z = 3.99, all P < 0.00071). Similarly, the r values for predicting ICV at age 23 were higher when using predictive networks defined at age 23 than when using those defined at age 14 (Z = 6.00, Z = 5.96, Z = 6.67, all P < 3.47e-9) or age 19 (Z = 2.80, Z = 2.36, Z = 2.57, all P < 0.005).

      At age 19, the r value for the positive network derived from Successful stop trials was higher when using predictive networks defined at age 19 than when using those defined at age 14 (Z = 1.54, P = 0.022), whereas the negative and combined networks did not show a significant difference (Z = 0.85, P = 0.398; Z = 2.29, P = 0.123). At age 23, the r values for the positive and combined networks derived from Successful stop trials were higher when using predictive networks defined at age 23 than when using those defined at age 14 (Z = 3.00, Z = 2.48, all P < 3.47e-9) or age 19 (Z = 2.52, Z = 1.99, all P < 0.005). However, the r value for the negative network at age 23 did not differ significantly when using predictive networks defined at age 14 (Z = 1.80, P = 0.072) or age 19 (Z = 1.48, P = 0.138).

      These results indicate that some specific pairwise connections associated with sustained attention at earlier ages (14 and 19) remain relevant as individuals grow older, whereas other connections are no longer optimal for sustained attention at older ages. This suggests that the brain reorganizes its connection patterns to maintain optimal functionality for sustained attention as it matures.

      (2) Consistency of Individual Differences:

      We found that individual differences in ICV were significantly correlated between the three timepoints (Fig. 1B). In addition, we calculated the correlations of the network strength of the predictive networks (derived from Go trials and Successful stop trials) between each pair of timepoints and found that these correlations were also significant (all P < 0.003). We have updated these results in the main text (Pages 7-8) and Supplementary Materials (Table S7).

      (2) Cross-sectional brain connectivity

      In addition, we found that network strength of positive, negative, and combined networks derived from Go trials was significantly correlated between the three timepoints (Table S7, all P < 0.003).

      In addition, we found that network strength of positive, negative, and combined networks derived from Successful stop trials was significantly correlated between the three timepoints (Table S7, all P < 0.001).

      (3) Predictive networks across timepoints: Predictive networks defined at age 14 were successfully applied to predict ICV at ages 19 and 23. Similarly, predictive networks defined at age 19 were successfully applied to predict ICV at age 23 (Fig. 4). These results reflect the robustness of the brain network associated with sustained attention over time.

      (4) Dice coefficient analysis: We calculated the Dice coefficient to quantify the similarity of predictive networks across the three timepoints. Connections in the sustained attention networks were significantly similar from ages 14 to 23 (Table S13), despite relatively few overlapping edges over time (as discussed in Supplementary Materials on Page 6).

      (5) Global brain activation: Based on these findings, we suggest that sustained attention relies on global brain activation (i.e., network strength) rather than on specific regions or networks (see also Zhao et al., 2021).

      In summary, brain network connections change and are not completely consistent across time. However, individual differences in sustained attention and its network are consistent across time, as supported by the following findings: 1) some connections associated with sustained attention at earlier ages remain relevant at later ages, while the brain reorganizes other connections to maintain optimal functionality for sustained attention as it matures; 2) ICV and the network strength of the sustained attention network were significantly correlated between timepoints; 3) sustained attention networks identified at earlier timepoints predicted ICV at subsequent timepoints; 4) Dice coefficient analysis indicated that the edges in the sustained attention networks were significantly similar from ages 14 to 23; and 5) sustained attention networks function through global activation rather than through specific regions or networks.
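The two similarity analyses in points (1) and (4) can be sketched in Python. `steiger_z` implements Steiger's (1980) test for two dependent correlations that share one variable (here, observed ICV at the target age correlated with two sets of predicted ICV), and `dice_coefficient` computes 2|A∩B| / (|A| + |B|) for two binary edge masks. The function names, the convention for empty masks, and the permutation scheme used to assess Dice significance are assumptions for illustration; they are not taken from the authors' code.

```python
import math

import numpy as np


def steiger_z(r_jk, r_jh, r_kh, n):
    """Steiger's Z for corr(j, k) vs corr(j, h), which share variable j.

    j: observed ICV; k, h: ICV predicted by two different networks;
    r_kh: correlation between the two predictions; n: number of subjects.
    Returns (Z, two-sided P).
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)  # Fisher r-to-z
    r_bar = (r_jk + r_jh) / 2.0
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    c = psi / (1 - r_bar**2) ** 2  # estimated correlation between z_jk and z_jh
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P


def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2|A∩B| / (|A| + |B|) of two binary edge masks."""
    a, b = np.asarray(mask_a, dtype=bool), np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    return 0.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total


def dice_permutation_p(mask_a, mask_b, n_perm=1000, seed=0):
    """One-sided P: shuffle which edges mask_b selects, keeping its size fixed."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(mask_a, dtype=bool), np.asarray(mask_b, dtype=bool)
    observed = dice_coefficient(a, b)
    null = np.array([dice_coefficient(a, rng.permutation(b)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

In the comparison "age 14 → age 19 vs. age 19 → age 19", for instance, j would be observed ICV at age 19, and k and h the ICV values predicted by the age 14 and age 19 networks, respectively.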

      (2) Another assumption of connectome-based predictive modeling is that the relationship between the sustained attention network and substance use is linear and remains linear over development. Evidence of linearity from either the literature or the authors' data would be helpful.

      Thanks for your valuable suggestion. We'd like to clarify that while CPM assumes a linear relationship between brain and behaviour (Shen et al., 2017), it does not assume that the relationship between the sustained attention network and substance use remains linear over development.

      Our approach of applying CPM to predict sustained attention across timepoints was based on previous neuroimaging studies (Rosenberg et al., 2016; Rosenberg et al., 2020) that reported linear associations between brain connectivity patterns and sustained attention using CPM analysis. These findings support a linear relationship between brain connectivity and sustained attention. In this study, we performed CPM analysis to identify networks predicting sustained attention, not substance use, and used the network strength of these predictive networks to represent brain activity associated with sustained attention.

      To examine the relationship between substance use and sustained attention, as well as its associated brain activity, we conducted correlation analyses and used a latent change score model instead of CPM analysis. This decision was informed by cross-sectional studies (Broyd et al., 2016; Lisdahl and Price, 2012) that consistently reported linear associations between substance use and impairments in sustained attention. Additionally, a longitudinal study by Harakeh et al. (2012) indicated a linear relationship between poorer sustained attention and the initiation and escalation of substance use over time.

      Given these previous findings, we assumed a linear relationship between sustained attention and substance use. Our analyses included correlations between substance use and sustained attention, as well as its associated brain activity, at each timepoint and across timepoints (Fig. 5). Furthermore, we employed a three-wave bivariate latent change score model, a longitudinal approach, to assess the relationship between substance use and the behaviour and brain activity associated with sustained attention (Figs. 6-7). We have added more information to the Introduction (Page 6) to make this clearer.

      Introduction

      Additionally, previous cross-sectional and longitudinal studies (Broyd et al., 2016; Harakeh et al., 2012; Lisdahl and Price, 2012) have shown that there are linear relationships between substance use and sustained attention over time. We therefore employed correlation analyses and a latent change score model to estimate the relationship between substance use and both behaviours and brain activity associated with sustained attention.

      (3) Heterogeneity in results suggests individual variability that is not fully captured by group-level analyses. For instance, Figure 1A shows decreasing ICV (better sustained attention) with age at the group level, whereas visual inspection reveals both increasing and decreasing patterns at the individual level. Figure 7 provides another example: the group with a high level of sustained attention has a lower risk of substance use at a later age than the group with a low level of sustained attention. However, some individuals in the high sustained attention group have substance use scores as high as those in the low sustained attention group. This is important to take into consideration and could be a potential future direction for research.

      Thank you for this valuable comment. We appreciate your observation that group-level analyses do not fully capture individual variability. Fig. 1A shows the results of a linear mixed model, which estimates group-level change over time while accounting for subject-level random effects. Similarly, Fig. 7 shows the group-level association between substance use and sustained attention. We agree that future research should consider individual variability; for example, participants could be categorized by consistent trajectories of ICV or substance use (e.g., continuously decreasing or increasing) over multiple timepoints. Incorporating such individual-level analyses could provide valuable insights, and we are grateful for your suggestion, which will inform our future research directions.

      The above-mentioned points might partly explain the significant but low correlations between the observed and predicted ICV as shown in Figure 4. Addressing these limitations would help enhance the study's conclusions and guide future research efforts.

      We have updated the text in the Discussion on Page 13:

      Discussion

      However, some individual variability was not captured in this study, which could be attributed to the diversity of genetic, environmental, and developmental factors influencing sustained attention and substance use. Future research should explore this variability in greater depth to gain a better understanding of the relationship between sustained attention and substance use.

      Reviewer #3 (Public Review):

      Weaknesses: It's questionable whether the prediction approach (i.e., CPM), even when combined with longitudinal data, can establish causality. I recommend removing the term 'consequence' in the abstract and replacing it with 'predict'. Additionally, the paper could benefit from enhanced rigor through additional analyses, such as testing various thresholds and conducting lagged effect analyses with covariate regression.

      Thank you for your comment. We have replaced “consequence” with “predict” in the abstract.

      Abstract

      Previous studies were predominantly cross-sectional or under-powered and could not indicate if impairment in sustained attention was a predictor of substance use or a marker of the inclination to engage in such behaviour.

      Reviewer #3 (Recommendations For The Authors):

      (1) The connectivity analysis predicts both baseline and longitudinal attention measures. However, given the high correlation in attention abilities across the three timepoints, it is unclear whether the connectivity predicts variation in attention shared across the three timepoints. It would be insightful to assess whether predictions at the second and third timepoints remain significant after controlling for attention abilities at the initial timepoint.

      Thank you for your comments. We performed the CPM analysis to predict ICV at the second and third timepoints, controlling for ICV at age 14 as a covariate. After controlling for ICV at age 14, positive, negative, and combined networks derived from Successful stop trials defined at age 14 still predicted ICV at ages 19 and 23, and those defined at age 19 predicted ICV at age 23. Positive, negative, and combined networks derived from Go trials defined at age 19 also still predicted ICV at age 23. However, networks derived from Go trials defined at age 14 showed lower predictive performance for ICV at ages 19 and 23 after controlling for ICV at age 14, with only some of them remaining significant. Notably, controlling for ICV at the initial timepoint did not significantly impact the performance of predictive networks derived from Successful stop trials. Accordingly, we have added this analysis and its results to the Supplementary Materials (Pages 3 and 5).

      Method

      Prediction across timepoints controlling for ICV at age 14

      To examine whether the connectivity-based predictions reflected only variance in sustained attention shared across timepoints, we applied predictive models developed at ages 14 and 19 to predict ICV at subsequent timepoints while controlling for ICV at age 14. Specifically, we used predictive models (including parameters and selected edges) developed at age 14 to predict ICV at ages 19 and 23 separately. First, we calculated network strength from the gPPI matrices at ages 19 and 23 using the edges selected by the CPM analysis at age 14. We then estimated predicted ICV at ages 19 and 23 by applying the linear model parameters (slope and intercept) obtained from the CPM analysis at age 14 to the network strength. Finally, we evaluated predictive performance by calculating the partial correlation between predicted and observed values at ages 19 and 23, controlling for ICV at age 14. Similarly, we applied models developed at age 19 to predict ICV at age 23, also controlling for ICV at age 14. To assess the significance of predictive performance, we used a permutation test, shuffling the predicted ICV values and recalculating the partial correlation to generate a null distribution over 1,000 iterations.
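The cross-timepoint procedure can be sketched in Python as follows. It assumes edge values are stored as a subjects × edges array, that the selected edges, slope, and intercept come from the earlier CPM fit, and that the permutation P is one-sided; the function names are hypothetical, and covariates other than ICV at age 14 are omitted for brevity.

```python
import numpy as np


def network_strength(edges, mask):
    """Sum of the edge values selected by a boolean mask (subjects x edges)."""
    return edges[:, mask].sum(axis=1)


def apply_model(edges_new, mask, slope, intercept):
    """Predict ICV at a later age with edges, slope and intercept fixed at an earlier age."""
    return slope * network_strength(edges_new, mask) + intercept


def partial_corr(x, y, covariate):
    """Pearson correlation of x and y after regressing out covariate (plus intercept)."""
    covariate = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones(len(covariate)), covariate])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]


def permutation_test(predicted, observed, covariate, n_perm=1000, seed=0):
    """Observed partial r and one-sided P from shuffling predicted ICV n_perm times."""
    rng = np.random.default_rng(seed)
    r_obs = partial_corr(predicted, observed, covariate)
    null = np.array([partial_corr(rng.permutation(predicted), observed, covariate)
                     for _ in range(n_perm)])
    return r_obs, (1 + np.sum(null >= r_obs)) / (1 + n_perm)
```

The edges, slope, and intercept are deliberately frozen at the earlier age, so the later timepoint acts as a true out-of-sample test rather than a refit.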

      Results

      Predictions across timepoints controlling for ICV at age 14

      Positive and combined networks derived from Go trials defined at age 14 predicted ICV at age 19 (r = 0.10, P = 0.028; r = 0.08, P = 0.047), but the negative network did not (r = 0.06, P = 0.119). The positive network derived from Go trials defined at age 14 predicted ICV at age 23 (r = 0.11, P = 0.013), but the negative and combined networks did not (r = 0.04, P = 0.187; r = 0.08, P = 0.056). Positive, negative, and combined networks derived from Go trials defined at age 19 predicted ICV at age 23 (r = 0.22, r = 0.19, and r = 0.22, respectively, all P < 0.001).

      Positive, negative, and combined networks derived from Successful stop trials defined at age 14 predicted ICV at ages 19 (r = 0.08, P = 0.036; r = 0.10, P = 0.012; r = 0.11, P = 0.009, respectively) and 23 (r = 0.11, P = 0.005; r = 0.13, P = 0.005; r = 0.13, P = 0.017, respectively). Positive, negative, and combined networks derived from Successful stop trials defined at age 19 predicted ICV at age 23 (r = 0.18, r = 0.18, and r = 0.17, respectively, all P < 0.001).

      (2) In the Results section, a significance threshold of p = 0.01 was used for the CPM analysis. It would be beneficial to test the stability of these findings using alternative thresholds such as p = 0.05 or p = 0.005.

      We appreciate this insightful comment and the suggestion to test the stability of our findings using alternative significance thresholds. We have in fact already conducted CPM analyses using a range of thresholds (0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, and 0.0001; see Table S8 in the Supplementary Materials), and the results were similar across thresholds. Following prior studies that used P < 0.01 for feature selection (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018), we focused on the P < 0.01 threshold in our main analysis. Following your suggestion, we have highlighted this in the Method section (Pages 17-18).

      Method

      2.6.1 ICV prediction

      For each edge, the r value of its correlation with ICV and the associated P value were obtained, and edges were selected using a threshold of P = 0.01 (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018).
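The edge-selection, network-strength, and prediction steps of CPM can be sketched as below (Python with NumPy and SciPy). This is a minimal illustration under stated assumptions: edges are a subjects × edges array, each edge is Pearson-correlated with ICV in the training folds only, the combined network strength is taken as positive minus negative strength (one common convention), and no covariates are regressed out. The function names are hypothetical, and `p_thresh` can be varied to run a threshold-sensitivity analysis like the one summarized in Table S8.

```python
import numpy as np
from scipy import stats

NETWORKS = ("positive", "negative", "combined")


def select_edges(edges, behaviour, p_thresh=0.01):
    """Pearson-correlate every edge with behaviour; return positive/negative masks."""
    n = edges.shape[0]
    ez = (edges - edges.mean(axis=0)) / edges.std(axis=0)
    bz = (behaviour - behaviour.mean()) / behaviour.std()
    r = ez.T @ bz / n                        # Pearson r for each edge
    t = r * np.sqrt((n - 2) / (1 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)  # two-sided P for each edge
    return (r > 0) & (p < p_thresh), (r < 0) & (p < p_thresh)


def strengths(edges, pos, neg):
    """Network strength = sum of the selected edge values, per subject."""
    s_pos, s_neg = edges[:, pos].sum(axis=1), edges[:, neg].sum(axis=1)
    return {"positive": s_pos, "negative": s_neg, "combined": s_pos - s_neg}


def cross_validated_r(edges, behaviour, folds, p_thresh=0.01):
    """Fit CPM on training folds, predict held-out subjects, correlate with observed."""
    predicted = {name: np.zeros(len(behaviour)) for name in NETWORKS}
    for label in np.unique(folds):
        test = folds == label
        pos, neg = select_edges(edges[~test], behaviour[~test], p_thresh)
        train_s = strengths(edges[~test], pos, neg)
        test_s = strengths(edges[test], pos, neg)
        for name in NETWORKS:
            slope, intercept = np.polyfit(train_s[name], behaviour[~test], 1)
            predicted[name][test] = slope * test_s[name] + intercept
    return {name: np.corrcoef(predicted[name], behaviour)[0, 1] for name in NETWORKS}
```

Edge selection happens inside each training fold, so held-out subjects never influence which edges are chosen; selecting edges on the full sample first would inflate the cross-validated r.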

      2.6.2 Three cross-validation schemes

      In addition, we conducted the CPM analysis using a range of feature-selection thresholds and observed similar results across thresholds (see Table S8 in the Supplementary Materials).

      (3) Could you clarify if you used one sub-sample to extract connectivity related to sustained attention and then used another sub-sample to predict substance use with attention-related connectivity?

      Thank you very much for the question. We used the same sample to extract brain network strength and to estimate its association with substance use, using both Spearman correlation and the latent change score model across the three timepoints. In these analyses, we controlled for covariates including sex, age, and scan site. We have clarified this in the Method section (Page 20). We note that the CPM analyses themselves were conducted using cross-validation, plus a leave-site-out analysis.
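One common way to compute a Spearman correlation that controls for covariates is to rank all variables, regress the covariate ranks out of the ranks of the two variables of interest, and correlate the residuals. The Python sketch below follows that approach; it is an assumption about how covariate control could be implemented, not the authors' code, and the helper names are hypothetical. Scan site is dummy-coded before ranking, which for a 0/1 column is equivalent to using the dummy itself.

```python
import numpy as np
from scipy import stats


def one_hot(labels):
    """Dummy-code a categorical variable (e.g. scan site), dropping the first level."""
    levels, idx = np.unique(np.asarray(labels), return_inverse=True)
    return np.eye(len(levels))[idx][:, 1:]


def partial_spearman(x, y, covariates):
    """Spearman-type partial correlation of x and y controlling for covariates."""
    rx, ry = stats.rankdata(x), stats.rankdata(y)
    cov = np.asarray(covariates, dtype=float).reshape(len(rx), -1)
    ranked = np.column_stack([stats.rankdata(col) for col in cov.T])
    design = np.column_stack([np.ones(len(rx)), ranked])  # intercept + ranked covariates
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]
```

A hypothetical call would be `partial_spearman(cig_cb, network_strength, np.column_stack([age, sex, one_hot(site)]))`, where all variable names are placeholders.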

      Method

      2.7.3 Correlation between network strength and substance use

      It is worth noting that all correlations between substance use and sustained attention were computed using the same sample across the three timepoints.

      (4) Could you clarify whether you have regressed covariates in the lagged effects analysis of part 7?

      Thank you for this question. Yes, we controlled for covariates including age, sex, and scan site in the latent change score model. We have now described this more clearly in the Method section (Page 18).

      Method

      2.7.3 Correlation between network strength and substance use

      Additionally, cross-lagged dynamic coupling (i.e., bidirectionality) was used to explore individual differences in the relationships between substance use and linear changes in ICV/brain activity, as well as between ICV/brain activity and linear changes in substance use. The model included age, sex, and scan site as covariates.
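For readers less familiar with latent change score models, a generic three-wave bivariate specification with cross-lagged coupling can be written as below. This is a textbook-style sketch, not the authors' exact model: the symbols are our own (y for ICV or network strength, x for Cig+CB, β for self-feedback, γ for cross-lagged coupling), and the covariates (age, sex, and scan site) are represented only schematically by the vector c.

```latex
\begin{aligned}
y_t &= y_{t-1} + \Delta y_t, \qquad x_t = x_{t-1} + \Delta x_t, \qquad t \in \{2, 3\},\\
\Delta y_t &= \beta_y\, y_{t-1} + \gamma_{yx}\, x_{t-1} + \boldsymbol{\lambda}_y^{\top}\mathbf{c},\\
\Delta x_t &= \beta_x\, x_{t-1} + \gamma_{xy}\, y_{t-1} + \boldsymbol{\lambda}_x^{\top}\mathbf{c}.
\end{aligned}
```

Here t = 1, 2, 3 index ages 14, 19, and 23. Bidirectionality corresponds to testing both coupling parameters: γ_yx asks whether earlier substance use predicts subsequent change in sustained attention, and γ_xy asks whether earlier sustained attention predicts subsequent change in substance use.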

      References:

      Broyd, S.J., van Hell, H.H., Beale, C., Yucel, M., Solowij, N., 2016. Acute and Chronic Effects of Cannabinoids on Human Cognition-A Systematic Review. Biol Psychiatry 79, 557-567.

      Chamberlain, S.R., Odlaug, B.L., Schreiber, L.R.N., Grant, J.E., 2012. Association between Tobacco Smoking and Cognitive Functioning in Young Adults. The American Journal on Addictions 21, S14-S19.

      Crean, R.D., Crane, N.A., Mason, B.J., 2011. An evidence based review of acute and long-term effects of cannabis use on executive cognitive functions. J Addict Med 5, 1-8.

      D'Alberto, N., Chaarani, B., Orr, C.A., Spechler, P.A., Albaugh, M.D., Allgaier, N., Wonnell, A., Banaschewski, T., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Conrod, P.J., Desrivieres, S., Flor, H., Frohner, J.H., Frouin, V., Gowland, P., Heinz, A., Itterman, B., Martinot, J.L., Paillere Martinot, M.L., Artiges, E., Nees, F., Papadopoulos Orfanos, D., Poustka, L., Robbins, T.W., Smolka, M.N., Walter, H., Whelan, R., Schumann, G., Potter, A.S., Garavan, H., 2018. Individual differences in stop-related activity are inflated by the adaptive algorithm in the stop signal task. Hum Brain Mapp 39, 3263-3276.

      Dhamala, E., Yeo, B.T.T., Holmes, A.J., 2022. Methodological Considerations for Brain-Based Predictive Modelling in Psychiatry. Biological Psychiatry.

      Di, X., Zhang, Z.G., Biswal, B.B., 2021. Understanding psychophysiological interaction and its relations to beta series correlation. Brain Imaging and Behavior 15, 958-973.

      Dougherty, D.M., Mathias, C.W., Dawes, M.A., Furr, R.M., Charles, N.E., Liguori, A., Shannon, E.E., Acheson, A., 2013. Impulsivity, attention, memory, and decision-making among adolescent marijuana users. Psychopharmacology (Berl) 226, 307-319.

      Esterman, M., Rothlein, D., 2019. Models of sustained attention. Curr Opin Psychol 29, 174-180.

      Feng, Q., Ren, Z., Wei, D., Liu, C., Wang, X., Li, X., Tie, B., Tang, S., Qiu, J., 2024. Connectome-based predictive modeling of Internet addiction symptomatology. Soc Cogn Affect Neurosci 19.

      Greene, A.S., Gao, S., Scheinost, D., Constable, R.T., 2018. Task-induced brain state manipulation improves prediction of individual traits. Nature Communications 9, 2807.

      Harakeh, Z., de Sonneville, L., van den Eijnden, R.J., Huizink, A.C., Reijneveld, S.A., Ormel, J., Verhulst, F.C., Monshouwer, K., Vollebergh, W.A., 2012. The association between neurocognitive functioning and smoking in adolescence: the TRAILS study. Neuropsychology 26, 541-550.

      Hart, C.L., van Gorp, W., Haney, M., Foltin, R.W., Fischman, M.W., 2001. Effects of acute smoked marijuana on complex cognitive performance. Neuropsychopharmacology 25, 757-765.

      Lawrence, N.S., Ross, T.J., Stein, E.A., 2002. Cognitive mechanisms of nicotine on visual attention. Neuron 36, 539-548.

      Lisdahl, K.M., Price, J.S., 2012. Increased marijuana use and gender predict poorer cognitive functioning in adolescents and emerging adults. J Int Neuropsychol Soc 18, 678-688.

      O'Halloran, L., Cao, Z.P., Ruddy, K., Jollans, L., Albaugh, M.D., Aleni, A., Potter, A.S., Vahey, N., Banaschewski, T., Hohmann, S., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Desrivieres, S., Flor, H., Frouin, V., Gowland, P., Heinz, A., Ittermann, B., Nees, F., Orfanos, D.P., Paus, T., Smolka, M.N., Walter, H., Schumann, G., Garavan, H., Kelly, C., Whelan, R., 2018. Neural circuitry underlying sustained attention in healthy adolescents and in ADHD symptomatology. Neuroimage 169, 395-406.

      Potter, A.S., Newhouse, P.A., 2008. Acute nicotine improves cognitive deficits in young adults with attention-deficit/hyperactivity disorder. Pharmacol Biochem Behav 88, 407-417.

      Ren, Z., Daker, R.J., Shi, L., Sun, J., Beaty, R.E., Wu, X., Chen, Q., Yang, W., Lyons, I.M., Green, A.E., Qiu, J., 2021. Connectome-Based Predictive Modeling of Creativity Anxiety. Neuroimage 225, 117469.

      Rosenberg, M.D., Finn, E.S., Scheinost, D., Papademetris, X., Shen, X., Constable, R.T., Chun, M.M., 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19, 165-171.

      Rosenberg, M.D., Scheinost, D., Greene, A.S., Avery, E.W., Kwon, Y.H., Finn, E.S., Ramani, R., Qiu, M., Constable, R.T., Chun, M.M., 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117, 3797-3807.

      Shen, X., Finn, E.S., Scheinost, D., Rosenberg, M.D., Chun, M.M., Papademetris, X., Constable, R.T., 2017. Using connectome-based predictive modeling to predict individual behavior from brain connectivity. Nat Protoc 12, 506-518.

      Valentine, G., Sofuoglu, M., 2018. Cognitive Effects of Nicotine: Recent Progress. Curr Neuropharmacol 16, 403-414.

      Verbruggen, F., Aron, A.R., Band, G.P.H., Beste, C., Bissett, P.G., Brockett, A.T., Brown, J.W., Chamberlain, S.R., Chambers, C.D., Colonius, H., Colzato, L.S., Corneil, B.D., Coxon, J.P., Dupuis, A., Eagle, D.M., Garavan, H., Greenhouse, I., Heathcote, A., Huster, R.J., Jahfari, S., Kenemans, J.L., Leunissen, I., Li, C.S.R., Logan, G.D., Matzke, D., Morein-Zamir, S., Murthy, A., Pare, M., Poldrack, R.A., Ridderinkhof, K.R., Robbins, T.W., Roesch, M.R., Rubia, K., Schachar, R.J., Schall, J.D., Stock, A.K., Swann, N.C., Thakkar, K.N., van der Molen, M.W., Vermeylen, L., Vink, M., Wessel, J.R., Whelan, R., Zandbelt, B.B., Boehler, C.N., 2019. A consensus guide to capturing the ability to inhibit actions and impulsive behaviors in the stop-signal task. Elife 8.

      Whelan, R., Conrod, P.J., Poline, J.B., Lourdusamy, A., Banaschewski, T., Barker, G.J., Bellgrove, M.A., Buchel, C., Byrne, M., Cummins, T.D., Fauth-Buhler, M., Flor, H., Gallinat, J., Heinz, A., Ittermann, B., Mann, K., Martinot, J.L., Lalor, E.C., Lathrop, M., Loth, E., Nees, F., Paus, T., Rietschel, M., Smolka, M.N., Spanagel, R., Stephens, D.N., Struve, M., Thyreau, B., Vollstaedt-Klein, S., Robbins, T.W., Schumann, G., Garavan, H., Consortium, I., 2012. Adolescent impulsivity phenotypes characterized by distinct brain networks. Nat Neurosci 15, 920-925.

      Yoo, K., Rosenberg, M.D., Hsu, W.T., Zhang, S., Li, C.R., Scheinost, D., Constable, R.T., Chun, M.M., 2018. Connectome-based predictive modeling of attention: Comparing different functional connectivity features and prediction methods across datasets. Neuroimage 167, 11-22.

      Young, J.W., Finlayson, K., Spratt, C., Marston, H.M., Crawford, N., Kelly, J.S., Sharkey, J., 2004. Nicotine improves sustained attention in mice: evidence for involvement of the alpha7 nicotinic acetylcholine receptor. Neuropsychopharmacology 29, 891-900.

      Zhao, W., Makowski, C., Hagler, D.J., Garavan, H.P., Thompson, W.K., Greene, D.J., Jernigan, T.L., Dale, A.M., 2023. Task fMRI paradigms may capture more behaviorally relevant information than resting-state functional connectivity. Neuroimage, 119946.

      Zhao, W., Palmer, C.E., Thompson, W.K., Chaarani, B., Garavan, H.P., Casey, B.J., Jernigan, T.L., Dale, A.M., Fan, C.C., 2021. Individual Differences in Cognitive Performance Are Better Predicted by Global Rather Than Localized BOLD Activity Patterns Across the Cortex. Cereb Cortex 31, 1478-1488.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Summary:

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths:

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.

      Weaknesses:

      As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

      We thank the reviewer for both the positive and critical appraisal of our paper.

      (1) The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

      a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

      We thank the reviewer for this important input. We acknowledge that our study design leads to temporal autocorrelation in the BOLD signal when calculating RSA between fixation and scene time windows. We also recognize that we cannot interpret the significance of scene-specific reinstatement compared to zero, and we have accordingly removed this information. Nevertheless, our primary objective was to investigate changes in scene-specific reinstatement in relation to the different retrieval delays. Because the retrieval procedure is identical across delays and is presumably similarly affected by temporal autocorrelation, we argue that our results reflect relative differences in reinstatement across recent and remote trials and can therefore be interpreted in terms of delay-related changes in reinstatement. This is discussed on pp. 21 and 40 of the manuscript.

      We agree with the reviewer that cross-run comparisons would be extremely interesting. This could be achieved by presenting the same items repeatedly across different runs. That was not possible in our current setup, because we were interested in single-exposure retrieval and were constrained by the limited scanning time available for children. We have introduced this idea in the Limitations and Discussion sections (pp. 40, 44) of the manuscript to inform future studies.
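
      The cross-run restriction the reviewer recommends can be illustrated as a masking step on a pattern-similarity matrix. Correlations are computed for all pairs, and any pair whose two patterns come from the same run is discarded. The sketch below is illustrative only. The function name `cross_run_similarity`, the trials × voxels array layout and the use of Pearson r are assumptions, not the authors' analysis code.

```python
import numpy as np


def cross_run_similarity(patterns_a, patterns_b, runs_a, runs_b):
    """Pearson r between every row of patterns_a and every row of patterns_b,
    with pairs from the same run set to NaN (hypothetical helper).

    patterns_a: (n_a, n_voxels) array; patterns_b: (n_b, n_voxels) array.
    runs_a, runs_b: run label for each row of patterns_a and patterns_b.
    """
    a = np.asarray(patterns_a, dtype=float)
    b = np.asarray(patterns_b, dtype=float)
    # z-score each pattern across voxels; the mean product of z-scores is Pearson r
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    r = a @ b.T / a.shape[1]
    # discard within-run pairs, which share temporal autocorrelation
    same_run = np.equal.outer(np.asarray(runs_a), np.asarray(runs_b))
    r[same_run] = np.nan
    return r
```

      Averaging only the remaining entries (e.g. with `np.nanmean`) gives a similarity estimate that is not driven by within-run autocorrelation. The approach therefore requires each condition of interest to appear in more than one run.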

      Finally, thanks to the reviewer’s comment, we identified a bug in the final steps of our RSA calculation: Fisher’s z-transformation had been incorrectly applied to r-1 values, resulting in abnormally high values. We apologize for this error. We have revised the scripts and fixed the bug by correctly applying Fisher’s z-transformation to the r similarity values. We also adjusted the figure describing the methods accordingly (Figure 5, p. 22). This correction led to slightly altered reinstatement indices. Nevertheless, the overall pattern of delay-related attenuation in the scene-specific reinstatement index, observed in both children and adults, remains consistent. Similarly, we still observed gist-like reinstatement uniquely in children.
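
      The effect of the bug described above can be seen with a few hypothetical similarity values. Fisher's z-transformation, z = arctanh(r), is meant to be applied to the correlation r itself. Applied to a dissimilarity such as 1 - r, small correlations map to large z values. The response writes the erroneous input as r-1. Because arctanh is an odd function, r - 1 gives the same magnitudes with the opposite sign. The values below are invented for illustration and are not data from the study.

```python
import numpy as np


def fisher_z(r):
    """Fisher's z-transformation of a correlation: arctanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return np.arctanh(np.asarray(r, dtype=float))


# Hypothetical pattern-similarity values (Pearson r); not data from the study.
r = np.array([0.05, 0.10, 0.20])

correct = fisher_z(r)      # transform the similarities themselves: about [0.050, 0.100, 0.203]
buggy = fisher_z(1 - r)    # transform the dissimilarities 1 - r: about [1.832, 1.472, 1.099]
```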

      b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc.

      Thank you for your insightful comments. We acknowledge the theoretical concern about distinguishing the effects of reactivation processes occurring during fixation from differential processing of the stimulus itself during scene presentation. Specifically, greater engagement with recent scenes could enhance their processing, and this enhancement might be misattributed to memory reinstatement mechanisms.

      We argue, however, that scenes are processed in a more “memory-wise” than “perception-wise” manner during scene presentation. Both recent and remote memories were well learned, because only correctly recalled memories were included in the analysis.

      We concur that scene presentations entail perceptual processing; however, such processing would be consistent across all items, given that they were presented with the same repeated learning procedure, rendering them equally familiar to participants. In addition, we would argue that distinct activation patterns elicited during varying delays are more likely attributable to memory-related processing, since participants actively engaged in a memory-based decision-making task during these intervals. We have incorporated this rationale into the discussion section of our manuscript (p. 40).

      With this in mind, we hypothesized that, in the case of “memory-wise” processing, neural engagement during the scene time window should be higher for remote than for recent items. We further expected this difference to increase over time, because retrieval of reorganized and more distributed memories should require more control and effort. If the scenes are processed more “perception-wise”, we would instead expect higher neural engagement during the retrieval of recent than of remote items. Our exploratory analysis (detailed in the supplementary materials, Figure S3, Table S9) revealed higher neural activation for remote than for recent items in medial temporal, prefrontal, occipital and cerebellar brain regions. This supports the notion of “memory-wise” processing during the scene time window. However, this exploratory analysis cannot fully resolve the reviewer’s concern, as our paradigm per se cannot arbitrate between the “memory-wise” and “perception-wise” nature of retrieval. We have added this point to the Discussion (see p. 40).

      c. For the category-based neural reinstatement:

      (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well.

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach.

      (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r = 1) or minimal dissimilarity (1 - Pearson r = 0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons. Rather, they should be excluded, so that the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2) and other times of different scene types (forest1-field1).

      (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.

      Thank you for this important input regarding category-based reinstatement.

      (1) The within-category items were distributed approximately evenly across runs. Additionally, within runs, they were presented in random order and not in close temporal proximity to one another. Given this arrangement, we believe that the issue of temporal autocorrelation raised by the reviewer for scene-specific reinstatement may not apply to the same extent here. Again, our focus is not on the absolute level of category-based reinstatement but on the relative difference across conditions (recent vs. remote short delay vs. remote long delay), which are equally affected by the autocorrelation.

      (2) We apologize for not motivating this analysis further. The scene-specific reinstatement index (i.e., the fixation-to-scene correlation) measures the pre-activation of a concrete scene (e.g., a yellow forest in autumn). The gist-like reinstatement index instead measures the pre-activation of a whole category of scenes (e.g., forests). Critically, our window of interest is the fixation period for both analyses (in the absence of any significant visual input). The scene-specific index uses the scene window as a neural template against which the fixation period is compared. The gist-like index compares the similarity of reactivation patterns across trials that belong to the same category but differ in their exact memory content. Reinstating a more generic, gist-like memory (e.g., a forest) across multiple trials should yield more similar neural activation patterns. Significant gist-like reinstatement would therefore suggest that neural patterns for scenes within the same category are more generic, as indicated by higher similarity among them. By contrast, a more detailed reinstatement of specific types of forest (e.g., a yellow forest in autumn, green pine trees, a bare-leaved forest in spring) that differ in various dimensions could yield neural activation patterns as dissimilar as those for scenes from entirely different categories. This approach allowed us to distinguish more generic, gist-like reinstatement from more specific, detailed reinstatement. This is now clarified in the manuscript (see p. 25).

      (3) We apologize for the confusion caused by the figure and analysis description. In our analysis, we indeed excluded the correlation of the fixation cross with itself. Consequently, the diagonal in the figure should be blank to indicate this. This is now revised in the manuscript (Figure 7B and in Methods).

      (4) We appreciate your concern and recognize that the terminology we used might not align perfectly with the conventional understanding of category-based reinstatement. Typically, category-level neural representations (as discussed in Polyn et al., 2005; Jafarpour et al., 2014; among others) are investigated to identify specific brain areas associated with encoding/perception of scenes or faces. Our aim, however, was to explore the mnemonic reinstatement of highly detailed scenes that were elaborately encoded, with the hypothesis that substantial representational transformations would occur over time and vary with age. This hypothesis is based on the memory literature, including the Fuzzy-Trace Theory, the Contextual Binding Theory, and the Trace Transformation Theory (Brainerd & Reyna, 1998; Yonelinas, 2019; Moscovitch & Gilboa, 2023). Therefore, we renamed 'category-based' reinstatement to 'gist-like' reinstatement, which clarifies our concept and better aligns it with existing literature.

      We anticipated that young adults, being able to retain detailed narratives after encoding, would reinstate scenes with distinct details, making these scenes dissimilar from each other (see similar findings in Sommer et al., 2021). In contrast, because we anticipated less strategic elaboration during learning in children, we hypothesized that they would show a shallower, more gist-like reinstatement (for instance, recalling a forest or a field in a general sense without specific details or vivid imagery). This could result in higher within-category similarity, as children might reinstate a more generic forest concept.

      We did not gather additional data on the verbal quality of reinstatement due to the limited scanning time available for children, so these assumptions remain unverified. However, anecdotal observations post-retrieval indicated that adults often reported very vivid scenes associated with clear narrative recall. In contrast, children frequently described more vague memories (e.g., “I know it was a forest”) without specific details. Future studies should include measures to assess the quality of reinstatement, potentially outside the scanning environment.

      (2) I did not see any compelling statistical evidence for the claim of less robust consolidation in children.

      Specifically in terms of the behavioral results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioral differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be considered when interpreting the findings.

      Thank you for highlighting this important issue. We acknowledge that our initial description and depiction of our behavioral findings may not have effectively conveyed the main message about memory consolidation. Therefore, we have revised the behavioral results section (see pp. 12-14) to communicate our message more clearly.

      As detailed in the methods section, we reported retention rates only for those items that were correctly (100%) learned on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this strategy allowed us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between groups.
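
      The retention rate described here is computed only over items that were correctly learned, and can be written as a short function. The function name and inputs are hypothetical. The sketch assumes one boolean per item for the learning outcome and one for the later retrieval outcome.

```python
import numpy as np


def retention_rate(learned, recalled):
    """Percentage of initially correctly learned items still answered correctly later.

    learned: True per item if it was correctly learned in the learning session.
    recalled: True per item if it was answered correctly at the later retrieval test.
    Items not learned are excluded from the denominator, so every condition starts at 100%.
    """
    learned = np.asarray(learned, dtype=bool)
    recalled = np.asarray(recalled, dtype=bool)
    if not learned.any():
        return float("nan")
    return float(100.0 * np.mean(recalled[learned]))
```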

      We conducted the following analysis to illustrate more clearly how retention-rate slopes change over time, relative to the initially correctly learned items. It covers recently learned items (i.e., tested 30 minutes after learning), short delay remote items and long delay remote items. After observing no differences between sessions in either age group for recent items on days 1 and 14, we combined the recent items. This allowed us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group (F(3,250) = 17.35, p < .001, ω² = .16). Follow-up analyses of this interaction revealed significantly less robust memory consolidation across all delays in children than in young adults. This information has been added to the manuscript (pp. 12-14). We have also updated the figures to incorporate the baseline of 100% correct performance.
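
      As a consistency check, ω² can be recovered from an F statistic and its degrees of freedom with the one-way formula ω² = df_effect(F - 1) / (df_effect(F - 1) + N). The response does not say how ω² was computed, and the reported interaction comes from a design with more than one factor. Taking N = df_effect + df_error + 1 is therefore an assumption. Under that assumption, F(3,250) = 17.35 gives ω² ≈ .16, in line with the reported value. The function name is hypothetical.

```python
def omega_squared_from_f(f_value, df_effect, df_error, n_total=None):
    """Omega squared from an F statistic: df_effect*(F - 1) / (df_effect*(F - 1) + N).

    N defaults to df_effect + df_error + 1, which holds for a one-way
    between-subjects design; for other designs N must be supplied.
    """
    if n_total is None:
        n_total = df_effect + df_error + 1
    numerator = df_effect * (f_value - 1)
    return numerator / (numerator + n_total)


estimate = omega_squared_from_f(17.35, 3, 250)  # about 0.162 under the one-way assumption
```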

      (3) Please clarify which analyses were restricted to correct retrievals only. The univariate analyses states that correct and incorrect trials were modelled separately but does not say which were considered in the main contrast (I assume correct only?). The item specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

      Thank you for bringing this to our attention. We indeed limited our analysis – including univariate, specific reinstatement, and gist-like analyses – to only correctly remembered items. This decision was made because our goal was to observe delay-related changes in the neural correlates of correct memories, which are potentially stronger. We have incorporated this information into the manuscript.

      (4) To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered, that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

      Thank you for pointing this out. Indeed, there was a significant relationship between the remote > recent difference in the univariate data and memory performance at day 14 across both age groups (see Figure 4C-D). The performance of all participants, including children, was above chance level for remote trials on day 14. In addition, the number of remote trials was lower in children (18 trials on average) than in adults (22 trials on average). Nevertheless, we believe that the number of remote trials was neither too low nor too different across groups for this contrast.
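
      The analysis the reviewer suggests in point (4) relates the per-participant remote > recent difference at day 14 to day-14 memory. It can be sketched as a Pearson correlation with a permutation p-value. The inputs (one mean parameter estimate per participant and condition, plus day-14 accuracy), the permutation test and the function name are assumptions. The response does not state which test underlies Figure 4C-D.

```python
import numpy as np


def brain_behaviour_correlation(remote_beta, recent_beta, day14_accuracy, n_perm=5000, seed=0):
    """Correlate the per-participant remote > recent contrast with day-14 memory.

    Returns the observed Pearson r and a two-sided permutation p-value.
    """
    contrast = np.asarray(remote_beta, dtype=float) - np.asarray(recent_beta, dtype=float)
    accuracy = np.asarray(day14_accuracy, dtype=float)
    r_obs = np.corrcoef(contrast, accuracy)[0, 1]
    rng = np.random.default_rng(seed)
    # null distribution: shuffle accuracy across participants
    null = np.array([np.corrcoef(contrast, rng.permutation(accuracy))[0, 1]
                     for _ in range(n_perm)])
    p = (np.sum(np.abs(null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, p
```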

      (5) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example, in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

      Thank you for this valuable input. When recent and remote retrieval were examined separately, both the anterior and posterior hippocampus indeed showed activation significantly different from zero at all delays. This held in adults (all p < .0003, FDR-corrected) and in children (all p < .014, FDR-corrected, except for the posterior hippocampus during recent retrieval). We have included this information in the manuscript (see p. 17) and added it to the supplementary materials (Figure S2, Table S7).
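
      The response reports FDR-corrected p values without naming the procedure. Assuming the Benjamini-Hochberg procedure, the correction can be sketched as follows. It would apply to a set of per-region tests against zero (e.g. one-sample t-tests of activation in each hippocampal subregion). The function name is hypothetical.

```python
import numpy as np


def fdr_bh(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure. Returns (reject mask, adjusted p-values)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i for sorted p-values
    # enforce monotonicity by taking the running minimum from the largest p-value down
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted_sorted = np.minimum(adjusted_sorted, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # restore the original order of the tests
    return adjusted <= alpha, adjusted
```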

      (6) Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

      Thank you for pointing this out. During learning and retrieval, we employed the 3AFC (Three-Alternative Forced Choice) task.

      The location choices varied across scenes but remained the same over time within individuals. There were 18 different key locations for the objects, distributed across the stimulus set, so the object locations were heterogeneous and differed between objects. The location of each object was presented once during encoding and remained consistent throughout learning. Given this heterogeneity of locations, we believe our task cannot be reduced to a mere “stimulus/response learning task” but is more accurately described as an object-location association task.

      Similarly, the options for the 3AFC task did not originate from the same area, as there were 18 different areas in total. The correct answer was distributed equally across the three choice positions: sometimes it was the left option, sometimes the middle option, and sometimes the right option. Therefore, we believe that the 3AFC task did not provide clues to the location but required detailed and precise memory of the location. Moreover, the options were not randomly scattered but were presented close together in the scene, demanding a high level of differentiation between choices.

      Taking all the above into consideration, we assert that precise object-location associative memory is necessary for a correct answer. We have added this information to the manuscript (p. 9).
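
      With three alternatives, chance performance in the 3AFC task is 1/3. The response to point (4) states that performance was above chance on day 14 without naming the test. An exact one-sided binomial test is one way such a check could be carried out for a single participant. It is sketched here as an assumption, and the function name is hypothetical.

```python
from math import comb


def binomial_p_above_chance(n_correct, n_trials, chance=1 / 3):
    """One-sided exact binomial p-value: P(X >= n_correct) for X ~ Binomial(n_trials, chance)."""
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )
```

      For example, `binomial_p_above_chance(12, 18)` gives the probability of 12 or more correct answers out of 18 trials under pure guessing.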

      (7) Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

      Thank you for bringing this to our attention. We realize that including this information in the Tables may not be the most straightforward approach. Therefore, we have incorporated the test statistics, effect sizes, and related details into the text of the results section for clarity.

      (8) There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

      Thank you for pointing this out. We have added this information to the main manuscript. Additionally, we have emphasized this information in the text referring to Figure 1B.

      (9) The retrieval task does not seem to require retrieval of the scene itself, so it would be helpful for the authors to explain their reasoning for using this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g., via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on, e.g., their strategy.

      Thank you for highlighting this important point. Indeed, the retrieval task included explicit instructions for participants to recall and visualize the scene associated with the object presented during the fixation time window. Participants were also instructed to recollect the location of the object within the scene. Since the location was contextually bound to the scene and each object had a unique location in each scene, the location of the object was always embedded in the specific scene context. We have added this information to both the Methods and Results sections.

      Participants' self-reports (which unfortunately were not collected systematically on all occasions) indicated that remembering the stories they had created during strategic encoding helped them immensely in recalling the scene and the object location. We also concur with your observation that children and young adults may differ in their ability to reinstate scenes, depending on the success of their recall strategies. This task was conducted with an awareness of potential developmental differences in the ability to form complex contextual memories, and our elaborative learning procedure was designed to minimize these differences. It is important to note, though, that we did not expect children to achieve performance levels fully comparable to those of adults. There may indeed be developmental differences in reinstatement, for example due to differences in knowledge availability and accessibility (Brod, Werkle-Bergner, & Shing, 2013). We think that these differences may underlie our findings of neural reinstatement. This is now discussed on pp. 34-35 and 39-43 of the manuscript.

      (10) In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

      a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

      Thank you for bringing this to our attention. We have incorporated a summary of the general frameworks of memory consolidation into the introduction. This addition outlines how our summarized findings, particularly those related to memory consolidation for repeatedly learned information, align with these frameworks (see lines 126-138, 146-150).

      b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) reads like the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement); while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

      Thank you for highlighting this aspect. We have revised how we present the results from Tompary & Davachi (2017). This study examined memory reorganization for memories both with and without overlapping features, and it observed higher neural similarity for memories with overlapping features over time. The authors also explored item-specific reinstatement for recent and remote memories by assessing encoding-retrieval similarity. Since Oedekoven et al. (2017) utilized a similar approach, their results are comparable in terms of reinstatement. We have updated and expanded our manuscript to clarify the parallels between these studies (see lines 157-162).

      c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

      Drawing on the Contextual Binding Theory (Yonelinas et al., 2019) and the Multiple Trace Theory (Nadel et al., 2000), and supported, for instance, by evidence from Sekeres et al. (2018), we hypothesized that detailed contextual memories formed through repeated and strategic learning would strengthen the specificity of these memories, resulting in consistent hippocampal involvement for successfully recalled, detailed contextual memories. We have included additional explanatory information in the manuscript to clarify this hypothesis (see lines 217-219).

      d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.

      Thank you for raising this important point. Indeed, there are no prior studies that examined memory reinstatement over extended durations in children. The primary existing evidence suggests that neural specificity or patterns of neural representations in children can be robustly observed, while neural selectivity or univariate activation in response to the same stimuli tends to mature later (i.e., Fandakova et al., 2019). Bearing this in mind and recognizing that such neural patterns can be observed in both children and adults, we hypothesized that adults may form stronger detailed contextual memories compared to children. By employing strategies such as creating stories, adults might more easily recall scenes without the need to resort to forming generic or gist-like memories (for example, 'a red fox was near the second left pine tree in a spring green forest'). This assumption aligns with the Fuzzy Trace Theory (Reyna & Brainerd, 1995), which posits that verbatim memories can be created without the extraction of a gist.

      Conversely, we hypothesized that children, due to the ongoing maturation of associative and strategic memory components (as discussed in Shing et al., 2008 and 2010), which are dependent respectively on the hippocampus (HC) and the prefrontal cortex (PFC), would be less adept at creating, retaining, and extracting stories to aid their retrieval process. This could result in them remembering more generic integrated information, like the relationship between a fox and some generic image of a forest. We have added explanatory information to the manuscript to elucidate these points (see lines 225-230).

      Reviewer #1 (Recommendations For The Authors):

      (1) For Figure 3, I would highly recommend changing the aesthetics for the univariate data: at least on my screen they appear to be open boxes with solid vs. dashed lines, and as such look identical to the recent vs. remote distinction in Figure 2B. It also doesn't match the legend for me, which shows the age groups having purple vs. yellow coloring.

      Thank you for this observation. We have adjusted Figure 2 (now Figure 3; see p. 14) accordingly and now use purple and yellow to distinguish between the age groups.

      (2) Lines 329-330: it is not true that "all" indices were significantly different from zero, but this is only apparent if you read the next sentence. Please rephrase to clarify, e.g., "All ... indices, with a few exceptions, ... were significantly..."?

      Based on the above suggestions, and considering our primary focus on time-related changes in scene-specific reinstatement, we no longer interpret the expression of individual scene-specific indices relative to 0 and have removed this information from our analysis.

      (3) It is challenging to interpret some of the significance markers, such as those in Figure 3. For example what effects are being denoted by the asterisks and bars above vs. below the data on panel D? Please clarify and/or note in the legend.

      We have included a note in the legend to clarify the meaning of all significance markers. In addition, we decided to state any significant main and interaction effects in the figure rather than using significance markers.

      (4) For Figures 2 and 3, only the meaning of error bars is described in the caption. It is not explained in the caption what the boxes, lines, and points denote. Please clarify.

      Thank you for highlighting this. We have added explanations to the figure annotations for clarity. Please note that, following other reviewers' suggestions, some figure plots may have been adjusted or changed, and the explanations in the figure annotations were adjusted accordingly.

      (5) How were recent and remote interspersed relative to one another? The text says that each run had 10 recent and 10 remote pairs, presented in a "pseudo-random order" - not clear what that (pseudo) means in this case. Please clarify.

      Thank you for raising this point. We provide this information in the Methods section “Materials and Procedure”: 'The jitters and the order of presentation for recent and remote items were determined using OptimizeXGUI (Spunt, 2016), following an exponential distribution (Dale, 1999). Ten unique recently learned pairs (from the same testing day) and ten unique remotely learned items (from Day 0) were distributed within each run (in total three runs) in the order as suggested by the software as the most optimal. There were three runs with unique sets of stimuli each resulting in thirty unique recent and thirty unique remote stimuli overall.'

      (6) Figure 1A, second to last screen on the learning cycles row - what would be presented to participants here, one of these three emojis? What does the sleepy face represent? I see some of these points were mentioned in the methods, but additional clarification in the caption would be helpful.

      Thank you for highlighting this. We have included this information in the figure caption. Specifically, the sleepy face symbol in the figure denotes a 'missed response'.

      (7) Not clear how the jittered fixation time between object presentation and scene test is dealt with in representational similarity analyses.

      Thank you for pointing this out. Beta estimates were obtained from a Least Squares Separate (LSS) regression model. Each event was modeled with its respective onset and duration; as such, one beta value was estimated per event (with the lag between events differing from trial to trial). We have edited the corresponding section (see p. 53).
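
      To make the LSS logic concrete, here is a minimal NumPy sketch, not the authors' actual pipeline. It assumes a single run, boxcar events convolved with a simplified SPM-like double-gamma HRF, and no nuisance regressors; all function names are hypothetical. Each event gets its own GLM containing the target event, all other events collapsed into one regressor, and an intercept, so the varying lags between events enter only through the onsets.

```python
import math

import numpy as np


def double_gamma_hrf(tr, length=32.0):
    """Simplified SPM-like double-gamma HRF sampled at the TR (assumed shape)."""
    t = np.arange(0.0, length, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.max()


def event_regressor(onsets, durations, n_scans, tr, hrf):
    """Boxcar for the given events (onsets/durations in seconds), convolved with the HRF."""
    box = np.zeros(n_scans)
    for onset, dur in zip(onsets, durations):
        start = int(round(onset / tr))
        stop = max(start + 1, int(round((onset + dur) / tr)))
        box[start:stop] = 1.0
    return np.convolve(box, hrf)[:n_scans]


def lss_betas(bold, onsets, durations, tr):
    """Least Squares Separate: fit one GLM per event and keep one beta per event.

    bold: array of shape (n_scans, n_voxels). Returns (n_events, n_voxels).
    """
    n_scans = bold.shape[0]
    hrf = double_gamma_hrf(tr)
    onsets = np.asarray(onsets, dtype=float)
    durations = np.asarray(durations, dtype=float)
    betas = np.empty((len(onsets), bold.shape[1]))
    for i in range(len(onsets)):
        others = np.arange(len(onsets)) != i
        X = np.column_stack([
            event_regressor(onsets[i:i + 1], durations[i:i + 1], n_scans, tr, hrf),
            event_regressor(onsets[others], durations[others], n_scans, tr, hrf),
            np.ones(n_scans),  # intercept
        ])
        coef, *_ = np.linalg.lstsq(X, bold, rcond=None)
        betas[i] = coef[0]  # beta of the target event only
    return betas
```

In this sketch, the per-event betas can then be correlated directly in an RSA, independently of how long the jittered fixation lasted on a given trial.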

      (8) It was a little bit strange to have used anterior vs posterior HPC ROIs separately in univariate analysis but then combined them for multivariate. There are many empirical and theoretical motivations for looking at item-specific and category reinstatement in anterior and posterior HPC separately, so I was surprised not to see this. Please explain this reasoning.

      Thank you for pointing this out. We agree with the reviewer and have included the anterior and posterior HC ROIs in the multivariate analysis. Please see the revised Results section (pp. 13-15).

      (9) The term "neural specificity" is introduced (line 164) without explanation; please clarify.

      Thank you for bringing this to our attention. The term 'neural specificity', as defined by Fandakova et al. (2019), refers to the distinctiveness of neural representations in the regions that process a given sensory input. We decided, however, to refrain from using this term and instead use 'neural representational distinctiveness', which is more self-explanatory and is also introduced in the manuscript.

      (10) Age range is specified as 5-7 years initially (line 187) and then 6-7 years (line 188).

      We have corrected the age range in line 188 to '5 to 7 years.'

      Reviewer #2 (Public Reviews):

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. Despite these strengths, there are quite a few important design and analytical choices that derail my enthusiasm for the paper. If the authors could address these concerns, this manuscript would provide a solid foundation to better understand memory consolidation in children.

      We thank the reviewer for both the positive and critical appraisal of our paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) My greatest concern is the difference in memory accuracy that emerges as early as immediate learning, which undermines the interpretation of any consolidation-related differences. This concern is two-fold. The authors utilize an adaptive learning approach in which participants learn to criterion or stop after 4 repetitions. This type of approach leads to children seeing the stimuli more often during learning compared to adults, which on its own could have consequences for consolidation-related neural markers. Specifically, theoretical and empirical work in adults shows that repeating information can actually lead to more gist-like representations, which is the exact profile the children are showing. While there could be a strength to this approach because it allows for equated memory, the decision to stop repetitions before criterion means that memory performance is significantly lower in the children, which again could have consequences for consolidation-related neural markers. First, the authors do not show any of the learning-related data, which would be critical to assess the impact of this design choice. Second, there are likely differences in memory strength at the delay, making it extremely difficult to determine whether the neural markers reflect development, worse memory strength, or both. This issue is compounded by the use of a 3-AFC paradigm, wherein "correct responses" included in the analysis could contain a significant amount of guessing. I think a partial solution to this problem is to analyze the RT data and include them in the analyses, or to use a drift-diffusion modeling approach to get more precise estimates of memory strength to control for this feature. An alternative is to sub-select participants in each group to obtain a sample matched on performance (including # of repetitions) and re-run all the analyses in this sub-sample. Without addressing these concerns it is nearly impossible to interpret the presented data.

      Thank you for highlighting this point.

      Firstly, we believe that our approach, involving strategic and repeated learning coupled with feedback, enhances the formation of detailed contextual memories. The retrieval procedure also emphasized the need for detailed memory for location. These are critical differences in experimental procedure from previous studies, which enhanced the importance of detailed representations and likely reduced the likelihood of forming gist-like memories.

      Indeed, we ceased further learning after the fourth repetition. Extensive piloting, where we initially stopped after the seventh repetition, showed no improvement beyond the fourth repetition. In fact, performance tended to decline due to fatigue. Therefore, we limited the number of repetition cycles to the point where an improvement of performance was still feasible. Even though children exhibited lower final learning performance overall, we believe our procedure facilitated them to reach their maximal performance within the experimental setup.

      To address the reviewer’s concern, we included learning data to illustrate the progression of learning (see Fig. 1C, pp. 9-10 in Results).

      When interpreting the retention rates, it is important to note that we reported retention rates only for items that were correctly learned (100%) on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this method enabled us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between the groups. To simultaneously examine the change in retention rate slopes over time for recent (30 minutes after learning), short delay (one night after) remote, and long delay (two weeks after) remote items, we conducted a separate analysis of retention rates for recent items on days 1 and 14. After observing no differences between sessions in both age groups, we combined the data for recent items. This allowed us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group. Analysis of this interaction revealed significantly less robust memory consolidation across all delay times for children compared to young adults. The figures have been adjusted accordingly to incorporate the baseline of 100% correct performance.
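
      The retention measure described above can be expressed as a small function. This is a sketch under the assumption that items are identified by hypothetical labels and that only the items answered correctly at the end of learning form the 100% baseline; items recalled at the delayed test that were not learned correctly are ignored.

```python
def retention_rate(learned_correct, recalled_correct):
    """Percentage of correctly learned items that are still recalled at a
    delayed test; the correctly learned items define the 100% baseline."""
    learned = set(learned_correct)
    if not learned:
        raise ValueError("no correctly learned items, so no baseline exists")
    retained = learned & set(recalled_correct)  # recalled AND learned correctly
    return 100.0 * len(retained) / len(learned)
```

For example, `retention_rate({"fox", "owl", "bee", "cat"}, {"fox", "bee", "dog"})` returns 50.0, because "dog" was not part of the correctly learned baseline.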

      Following your suggestion, we also employed the drift diffusion model approach to characterize memory strength, calculating drift rate, boundary and non-decision time parameters. We added the results to the Supplementary Materials (section S2.1, Figure S1).
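
      The response does not specify which fitting procedure produced these parameters. As a swapped-in, closed-form illustration only, the EZ-diffusion model (Wagenmakers, van der Maas, & Grasman, 2007) recovers drift rate, boundary separation and non-decision time from accuracy and the mean and variance of correct RTs. It assumes two response options, so applying it here would require collapsing the 3-AFC responses into correct versus error; this is an assumption, not the authors' method. The scaling parameter s = 0.1 follows the EZ convention.

```python
import math


def ez_diffusion(prop_correct, rt_mean, rt_var, s=0.1):
    """EZ-diffusion estimates from accuracy and correct-RT mean/variance (seconds).

    Returns (drift rate v, boundary separation a, non-decision time ter).
    """
    if prop_correct in (0.0, 0.5, 1.0):
        raise ValueError("apply an edge correction: EZ is undefined at 0, .5 and 1")
    L = math.log(prop_correct / (1.0 - prop_correct))  # logit of accuracy
    x = L * (L * prop_correct ** 2 - L * prop_correct + prop_correct - 0.5) / rt_var
    v = math.copysign(1.0, prop_correct - 0.5) * s * x ** 0.25
    a = s ** 2 * L / v
    y = -v * a / s ** 2
    # Mean decision time implied by v and a; the remainder of the mean RT is Ter.
    mean_decision_time = (a / (2.0 * v)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    ter = rt_mean - mean_decision_time
    return v, a, ter
```

For example, with a proportion correct of .802, a mean correct RT of .723 s and an RT variance of .112 s², this returns approximately v = 0.10, a = 0.14 and Ter = 0.30 s.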

      Generally, our findings indicate a lower overall drift rate in children when considering all items that had to be learned. We also observed that adults show a steeper decline in drift rate at short and long delays, although their memory strength at these delays remains higher than that of children. Both age groups required a similar amount of evidence to make a decision (boundary separation), and this amount declined with delay, which may indicate an adaptation to weaker memory. Further, we observed shorter non-decision times in children compared to adults, potentially suggesting less error checking, or less thorough processing and strategic memory access, in children.

      Overall, these results indicate weaker memory strength in children as a quantitative measure. This may nevertheless stem from the qualitatively different memory representations that children form, as our RSA findings suggest. We believe that our neural effects reflect the effect of interest (i.e., worse memory due to lower memory strength in children); controlling for memory strength would therefore remove variance of interest from the neural data. Hence, we refrain from including memory strength in the model, but we include mean RT as an indicator of general response tendencies.

      Given that the paper is already very complex and long, we opted to add the diffusion model results to the Supplementary Materials (section S2.1, Fig. S1), while discussing the results in the discussion (p. 35).

      (2) More discussion of the behavioral task should be included in the results, in particular the nature of the adaptive learning paradigm including the behavioral results as well as the categorical nature of the memoranda. Without this information, it is difficult for the reader to understand what category-level versus item-level reinstatement reflects.

      Thank you for this valuable input. We have incorporated this information into the results section. Please refer to pp. 9-10, 12, 14, 21, 25-26 for the added details.

      (3) Some of the methods for the reinstatement analysis were unclear to me or warranted further adjustment. I believe the authors compared the scene against all other scenes. I believe it would be more appropriate to only compare this against scenes drawn from the same category as opposed to all scenes. Secondly, from my reading, it seems like the reinstatement was done during the scene presentation, rather than the object presentation in which they would retrieve the scene. I believe the reinstatement results would be much stronger if it was captured during the object presentation rather than the re-presentation of the scene. Or perhaps both sets of analyses should be included.

      We apologize for the confusion regarding the analysis method.

      During the revision, we have improved the description of this analysis and hope it is easier to follow now. In short, we used both approaches (within and between categories) to suit different goals (i.e., measuring scene reinstatement and gist-like reinstatement).

      Both types of reinstatement were assessed during the fixation cross to avoid confounds with the object itself being on the screen. We used the scene window in only one analysis (the scene-reinstatement index), as a neural template to track its pre-activation during the fixation. In line with the reviewer's suggestion, our rationale is that reinstatement indeed begins during the short object presentation window but, importantly, extends into the fixation window. We added this clarifying information to the Results section (see pp. 21-27).
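
      The two reinstatement indices can be sketched as follows, assuming Fisher-z-transformed Pearson correlations between voxel patterns within an ROI. The scene index compares the fixation-window pattern with the scene-window template of the same scene versus other scenes (optionally only scenes of the same category, as the reviewer proposed); the category index contrasts within- versus between-category similarity across fixation patterns. Function names and the exact subtractions are illustrative assumptions, not the manuscript's definitions.

```python
import numpy as np


def fisher_z_corr(a, b):
    """Fisher z-transformed Pearson correlation between two voxel patterns."""
    r = np.corrcoef(a, b)[0, 1]
    # Clip so that identical patterns do not yield an infinite z value.
    return np.arctanh(np.clip(r, -0.999999, 0.999999))


def scene_reinstatement_index(fixation, templates, target, categories=None):
    """Similarity of a fixation-window pattern to its own scene template minus
    its mean similarity to the other scene templates.

    If `categories` is given, only scenes from the target's category serve as
    the comparison set (a within-category contrast).
    """
    sims = np.array([fisher_z_corr(fixation, t) for t in templates])
    mask = np.ones(len(templates), dtype=bool)
    mask[target] = False
    if categories is not None:
        categories = np.asarray(categories)
        mask &= categories == categories[target]
    if not mask.any():
        raise ValueError("no comparison scenes available")
    return sims[target] - sims[mask].mean()


def category_reinstatement_index(fixations, categories):
    """Mean within-category minus mean between-category similarity across
    fixation-window patterns from different trials (a gist-like index)."""
    within, between = [], []
    for i in range(len(fixations)):
        for j in range(i + 1, len(fixations)):
            z = fisher_z_corr(fixations[i], fixations[j])
            if categories[i] == categories[j]:
                within.append(z)
            else:
                between.append(z)
    return np.mean(within) - np.mean(between)
```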

      (4) For the univariate results, it was unclear to me when reading the results whether they were focusing on the object presentation portion of the trial or the scene presentation portion of the trial. Again, I think the claims of reinstatement related activity would be stronger if they accounted for the object presentation period.

      Thank you for pointing this out. Indeed, the univariate results were based on the object presentation time window. We added this information to the results section (Fig. 3, pp. 14, 16).

      (5) Further, given the univariate differences shown across age groups, the authors should re-run all analyses for the RSA controlling for mean activation within the ROI.

      Thank you for highlighting this. We re-ran all analyses for the RSA, controlling for the mean activation within the ROI. The results remained unchanged. We have added this information to the Results section, as well as to Tables S8 and S11 in the Supplementary Materials.

      (6) The authors should include explicit tests across groups for their brain-behavior analyses if they want to make any developmentally relevant interpretations of the data. Also, It would be helpful to include similar analyses to those using the univariate signals, and not just the RSA results.

      Following the reviewer's suggestion, we included brain-behavior analyses for the univariate as well as the RSA data, with explicit tests across groups. These can be found in the Results section, pp. 18-20 and 28-32. Because the predefined ROIs are interdependent, and to avoid running a large number of correlation tests, we employed partial least squares correlation analysis for this purpose. This approach focuses on multivariate links between the specified regions of interest (ROIs) and variation in memory performance over short and long delays across age cohorts. We argue that this multivariate strategy offers a more comprehensive understanding of the relationships between brain metrics across various ROIs and memory performance, given their mutual dependence and connectivity (see Genon et al., 2022, for similar discussions).
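
      For readers unfamiliar with partial least squares correlation, the core computation can be sketched as below: z-score brain measures (e.g., ROI values) and memory measures within each age group, stack the group-wise brain-behavior correlation matrices, and take their singular value decomposition; a within-group permutation test then assesses each latent variable. This is a generic sketch with hypothetical names, not the authors' analysis code, and bootstrap estimation of salience reliability is omitted.

```python
import numpy as np


def zscore(a):
    """Column-wise z-scores with the sample standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def behavior_pls(brain, behavior, groups):
    """SVD of the stacked within-group brain-behavior correlation matrices.

    brain: (n_subjects, n_rois); behavior: (n_subjects, n_measures).
    Returns behavior saliences, singular values, and brain saliences.
    """
    groups = np.asarray(groups)
    blocks = []
    for g in np.unique(groups):
        X = zscore(brain[groups == g])
        Y = zscore(behavior[groups == g])
        blocks.append(Y.T @ X / (len(X) - 1))  # (n_measures, n_rois) correlations
    R = np.vstack(blocks)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U, s, Vt.T


def permutation_p(brain, behavior, groups, n_perm=1000, seed=0):
    """Permutation p-value per latent variable, shuffling behavior within groups."""
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    _, s_obs, _ = behavior_pls(brain, behavior, groups)
    exceed = np.zeros_like(s_obs)
    for _ in range(n_perm):
        perm_behavior = behavior.copy()
        for g in np.unique(groups):
            idx = np.flatnonzero(groups == g)
            perm_behavior[idx] = behavior[rng.permutation(idx)]
        _, s_perm, _ = behavior_pls(brain, perm_behavior, groups)
        exceed += s_perm >= s_obs
    return (exceed + 1) / (n_perm + 1)
```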

      (7) There could be dramatic differences in memory processing across 5-7 year olds. I know the sample is a little small for this, but I would like to see regressions done within the middle childhood group in addition to the across-group comparisons.

      We have included information detailing the relationship between memory retention rate and age within the child group (refer to p. 13). In the child group, both recent and short delay remote memory improved with age. However, the retention rate for long-delayed memory did not show a significant improvement with increasing age in children.

      (8) I am concerned that the authors used global-signal as a regressor in their first-level analyses, given that there could be large changes in the amount of univariate activation that occurs across groups. This approach can lead to false positives and negatives that obscure localized differences. The authors should remove this term, and perhaps use the mean sum of the white matter or CSF to achieve the noise regressor they wanted to include.

      We understand the reviewer's concerns. However, we believe that our approach is recommended for the pediatric population. Specifically, Graff et al. (2021) found that global signal regression is a highly efficacious denoising technique in their study of 4- to 8-year-old children. This technique was previously suggested for adults by Ciric et al. (2017), and its benefits in terms of motion and physiological noise removal outweigh the potential costs of removing some signal of interest, as indicated by Behzadi et al. (2007). Additionally, we incorporated six anatomical component-based noise correction (CompCor) regressors to account for WM and CSF signals, as recommended in the pediatric literature.
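
      A minimal illustration of how global-signal and CompCor regressors act as nuisance terms is given below. It assumes a time-by-voxel data matrix and a time-by-voxel matrix of white-matter/CSF voxels, and it regresses the confounds out with ordinary least squares; in a first-level GLM the same columns would instead be added to the design matrix next to the task regressors. Names are hypothetical.

```python
import numpy as np


def acompcor(noise_ts, n_components=6):
    """Leading principal components of mean-centered WM/CSF voxel time series
    (n_timepoints x n_voxels), returned as n_timepoints x n_components."""
    centered = noise_ts - noise_ts.mean(axis=0)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components]


def regress_out(data, confounds):
    """Residuals of every voxel after OLS on the confounds plus an intercept."""
    X = np.column_stack([np.ones(len(data)), confounds])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)
    return data - X @ beta
```

A confound matrix could, for instance, be built as `np.column_stack([global_signal, acompcor(wm_csf_ts)])`, where `global_signal` is the mean time series over brain voxels.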

      (9) The authors discuss the relationship between hippocampal reactivation and worse memory through the lens of Schapiro et al., but a new paper by Tanriverdi et al came out in JOCN recently that is more similar to the authors' findings.

      Thank you for highlighting the recent paper by Tanriverdi et al. in JOCN, which aligns closely with our findings. We appreciate the suggestion and agree that exploring this alignment could further enrich our discussion of the relationship between hippocampal reactivation and memory retention. We have incorporated this work into our revised manuscript.

      Minor Comments

      - I was surprised that the authors did not see any differences in univariate signals for memory retrieval as a function of development, as much of the prior work has shown differences (for example work by Tracy Riggins). I believe this contrast should be highlighted in the discussion.

      - Given the robust differences in sleep patterns across childhood and the role of sleep in systems consolidation framework, I think this feature should be highlighted in either the introduction or discussion.

      - Could the authors report on differences (or lack of differences) in head motion across the groups, and if they are different whether they could include them as a confounding variable.

      I believe we included six motion parameters and their derivatives into the model

      Thank you for your comments.

      First, prior work on univariate signals of memory retrieval focused mostly on remembered versus forgotten contrasts, whereas in our study we focused on remote versus recent contrasts at short and long delays for correctly remembered items only. This may partially explain the absence of age differences in our univariate results. We highlighted this information in the Discussion section.

      Second, we agree with the reviewer that changes in sleep patterns across childhood should be addressed. We have therefore incorporated them into the Discussion section.

      Third, head motion parameters were indeed included in the analysis as confounding variables, as is highly recommended for developmental populations (e.g., Graff et al., 2021). For example, we observed higher framewise displacement in children than in adults, t(218) = -16, p < .001, as well as a difference in y-axis translation, t(288) = -2.33, p = .02.
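
      Framewise displacement, as commonly defined by Power et al. (2012), sums the absolute frame-to-frame changes in the six realignment parameters, converting rotations to millimeters of arc on a 50 mm sphere. The sketch below assumes translations in mm in the first three columns and rotations in radians in the last three; column order differs between software packages, so this is an assumption to check against the actual motion files.

```python
import numpy as np


def framewise_displacement(motion, radius=50.0):
    """Framewise displacement from six realignment parameters.

    motion: (n_volumes, 6) array; columns 0-2 are translations (mm) and
    columns 3-5 are rotations (radians). The first volume gets FD = 0.
    """
    deltas = np.abs(np.diff(motion, axis=0))
    deltas[:, 3:] *= radius  # arc length on a sphere of the given radius, in mm
    return np.concatenate([[0.0], deltas.sum(axis=1)])
```

Per-participant mean FD values computed this way could then be compared between age groups, e.g., with a t-test.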

      Reviewer #3 (Public Reviews):

      Summary:

      This study aimed to understand the neural correlates of memory recall over short (1-day) and long (14-days) intervals in children (5-7 years old) relative to young adults. The results show that children recall less than young adults and that this is accompanied by less activation (relative to young adults) in brain networks associated with memory retrieval.

      Strengths:

      This paper is one of few investigating long-term memory (multiple days) in a developmental population, an important gap in the field. Also, the authors apply a representational similarity analysis to understand how specific memories evolve over time. This analysis shows how the specificity of memories decreases over time in children relative to adults. This is an interesting finding.

      We thank the reviewer for the appraisal of our manuscript.

      Weaknesses:

      Overall, these results are consistent with what we already know: recall is worse in children relative to adults (e.g., Cycowicz et al., 2001) and children activate memory retrieval networks to a lesser extent than adults (Bauer et al, 2017).

      It seems that the reduced activation in memory recall networks is likely associated with less depth of memory encoding in children due to inattentiveness, reduced motivation, and documented differences in memory strategies. In regard to this, there was consideration of IQ, sex, and handedness but these were not included as covariates as they were not significant although I note p<.16 suggests there was some level of association nonetheless. Also, IQ is measured differently for the children and adults so it's not clear these can be directly contrasted. The authors suggest the instructed elaborative encoding strategy is effective for children and adults but the reference in support of this (Craik & Tulving, 1975) does not seem to support this point.

      Thank you for your review, and we appreciate your valuable feedback. Here are our responses and clarifications:

      Regarding the novelty of our results relative to the existing literature, we believe that, in contrast to Cycowicz et al. (2001), Bauer et al. (2017), and similar studies, we assess not only immediate memory after encoding with semantic judgments of abstract associations, but also consolidation-related changes in complex associative and contextual information in the under-investigated sample of 5-to-7-year-old preschoolers. This allows us to infer how children's neural representations change over time, providing invaluable insights into knowledge formation in this developmental cohort.

      Accordingly, the age differences themselves are of less importance to us than the time-related changes in mnemonic representations observed in children.

      Regarding the assumption of inattentiveness in children, we want to emphasize that the experimenter was present throughout the learning process, closely supervising the children. We observed prompt responses to every trial in children and noted an increase in accuracy over the encoding-learning cycles, leading us to conclude that the children were indeed attentive to the task. The observed accuracy improvement across learning cycles indicates an increase in remembered information. Furthermore, we took measures to ensure their engagement, including extensive training in both verbal and computerized versions to ensure that they understood the task and actively created stories to support their learning.

      We collected motivation data after each task execution in children, and the results indicated that they scored high in motivation. Children not only completed the tasks but also expressed their willingness to participate in subsequent appointments, highlighting their active involvement in the study.

      The observed differences in the efficiency of strategy utilization were expected, given developmental differences in the associative and strategic components of memory in children, as noted in prior research (Shing, 2008, 2010).

      We appreciate your point about IQ, sex, and handedness. These variables were indeed included in the behavioral models, and mean brain activation was also included in the brain data models, addressing the potential influence of these factors on our results.

      While it's true that we applied different tests to measure IQ in children and adults, these tests targeted comparable subtests that addressed similar cognitive constructs. As the final IQ values are standardized, we believe it is appropriate to compare them between the two groups.

      Lastly, we agree that the citation Craik & Tulving, 1975 supports the notion of effectiveness of instructed elaborative learning only in adults, but not in children. For this purpose, we added relevant literature for the child cohort (i.e., Pressley, 1982; Pressley et al., 1981; Shing et al., 2008).

      Reviewer #3 (Recommendations For The Authors):

      An additional point for the authors to consider is that the hypotheses were uncertain. The first is that prefrontal, parietal, cerebellar, occipital, and PHG brain regions would have greater activation over time in adults and not children, which is very imprecise as this is basically the whole brain. Moreover, brain imaging data may be in opposition to this prediction: e.g., the hippocampus has a delayed maturational pattern beyond 5 years (e.g., Canada 2019; Uematsu 2012), and some cortical data predict earlier development in these regions.

      Thank you for your feedback, and we appreciate your insights regarding our hypotheses.

      The selection of our regions of interest (ROIs) was guided by prior literature that has demonstrated the interactive involvement of multiple brain areas in memory retrieval and consolidation processes. Additionally, our recent work utilizing multivariate partial least square correlation analysis (Schommartz, 2022, Developmental Cognitive Neuroscience) has indicated that unique profiles derived from the structural integrity of multiple brain regions are differentially related to short and long-delay memory consolidation.

      Indeed, the literature suggests that the hippocampus may exhibit a more delayed maturational pattern extending into adolescence, as supported by studies such as Canada (2019) and Uematsu (2012). We have added this information, as well as findings from the literature on cortical development, to provide a more balanced review of the literature.

      Given this complexity, we believe it is important to emphasize in our discussion that both the medial temporal lobe, including the hippocampus, and cortical structures, as well as the cerebellum, undergo profound neural maturation. We highlight these nuances in our revised manuscript to provide a more comprehensive perspective on the developmental differences in memory retention over time.

      The writing was challenging to follow - consider as an example on page 9 the sentence that spans 10 lines of text.

      Thank you for bringing this to our attention. We have carefully reviewed the manuscript and have made efforts to streamline the text, ensuring that sentences are not overly long or complex to improve readability and comprehension.

      I found the analysis (and accompanying figures) a bit of a data mine - there are so many results that are hard to digest and in other cases highly redundant one from the other. This may be resolved in part by moving redundant findings to the supplemental. Some were hard to follow - so when there is a line between recent and recent data, that seems confusing to connect data that, I believe, are different sets of items. Later scatterplots (Fig 7) have pale yellow dots that I had a hard time seeing.

      Thank you for bringing up your concerns regarding the analysis and figures in our manuscript. We have carefully considered your feedback and made several improvements to address these issues.

      To alleviate the challenge of digesting numerous results, we have taken steps to enhance clarity and reduce redundancy. Specifically, we have moved some of the redundant findings to the supplementary sections, which should help streamline the main manuscript and make it more reader friendly.

      Regarding the line between 'recent' and 'recent data,' the figures have been revised into a clearer version. Furthermore, we have improved the visibility of certain elements, such as the pale-yellow dots in the scatterplots (Figs. 1, 2, 4, etc.), so that readers can better discern the data points.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      […] 

      Weaknesses: 

      The question of the physiological relevance of short bouts of ischemia remains.

      The chemical ischemia protocol induces a duration-dependent ATP depletion in acute slices on a time scale of minutes (Pape and Rose 2023). This is about the same time scale as the peri-infarct depolarisation (Lauritzen et al. 2011) that the protocol attempts to model. Of course, such models do not completely replicate the complex situation in vivo. However, the presented analyses of synapse function cannot be performed in vivo. We now discuss this in the manuscript.

      The precise mechanisms underlying the shift between ischemia-induced long-term potentiation and long-term failure of synaptic responses were not addressed. Could this be cell death?

      Thank you for the comment. Yes, we indeed believe that the persistent failure of synaptic transmission is because of neuronal cell death (i.e., of CA1 pyramidal cells) or at least persistent depolarisation. We did not explicitly state that in the original submission but do so in the revised manuscript. It is supported by the unquantified observation of swelling and/or loss of integrity of CA1 pyramidal cell bodies in parallel to postsynaptic failure. It is also in line with many reports from the literature, of which we now cite two (lines 186-198).

      Sex differences are not addressed or considered.

      We have performed all experiments on male mice, as indicated in Material and Methods. We have indeed not addressed sex differences of the observed effects. We consider this, and many other important factors, to be interesting topics for follow-up studies. This is now discussed (lines 413-424).

      Reviewer #2 (Public Review): 

      […]

      Weaknesses: 

      The weaknesses are minor and only relate to the interpretation of some of the data regarding the presynaptic mechanisms causing the potentiation of release. The authors measured the fiber volley, which reflects the extracellular voltage of the compound action potential of the fiber bundle. The half-duration of the fiber volley was increased, which could be due to action potential broadening in the individual axons but could also be due to differences in conduction velocity. We are therefore skeptical whether the conclusion of action potential broadening is justified.

      These are excellent points. We have added an analysis demonstrating that axonal conduction velocity is unlikely to be affected. Nonetheless, the fiber volley is indeed an indirect measure of what happens in individual axons. We have adjusted our interpretation accordingly and now also discuss alternative explanations of our findings (lines 363-379).

      Reviewer #3 (Public Review): 

      […]

      Weaknesses: 

      The data on fiber volley duration should be supported by more direct measurements to prove that chemical ischemia increases presynaptic Ca2+ influx due to a presynaptic broadening of action potentials. Given the influence that positioning of the stimulating and recording electrode can have on the fiber volley properties, I found this data insufficient to support the assumption of a relationship between increased iGluSnFR fluorescence, action potential broadening, and increased presynaptic Ca2+ levels.

      We have added a new analysis showing that the latency of the fiber volley is unaffected and relatively constant, which strengthens our conclusion. However, the fiber volley is indeed an indirect measure of action potential firing in individual axons. The suggested experiment, which would require simultaneous recording of Ca2+ and action potentials in single axons in combination with chemical ischemia, is extremely difficult, if possible at all. Instead, we have extended the discussion and now include further alternative mechanistic explanations (lines 363-379).

      The results were obtained in an ex vivo preparation; it would be interesting to assess whether they could be replicated in in vivo models of cerebral ischemia.

      This would certainly be very interesting but also extremely challenging technically. For a detailed analysis of synaptic changes as presented here, the main difficulty will be to stimulate and visualise glutamate release exclusively in an isolated population of synapses while recording postsynaptic responses in a stroke model.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      […]

      Labelling the experimental groups as 2-minute and 5-minute chemical ischemia would be more accurate than "metabolic stress" and "with postsynaptic failure". The critical difference between these two conditions is lost with this nomenclature. The reader could be misled into believing that the two groups form a heterogeneous population of responses from the same experimental manipulation, which is incorrect.

      We had stated in the manuscript that we ‘ … grouped combined iGluSnFR and electrophysiological recordings according to the effect of chemical ischemia on the synaptic response: ‘chemical ischemia with postsynaptic failure’ if the postsynaptic response did not recover to above 50% of the baseline level and ‘chemical ischemia’ when it did (as indicated in Fig. 1H). …’. The recordings were not grouped according to chemical stress duration but according to the effect on the postsynaptic response. We have revised the text explaining this (lines 125-135) and illustrate that now also in Fig. 1H. We hope this is easier to follow now.
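      The grouping criterion quoted above is a simple threshold rule on recovery of the postsynaptic response. The sketch below is a minimal illustration and makes several assumptions. It assumes that each recording is summarized by one baseline fEPSP slope and one post-ischemia fEPSP slope. The function name and inputs are hypothetical and are not taken from the authors' analysis code. Only the 50% threshold comes from the text.

```python
def classify_recording(baseline_slope: float, post_slope: float, threshold: float = 0.5) -> str:
    """Assign a recording to a group by recovery of the postsynaptic response.

    Hypothetical helper: the 50% threshold follows the manuscript text; how
    slopes are averaged over baseline and post-ischemia windows is assumed.
    """
    if baseline_slope == 0:
        raise ValueError("baseline response must be non-zero")
    # Fraction of the baseline response recovered (sign cancels for negative fEPSP slopes).
    recovery = post_slope / baseline_slope
    if recovery > threshold:
        return "chemical ischemia"
    # Did not recover to above 50% of baseline.
    return "chemical ischemia with postsynaptic failure"


print(classify_recording(-1.0, -0.8))  # recovered to 80% of baseline
print(classify_recording(-1.0, -0.3))  # recovered to only 30% of baseline
```

      Groups are defined by the outcome rather than by stimulus duration, as the response above explains. A given protocol duration can therefore contribute recordings to either group.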

      More details on the long-term impact of 5-minute ischemia on cell viability would be enlightening regarding the specific mechanism separating these two conditions. After 2 minutes, cells appear to remain alive (i.e., intact postsynaptic responses), whereas 5 minutes appears to induce cell death.

      Yes, our observations, although not quantified, are in line with cell death as CA1 pyramidal cell bodies appeared swollen and/or lost their integrity when chemical ischemia was followed by postsynaptic failure. This is also in line with reports from the literature. We have revised the results section accordingly (lines 186-201).

      In the paragraph titled "glutamate uptake is unaffected after acute chemical ischemia", there are two erroneous citations of Figure S3 that should be Figure S4.

      Thank you. We corrected this mistake.

      The sex of animals is not given. This is essential information. 

      We used male mice as indicated in the initial version of the manuscript (Material and Methods). We have added a statement regarding the role of sex to the final section of the Discussion.

      Reviewer #2 (Recommendations For The Authors):

      We propose addressing the weaknesses mentioned in the public review. As said, the fibre volley is a very indirect measure of action potential broadening. Based on the iGluSnFR data, the authors predict that the potentiation is mediated by depolarization, action potential broadening, and increased presynaptic calcium influx. The latter could be tested experimentally, but this does not seem necessary if the data are interpreted more cautiously. For example, other explanations for the broadened fiber volley could be mentioned, such as a slowing and/or dispersion of the action potential propagation speed. Furthermore, depolarization could cause elevated resting calcium concentrations, which could potentiate release independently of action potential broadening. Finally, classical forms of presynaptic potentiation of the release machinery that occur during homeostatic plasticity or Hebbian plasticity may operate independently of calcium dynamics.

      Thank you for this comment. The discussion of the mechanism was indeed too short. We have added an analysis of the fiber volley delay after stimulation, which was not affected. Presynaptic action potential broadening is, in our opinion, a very likely explanation for our observations but we did not perform direct experiments. Directly recording presynaptic action potentials and Ca2+ transients in the chemical ischemia model over extended periods of time is a major technical challenge and certainly of interest in the future. As suggested, we have expanded the discussion section and now mention various alternative explanations (lines 363-379).

      There are the following minor suggestions:

      Add line numbers.

      We have added line numbers.

      We would suggest providing exact P values instead of asterisks in the figures. 

      We agree that having exact P values in the figure panels can be very helpful. However, in the present figures they are hard to integrate without overcrowding the already complex panels and thereby obscuring other important details. All p-values are included in the figure legends and/or main text.

      Abstract: "We also observed an unexpected hierarchy of vulnerability of the involved mechanisms and cell types." This sentence is hard to understand and cell types were not directly compared (i.e. axons of CA3 and axons of CA1 neurons were not compared).

      We have revised this statement and removed the reference to cell types.

      In Figure 1G there seems to be an increase in the fiber volley. Is this significant? Could this be due to swelling of the slice during chemical ischemia? Or an increase in excitability? Maybe this could be discussed. 

      The effect was analysed in the context of Fig. 2. A significant increase of the fiber volley amplitude was detected in chemical ischemia (Fig. 2H) but also under control conditions (Fig. 2F). We therefore consider this a change that is detectable but not related to chemical ischemia and not a potential explanation for increased glutamate release (lines 157-160). Also, no significant fiber volley increase was detected in chemical ischemia with postsynaptic failure (Fig. 2H) and in the experiments illustrated in Fig. 4E. Our interpretation is that the fiber volley unspecifically increases in some experiments over the time course of the experiment (~ 60 min) but this is unrelated to chemical ischemia.

      In the results: "A fully separate set of experiments..." Please explain better what this means. 

      We have revised the entire section to explain more clearly how recordings were grouped (lines 125-135).

      In the results: "...(Syková and Nicholson, 2008) (Figure S3). However, this was not observed for chemical ischemia without postsynaptic failure (Figure S3), in which the increased glutamate transients were observed." This should probably refer to Figure S4. 

      Thank you for spotting this mistake. We corrected it.

      The last sentence in the results "... most likely by increased presynaptic Ca2+ influx, and, at the same time, the postsynaptic response." This is difficult to understand. Does "at the same time" refer to another mechanism or the consequence of more Ca2+? 

      We revised this part of the results section to improve clarity and toned down our conclusions (lines 328-335 and 363-379).

      Reviewer #3 (Recommendations For The Authors): 

      There are a few points that the authors need to clarify:

      The authors do not discuss the different behaviour of iGlu F0 during chemical ischemia and chemical ischemia with postsynaptic failure shown in Figure 2, panels D and E. In the first case, during the application of the solution to induce ischemia, iGluF0 decreases, while in the other case it strongly increases before falling. In both cases, the fEPSP slope is decreased. How do the authors explain this observation?

      We attribute the transient increase of extracellular glutamate during prolonged chemical ischemia to the increase of synaptic glutamate release observed previously under such conditions (Hershkowitz et al. 1993; Tanaka et al. 1997) and other mechanisms reviewed by us (Passlick et al. 2021) (e.g., glial glutamate release, transiently reduced glutamate uptake), which we could not detect during shorter chemical ischemia. The initial drop of the fEPSP slope is most likely due to postsynaptic depolarisation, which is followed by a repolarisation if the chemical stress duration is short. We now explain this in more detail in lines 185-200 of the revised manuscript. Although we focussed on the bi-directional effect on longer timescales in this manuscript, this transient phase during chemical ischemia is very interesting for further investigations.

      On page 8, first line, I think that the authors meant Figure S4, not Figure S3 when they mentioned results on ECS diffusivity and ECS fraction. 

      Yes, thank you for spotting this. We corrected the mistake.

      In Supplementary Figure 5, panel B, it seems that PPR is significantly reduced upon chemical ischemia (asterisk on the green columns), but the authors claim on page 10 that "Analysing the paired-pulse ratio (PPR) of postsynaptic response and iGluSnFR transients revealed no consistent changes after chemical ischemia (Figure S5).". Did the authors refer to the normalized data in panel D? If so, I do not see the need to normalize raw data that have already been shown in a previous panel and that give different statistical results, probably due to the different tests used (paired in panel B and unpaired in panel D).

      We have clarified this point in the supplementary material (Figure S5, legend). There is a relevant difference between the analyses presented in panels B and D. The paired test presented in B analyses the change of the electrophysiological PPR in response to chemical ischemia. The test in D on the electrophysiological PPR asks whether the reduction in B is significantly different from the changes seen under control conditions. Because it is not, we conclude that chemical ischemia has no relevant effect on the electrophysiological PPR and, in combination with the results on the iGluSnFR PPR, also none on short-term plasticity as tested here.
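      The two analyses described above answer different questions. The sketch below shows the distinction using only Python's standard library, and it is illustrative only. It assumes one PPR value per slice before and after treatment. The panel-B-style test is a paired t statistic on the within-slice change in the ischemia group. The panel-D-style test normalizes each slice's PPR to its own baseline and then compares ischemia against control with an unpaired (Welch) t statistic. All function names are hypothetical, and the authors' actual tests may differ. Only t statistics are computed; turning them into p-values would need a t distribution (for example from SciPy), which is not shown.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic for the within-slice change (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def relative_change(before, after):
    """Each slice's post-treatment PPR normalised to its own baseline."""
    return [a / b for b, a in zip(before, after)]


def welch_t(x, y):
    """Welch's unpaired t statistic comparing two independent groups."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)


# Hypothetical PPR values, not data from the manuscript.
isch_before, isch_after = [1.5, 1.6, 1.4, 1.55], [1.4, 1.45, 1.3, 1.5]
ctrl_before, ctrl_after = [1.5, 1.5, 1.45, 1.6], [1.42, 1.47, 1.38, 1.5]

t_within = paired_t(isch_before, isch_after)  # "does PPR change in ischemia?"
t_between = welch_t(relative_change(isch_before, isch_after),
                    relative_change(ctrl_before, ctrl_after))  # "is that change larger than in controls?"
print(t_within, t_between)
```

      A consistent drop in PPR within the ischemia group can give a large paired statistic. The same drop can still fail to differ from the drift seen in control slices over the same period, which is why the two panels can lead to different conclusions.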

      References

      Hershkowitz N, Katchman AN, Veregge S. Site of synaptic depression during hypoxia: a patch-clamp analysis. Journal of Neurophysiology 69: 432–441, 1993.

      Lauritzen M, Dreier JP, Fabricius M, Hartings JA, Graf R, Strong AJ. Clinical Relevance of Cortical Spreading Depression in Neurological Disorders: Migraine, Malignant Stroke, Subarachnoid and Intracranial Hemorrhage, and Traumatic Brain Injury. J Cereb Blood Flow Metab 31: 17–35, 2011.

      Pape N, Rose CR. Activation of TRPV4 channels promotes the loss of cellular ATP in organotypic slices of the mouse neocortex exposed to chemical ischemia. The Journal of Physiology 601: 2975–2990, 2023.

      Passlick S, Rose CR, Petzold GC, Henneberger C. Disruption of Glutamate Transport and Homeostasis by Acute Metabolic Stress. Front Cell Neurosci 15: 637784, 2021.

      Tanaka E, Yamamoto S, Kudo Y, Mihara S, Higashi H. Mechanisms Underlying the Rapid Depolarization Produced by Deprivation of Oxygen and Glucose in Rat Hippocampal CA1 Neurons In Vitro. Journal of Neurophysiology 78: 891–902, 1997.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review:

      Reviewer #1 (Public Review):

      In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.

      I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.

      We believe this study is an enhancement on our previous work for two reasons, which are now described in new text in the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data, using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).

      I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?

      We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p. Hence, Supplementary Figure 1 is now included, which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data regarding the effect of miR-199a/b-5p inhibition on CAV1 (Figure 4A).

      I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.

      We agree with all points made here and have amended these within the manuscript. Figure 1A now shows pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table that previously showed the pathways enriched at each time point is now in the supplementary materials (supp. Table 1). Figures 2 and 4 now use color instead of shades of grey. Figure 3C has been moved to the supplementary materials (Supplementary Figure 2) and is referenced in the text.

      Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.

      Reviewer #2 (Public Review):

      Summary:

      This study represents an ambitious endeavor to comprehensively analyze the role of miR-199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.

      Strengths:

      This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.

      Weaknesses:

      While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to cartilage homeostasis than to chondrogenesis itself. Their deficiency has been genetically proven to induce osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.

      We agree with the reviewer's comments. miR-455-null mice develop normally, but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition in the description to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate that miR-199a/b-5p will be relevant in OA, and additional work on this is ongoing, but this is beyond the scope of this manuscript.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.

      (1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.

      We have now added statistical test information to the figure legends of figures 2 and 4.

      (2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes overlap between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes – chi squared. Expected / Observed.

      A chi-squared test is now presented in the manuscript, which shows that the number of significant genes shared between miR-199a-5p knockdown and miR-199b-5p knockdown was significantly greater than expected for day 0 or day 1 of the experiments.
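      The overlap test discussed above compares the observed number of shared DE genes with the number expected by chance. The sketch below is one way to run it using only Python's standard library, under stated assumptions. It assumes the background is the set of all genes tested. Genes are then cross-classified into a 2x2 table (DE or not DE in each knockdown), and a Pearson chi-squared test with 1 degree of freedom and no continuity correction is applied. The function name and the numbers in the example are hypothetical. The authors' exact construction of the table is not stated here. A hypergeometric (Fisher's exact) test is a common alternative for the same question.

```python
import math


def overlap_chi_squared(n_total, n_a, n_b, n_overlap):
    """Pearson chi-squared test (1 df, no continuity correction) for gene-set overlap.

    n_total: genes tested (background); n_a, n_b: DE genes in each knockdown;
    n_overlap: genes DE in both. Returns (expected_overlap, chi2, p_value).
    """
    # Rows: DE in A / not DE in A; columns: DE in B / not DE in B.
    observed = [
        [n_overlap, n_a - n_overlap],
        [n_b - n_overlap, n_total - n_a - n_b + n_overlap],
    ]
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    expected_overlap = n_a * n_b / n_total
    return expected_overlap, chi2, p_value


# Hypothetical counts: 1000 genes tested, 100 DE in each knockdown, 50 shared.
print(overlap_chi_squared(1000, 100, 100, 50))
```

      With these made-up counts, about 10 shared genes would be expected by chance. Observing 50 gives a very large statistic. If the observed overlap equals the expected value, the statistic is zero and the p-value is 1.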

      (3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?

      Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.

      (4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!

      Thank you for pointing this error out. It has been corrected.

      (5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.

      This error has been removed from Figure 5A.

      (6) On p 20, the references 22 and 27 should I think be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays which you claim are the first evidence for this target relationship.

      We agree with this change and have corrected the manuscript.

      (7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.

      Thank you for pointing this error out – this has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.

      We agree with this reviewer's comments. For many years miR-140 and miR-455 have been studied experimentally, and their importance in OA research has become apparent. We have included additional references within the introduction to address this.

      (2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR-199a/b-5p in chondrogenesis.

      We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.

      (3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.

      In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulate OA. However, developing a new mouse line to investigate this is beyond the scope of this manuscript.

    1. Author response:

      eLife assessment

      This important study describes the crystallographic screening of a number of small molecules against a viral enzyme critical for the 5' capping of SARS-CoV-2 RNA and viral replication. While the high-quality crystal structures and complementary biophysical assays in this study provide solid evidence to support the major claims regarding how these small molecule compounds bind to the viral enzyme, the mismatch between the antiviral activity and binding to the viral enzyme of several small molecule compounds could have been more thoroughly investigated or discussed. This paper would be of interest to the fields of coronavirus biology, structural biology, and drug discovery.

      We fully agree that the antiviral assay results should be placed better in context, clarifying that the antiviral effects of tubercidine and its derivatives are due to off-target effects.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript describes the crystallographic screening of a number of small molecules derived from the natural substrates S-adenosyl methionine (SAM) and adenine, against the SARS-CoV-2 2'-O-methyltransferase NSP16 in complex with its partner NSP10. High-quality structures for several of these are presented together with efforts to evaluate their potential biophysical binding and antiviral activities. The structures are of high quality and the data are well presented but do not yet show potency in biophysical binding. They only offer limited insights into the design of inhibitors of NSP16/10.

      Strengths:

      The main strengths of the study are the high quality of the structural data and the associated electron density maps, which make the structural data highly accurate and informative for future structure-based design. These results are clearly presented and communicated in the manuscript. Another strength is the authors' attempts to probe the binding of the identified fragments using biophysical assays. Although in general the outcome of these experiments shows negative data or very weak binding affinities, the authors should be commended for attempting several techniques and showing the data clearly. This study is also useful as an example of the complexities associated with drug discovery on a bi-substrate target such as a methyltransferase: several of the observed binding poses were unexpected, with compounds that are relatively similar to substrates binding in different parts of the active site or in other unexpected orientations. This serves as an example of how experimental structural information is still of crucial importance to structure-based drug design. In general, the claims in the manuscript are well supported by the data.

      Weaknesses:

      The main limitations of the study are that the new structures generated in the study are fairly limited in terms of chemical space being similar to either SAM or RNA-CAP analogues. It feels a little bit of a lost opportunity to expand this to more diverse ligands which may reveal potential inhibitors that are distinct from current methyltransferase inhibitors based on SAM analogues and truly allow a selective targeting of this important target.

      It is true that it makes sense to screen more diverse compounds to expand the ligand set, and we do hope our study motivates this. Given the limited number of crystal structures of nsp10-16 with potential drug molecules, the aim of this study was to expand the database with new complex structures, providing a pool of complex structures for future compound designs with increased selectivity. Furthermore, some of the hits are known inhibitors of similar enzymes, and the most prominent and potent methyltransferase inhibitors, such as sinefungin and tubercidine, are structurally related to SAM. We think that knowing which SAM compounds or fragments of SAM are able to bind in the nsp10-16 active site is highly valuable for further specific and optimized inhibitor design.

      Another limitation is the potentially misleading nature of the antiviral assays. It is not possible to say whether these compounds display on-target activity in these assays, or even whether inhibition of NSP16/10 would have any effect in these assays. Whilst the authors do mention these points, I think they should be emphasized more strongly.

      That is a very valid point, and we do not believe that the antiviral activity arises from on-target effects. We agree that the current presentation could be considered misleading, and we clarify this point in the revised version.

      Minor critical points:

      The authors state that their protein preparations contain co-purified SAM, which occupies the active site in the crystals. Presumably, this complicates the interpretation of electron density maps, as many of the ligands overlap with the existing SAM density, making traditional analysis of difference maps challenging. The authors did not use PanDDA analysis for this step; perhaps this is related to the presence of SAM in the ground-state datasets? Also, occupancies are in some cases reported to two significant figures, which seems to overestimate the ability of refinement to determine occupancy from density alone; the authors should clarify how these figures were reached.

      We used PanDDA in parallel for hit finding. However, for this target we did not see any advantage over the hits found by visual inspection. As mentioned, this is probably because SAM is present in the “ground state”, which complicates the PanDDA map calculations.

      Regarding the occupancies, we fully agree with this comment; we round them to a reasonable number of digits and clarify how the values were reached.

      The molecular docking approach to pre-selection of library compounds to soak did not appear to be successful. Could the authors make any observations about the compounds selected by docking or the docking approach used that may explain this?

      This is a good point: offering possible explanations for why the docking approach was not successful would help facilitate similar approaches in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The study by Kremling et al. describes an X-ray crystallography-based compound screen of the SARS-CoV-2 nsp16-nsp10 methyltransferase, aimed at identifying inhibitors. A set of 234 compounds was screened, resulting in a set of adenosine-containing compounds or analogues thereof that bind in the SAM site of nsp16-nsp10. The compound selection was mainly based on similarity to SAM and docking of commercially available libraries. The resulting structures are of good quality and clearly show the binding mode of the compounds. It is not surprising to find that these compounds bind in the SAM pocket, since they are structurally very similar to portions of SAM. Nevertheless, the result is novel and may be inspirational for the future design of inhibitors. Following up on the crystallographic screen, the identified compounds were tested for antiviral activity and binding to nsp16-nsp10. In addition, an analysis of similar binding sites was presented.

      Strengths:

      The crystallography is solid and the structures are of good quality. The compound binding constitutes a novel finding.

      Weaknesses:

      The major weakness is the mismatch between antiviral activity and binding to the target protein. Only one of the compounds could be shown to bind to the nsp16-nsp10 protein: based on an ITC displacement experiment, sangivamycin is concluded to bind with a Kd > 1 mM. However, the same compound displays antiviral activity with an EC50 of 0.01 µM. Even though the authors do not specifically claim that the antiviral effect is due to inhibition of nsp16-nsp10, the claim is implicit. If the data are included, the manuscript should state explicitly that the effect is not likely due to nsp16-nsp10 inhibition.

      We believe that the antiviral data are valuable and should be published within this work. We also agree that it should be clearly stated that the antiviral effect is unlikely to be due to nsp10-16 inhibition, and we will revise the text accordingly.

      The structure of the paper and the language need quite a lot of work to bring them to the expected quality.

      We will go through the manuscript again and further improve the structure and language as much as possible.

      Technical point:

      Refinement of crystallographic occupancies to single-digit percentage precision is not normally supported by the electron density.

      We agree with that point and correct it in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study defines a fundamental aspect of protein kinase signalling in the protist parasite Toxoplasma gondii that is required for acute and chronic infections. The authors provide compelling evidence for the role of SPARK/SPARKEL kinases in regulating cAMP/cGMP signalling, although evidence linking the loss of these kinases to changes in the phosphoproteome is incomplete. Overall, this study will be of great interest to those who study Toxoplasma and related apicomplexan parasites.

      We thank the reviewers for their thoughtful and positive evaluation of our work. Below, we address all of the public reviews and recommendations for the authors in point-by-point responses. Additionally, we include with this resubmission RT-qPCR data showing no significant change in transcript levels for the relevant AGC kinases, supporting the hypothesis that SPARK/SPARKEL regulation of these kinases is post-translational.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Herneisen et al. characterise the Toxoplasma PDK1 orthologue SPARK and an associated protein, SPARKEL, in controlling important fate decisions in Toxoplasma. In recent years, this group and others have characterised the roles of cAMP and cGMP signalling in negatively and positively regulating egress, motility, and invasion, respectively. This manuscript extends this work by showing that SPARK and SPARKEL likely act upstream of, or at least control the levels of, the cAMP- and cGMP-dependent kinases PKA and PKG, respectively, thus controlling the transition of intracellular replicating parasites into extracellular motile forms (and back again).

      The authors use quantitative (phospho)proteomic techniques to elegantly demonstrate the upstream role of SPARK in controlling cAMP and cGMP pathways. They use sophisticated analysis techniques (at least for parasitology) to show the functional association between cGMP and cAMP signalling pathways. They therefore begin to unify our understanding of the complicated signalling pathways used by Toxoplasma to control key regulatory processes that govern the activation and suppression of motility. The authors then use molecular and cellular assays, clear in their design and approach, on a range of transgenic lines to back up their observations from quantitative proteomics.

      The authors then extend their work by showing that SPARK/SPARKEL also control PKAc3 function. PKAc3 has previously been shown to negatively regulate differentiation into bradyzoite forms and this work backs up and extends this finding to show that SPARK also controls this. The authors conclude that SPARK could act as a central node of regulation of the asexual stage, keeping parasites in their lytic cell growth and preventing differentiation. Whether this is true is beyond the scope of this paper and will have to be determined at a later date.

      Strengths:

      This is an exceptional body of work. It is elegantly performed, with state-of-the-art proteomic methodologies carefully being applied to Toxoplasma. Observations from the proteomic datasets are masterfully backed up with validation using quantitative molecular and cellular biology assays.

      The paper is carefully and concisely written and is not overreaching in its conclusions. This work and its analysis set a new benchmark for the use of proteomics and molecular genetics in apicomplexan parasites.

      Weaknesses:

      This reviewer did not identify any weaknesses.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Herneisen et al. examines the Toxoplasma SPARK kinase, an orthologue of mammalian PDK1. Extracellular signals trigger second-messenger cascades that play a central role in the survival of apicomplexan parasites. In Toxoplasma, these cascades regulate the active replication of tachyzoites, which manifests as acute toxoplasmosis, or development into drug-resilient bradyzoites characteristic of the chronic stage of the disease. This study focuses on the poorly understood signaling mechanisms acting upstream of second-messenger kinases such as PKA and PKG. The authors showed that, similar to PDK1, Toxoplasma SPARK appears to regulate several AGC kinases.

      Strengths:

      The study demonstrated a strong association of the SPARK kinase with an elongin-like SPARKEL factor and an uncharacterized AGC kinase. Using a set of standard assays, the authors determined the SPARK/SPARKEL role in parasite egress and invasion. Finally, the study presented evidence of the SPARK/SPARKEL involvement in the bradyzoite differentiation.

      Weaknesses:

      Although the study can potentially uncover essential sensing mechanisms operating in Toxoplasma, the evidence of the SPARK/SPARKEL mechanisms is weak. Specifically, due to incomplete data analysis, the SPARK/SPARKEL-dependent phosphoregulation of AGC kinases cannot be evaluated. The manuscript requires better organization and lacks guidance on the described experiments. Although the study is built on advanced genetics, at times, it is unnecessarily complicated, raising doubts rather than benefiting the study.

      The evidence for the SPARK/SPARKEL interaction is demonstrated through diverse experimental approaches that are internally consistent. Five separate mass spectrometry experiments, with replicates and appropriate controls, with tags on either SPARK or SPARKEL, showed that SPARK and SPARKEL form a strong interaction (Figure 1A, 1D, 1E; Figure 1—figure supplement 1). Global mass spectrometry experiments assessing the impact of SPARK or SPARKEL depletion showed similar features (a reduction in PKG and PKA abundance and up-regulation of bradyzoite-associated proteins; Figure 3C–D). SPARK and SPARKEL depletion phenocopy one another in all cell biological assays we tested (Figure 2A, 2D and PMID: 35484233; Figure 2E–J; Figure 4E–F; Figure 6A–B). Measuring the abundance of SPARK and SPARKEL in unenriched samples was challenging, but immunoblotting and proteomics suggest that depletion of one factor leads to down-regulation of the other (Figure 2B, 2C; Figure 3—figure supplement 1), which explains the genetic and cell biological phenocopying described above. We note that “further biochemical studies are required to discern the regulatory interactions between SPARK and SPARKEL” (first submission lines 590-591); such studies are beyond the scope of this work.

      The evidence for SPARK/SPARKEL regulation of AGC kinase activity is demonstrated through diverse experimental approaches that are also internally consistent. PKA C1 and PKG abundance levels decrease in parasites depleted of SPARK/SPARKEL, as measured by mass spectrometry (Figure 3A and 3C) and cell-based assays for PKA C1/R (Figure 4D–F). Comparisons of the global SPARK-, PKA R-, PKG-, and PKA C3-depleted phosphoproteomes suggest that PKA and PKG activity is reduced upon SPARK depletion whereas the activity of an unrelated factor (PP1) is unaffected (Figure 4G–H, Figure 4—figure supplement 1, Figure 5D–E, Figure 7I–J). Parasites depleted of SPARK are hypersensitized to a PKG inhibitor (Figure 5B–C). SPARK, PKA, and PKG are proximal in cellulo (Figure 3I) and SPARK co-purifies with PKA C3 (Figure 7A). The kinetic-phase phenotypes associated with SPARK and SPARKEL depletion (PMID: 32379047, Figure 2A, 2D–2J) are consistent with reduced PKG activity (PMID: 28465425) and only develop after PKG has been depleted as shown by proteomics experiments (Figure 2E-J and Figure 3C). Other studies have shown that the effects of reduced PKG activity are dominant to reduced PKA C1 activity (PMID: 29030485). The replicative-phase phenotypes associated with SPARK and SPARKEL depletion are consistent with reduced PKA C3 activity (PMID: 27247232 and herein). Mechanistically, PKG and PKA C1 activity must be lower in SPARK-depleted parasites because the abundances of these kinases are lower (Figure 3A, 3C). The mechanism of regulation may be more complex in the case of PKA C3, as SPARK depletion did not cause a reduction in PKA C3 abundance as measured by cellular assays (Figure 7B–F), but PKA C3 activity decreased (Figure 7I–K). 
      We concede that multiple mechanisms may lead to the reduction in PKA C1 and PKG abundances, such as decreased activation loop phosphorylation and autophosphorylation at other stabilizing sites or enhanced ubiquitin ligase activity leading to active degradation of the kinases; we have moved speculation regarding such mechanisms to the Discussion.

      Although the reviewer commented that the manuscript “requires better organization” in the public review, no specific recommendations were provided to the authors. Therefore, we did not change the organization of the manuscript. We added an additional paragraph to the Discussion to reiterate key findings: “A prior study identified SPARK as a regulator of parasite invasion and egress following 24 hours of kinase depletion (Smith et al., 2022). Unexpectedly, we observed that three hours of SPARK or SPARKEL depletion were insufficient to impact T. gondii motility or calcium-dependent signaling, indicating that the phenotypes associated with SPARK and SPARKEL depletion develop over time. Quantitative proteomics revealed that PKA and PKG abundances began to decrease after more than three hours of SPARK depletion. Proximity labeling experiments also suggested that SPARK, PKA, and PKG are spatially associated within the parasite cell. We propose a model in which SPARK down-regulation coincides with reduced PKG and PKA activity due to diminished protein levels.” This work built upon genetic and proteomic approaches recently described by our group, which we cited in the text and extensive methods section. We added additional experimental detail where noted in the reviewer’s recommendations to the authors.

      The study utilizes advanced genetics because biochemical tools for eukaryotic parasites are limited. For example, no antibodies for T. gondii SPARK, PKA subunits, or PKG exist; to say nothing of phosphosite-specific antibodies, which are common in the mammalian cell signaling field. Therefore, to measure the relationship between SPARK, SPARKEL, and PKA subunits, we had to generate strains in which multiple proteins were tagged with epitopes for downstream analysis. The genetic experiments included appropriate controls and were internally consistent with results obtained using orthogonal approaches, such as mass spectrometry.

      Reviewer #3 (Public Review):

      Summary:

      This paper focuses on the roles of a Toxoplasma protein (SPARKEL) with homology to elongin C and the kinase SPARK that it interacts with. They demonstrate that the two proteins regulate the abundance of PKA and PKG, that depletion of SPARKEL reduces invasion and egress (previously shown with SPARK), and that their loss also triggers spontaneous bradyzoite differentiation. The data are overall very convincing and will be of high interest to those who study Toxoplasma and related apicomplexan parasites.

      Strengths:

      The study is very well executed with appropriate controls. The manuscript is also very well and clearly written. Overall, the work clearly demonstrates that SPARK/SPARKEL regulate invasion and egress and that their loss triggers differentiation.

      Weaknesses:

      (1) The authors fail to discriminate between SPARK/SPARKEL acting as negative regulators of differentiation as a result of an active role in regulating stage-specific transcription/translation or as a consequence of a stress response activated when either is depleted.

      We demonstrate a novel function for SPARK and SPARKEL as negative regulators of differentiation. The pathways leading to differentiation are being actively studied. Up-regulation of a positive transcriptional regulator of chronic differentiation, BFD1, is sufficient to trigger differentiation in vitro in the absence of other stressful growth conditions (PMID: 31955846). SPARK or SPARKEL depletion results in up-regulation of proteins that are up-regulated upon BFD1 overexpression. Whether BFD1 overexpression or SPARK and SPARKEL depletion triggers cellular stress pathways is beyond the scope of the current work, which focused instead on the immediate effect of these pathways on AGC kinases. Study of the effect of the various kinases on the parasite phosphoproteome shows that the putative targets of PKA C3 are specifically downregulated upon SPARK knockdown, indicating PKA C3 activity is indeed decreased in the latter condition.

      (2) The function of SPARKEL has not been addressed. In mammalian cells, Elongin C is part of an E3 ubiquitin ligase complex that regulates transcription and other processes. From what I can tell from the proteomic data, homologs of the Elongin B/C complex were not identified. This is an important issue, as the authors find that PKG and PKA protein levels are reduced in the knockdown strains.

      Our experiments suggest that SPARK and SPARKEL form a complex, and down-regulation of one complex member leads to down-regulation of the other. Thus in all tested assays, knockdown of SPARK and SPARKEL phenocopy one another. Further biochemical and structural work will be required to determine the mechanism by which SPARKEL regulates SPARK.

      Nearly all studies of the function of elongin C have been conducted in mammalian cells. Proteins with elongin C domains may serve alternative and unexplored functions in unicellular eukaryotes. We searched for the presence of Elongin A/B and known Elongin C complex members in the T. gondii genome and were unable to identify orthologs, explaining why these proteins were not identified in mass spectrometry experiments. Please see our response in Recommendations for the Authors, Reviewer 3 point 2.

      Beyond the concerns raised by the review team, we have identified and corrected the following errors or omissions in the first submission of the manuscript:

      - Line 176 of the first submission referred to a “peptide sequence match (PSM)”, which we have changed to “peptide-spectrum match”.

      - We recolored and relabeled the lines in Figure 5A so that it is easier to match a specific peptide with a specific line; and also corrected a mislabeling.

      - The SPARK panel in Figure 7B was incorrectly centered. The raw files can be viewed in Figure 7—source data 2.

      - Figure 7—figure supplement 1D was missing an x-axis label.

      - Line 1172 referred to “Supplementary File X”, which we corrected to “Supplementary File 3”.

      - We have updated references to preprints that have since been published, including PMID: 38093015, 37933960, 37966241, and 37610220.

      Editors' comments:

      The proteomics data reported in this study underpin the major findings and are very comprehensive. As noted in the reviews, it is strongly recommended that the authors normalize the levels of detected phosphopeptides against the levels of the parent protein in the different mutant lines in order to identify changes in protein phosphorylation that are linked to protein kinase activity rather than protein degradation. A focus on changes that occur at early time points following protein knock-down may also help to identify the main targets of each kinase.

      Please see our response to Reviewer 2 Recommendations for the Authors, points 1 and 2.

      Reviewer #1 (Recommendations For The Authors):

      During my reading, I only found one small mistake. In Figure 7F, the x-axis is missing the word 'PKA'.

      We have updated the x-axis to read “SPARK-AID/PKA C3-mNG (h. + IAA)”.

      All information, code, and reagents are clearly explained.

      Reviewer #2 (Recommendations For The Authors):

      How the phosphoproteome was analyzed needs to be clarified. The normalization step, computing the ratio of the phosphopeptide to the protein (peptide) intensity, appears to have been omitted. It is the most critical step of the analysis. The minor shifts between protein and phosphosite intensity seem negligible, as seen in Figure 4A and 4B. Significant changes can only be deduced by calculating this ratio. In the current state, the presented results are inconclusive. The manuscript contains overreaching and often unsupported statements because the data have not been appropriately filtered. Related to this topic, it is advisable to use well-accepted terminology and complete words when describing the proteome and phosphoproteome. The interchangeable use of "peptide" and "phosphopeptide" in the text is confusing and misleading.

      To clarify the phosphoproteome analysis:

      We cite a previous description of the phosphoproteomics sample preparation workflow (lines 1124-1125 of the first submission for example). Our quantitative phosphoproteomics experiments comprise two datasets generated from the same multiplexed samples. The samples were split at the point of phosphopeptide enrichment. Ninety-five percent of the samples were subjected to phosphopeptide enrichment (titanium dioxide followed by nickel affinity chromatography; “enriched samples”). Five percent of the samples were reserved as a reference for the non-enriched proteome (“non-enriched samples”). To clarify this point, we have added the sentences “Approximately 95% of the proteomics sample was used for phosphopeptide enrichment” and “The remaining 5% of the sample was not subjected to the phosphopeptide enrichment protocol” to the Methods sections, after describing the multiplexing steps.

      The samples were fractionated separately and run separately on an LC-MS system, which is described in the Methods section, for example lines 1130-1149 of the first submission. Raw files of the phosphopeptide-enriched and unenriched samples were analyzed separately, which is described in the Methods section, for example lines 1151-1158 of the first submission. To clarify this point, we have added the sentence “Raw files of the phosphopeptide-enriched and unenriched samples were analyzed separately” to the Methods sections. Many of the search parameters and descriptions of normalization and protein abundances were described in lines 1085-1093 of the first submission in reference to the 24h SPARK depletion proteome. We added this information to the description of the SPARK depletion time course phosphoproteome data analysis: “The allowed mass tolerance for precursor and fragment ions was 10 ppm and 0.02 Da, respectively. False discovery was assessed using Percolator with a concatenated target/decoy strategy using a strict FDR of 0.01, relaxed FDR of 0.05, and maximum Delta CN of 0.05. Only unique peptide quantification values were used. Co-isolation and signal-to-noise thresholds were set to 50% and 10, respectively. Normalization was performed according to total peptide amount. In the case of the unenriched samples, protein abundances were calculated from summation of non-phosphopeptide abundances.”
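      For readers less familiar with this workflow, the two steps quoted above (normalization according to total peptide amount, and protein abundance as the summation of non-phosphopeptide abundances) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline or the search software's implementation: the peptide records, protein names, and the functions normalize_total and protein_abundance are hypothetical, and the scaling scheme (each TMT channel scaled so that its summed abundance matches the largest channel total) is an assumption.

```python
# Hypothetical peptide records: (protein, is_phosphopeptide, abundance per TMT channel)
PEPTIDES = [
    ("protein_A", False, [100.0, 80.0, 60.0]),
    ("protein_A", False, [50.0, 40.0, 30.0]),
    ("protein_A", True, [20.0, 10.0, 5.0]),
    ("protein_B", False, [300.0, 330.0, 360.0]),
]


def normalize_total(peptides):
    """Scale each channel so its summed peptide abundance equals the largest channel total."""
    n_channels = len(peptides[0][2])
    totals = [sum(ab[i] for _, _, ab in peptides) for i in range(n_channels)]
    target = max(totals)
    factors = [target / t for t in totals]
    return [(prot, is_p, [a * f for a, f in zip(ab, factors)]) for prot, is_p, ab in peptides]


def protein_abundance(peptides):
    """Protein-level abundance: per-channel sum of non-phosphopeptide abundances."""
    proteins = {}
    for prot, is_phospho, ab in peptides:
        if is_phospho:
            continue  # phosphopeptides do not contribute to protein-level sums
        acc = proteins.setdefault(prot, [0.0] * len(ab))
        for i, a in enumerate(ab):
            acc[i] += a
    return proteins


print(protein_abundance(normalize_total(PEPTIDES)))
```

      In this sketch normalization happens within a single dataset, which mirrors the response's point that the enriched and unenriched datasets were each normalized internally rather than against one another.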

      We hope that this clarifies how the unenriched sample protein-level abundances were calculated. When we discuss “protein abundance”, we are referencing the unenriched sample summed non-phosphopeptide abundance. Our phosphoproteome analysis was based only on phosphopeptides, as our phosphopeptide enrichment resulted in 99% efficiency, and peptides lacking phosphorylation sites were filtered out before subsequent analyses. We used “peptide” and “phosphopeptide” interchangeably because the only peptide-level analysis performed was based on phosphopeptide abundances. We have changed any mention of “peptide” to “phosphopeptide” in the main text. 

      “The normalization step, computing the ratio of the phosphopeptide to the protein (peptide) intensity, appears omitted. It is the most critical step of the analysis.”:

      Unlike common differential gene expression analysis pipelines, proteomics analysis pipelines are not settled. Many analyses do not perform peptide-to-parent-protein corrections; some normalize phosphopeptide abundances to parent protein abundances calculated from summing non-phosphopeptides or a combination of phosphopeptides and non-phosphopeptides on an ad hoc basis; some calculate global normalization factors based on regressions of protein and phosphopeptide abundances or other pairwise comparisons. A caveat of protein normalization of phosphopeptides is that it over-corrects cases in which protein abundance and phosphorylation are interdependent, as is the case for auto-phosphorylation and some activation loop phosphorylations (PMID: 37394063). We used the approach that retained the greatest complexity of the data: we did not normalize abundances across different mass spectrometry experiments, because doing so would discard information that was not in the overlap. We have updated Supplementary File 3.3 to include protein-level quantification values (from Supplementary File 3.2) if measured.
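      The parent-protein correction requested by the reviewer, and the over-correction caveat raised in the response, can be illustrated on a log2 scale. In the hypothetical example below, an activation-loop site whose phosphorylation is assumed to stabilize its kinase falls two-fold together with the parent protein; subtracting the protein change then reports no change at the site, even though the site itself clearly decreased. The function names and abundance values are illustrative only.

```python
import math


def log2_change(treated, control):
    """log2 fold change of an abundance relative to control."""
    return math.log2(treated / control)


def protein_corrected_change(phospho_treated, phospho_control, protein_treated, protein_control):
    """Phosphosite log2 change minus the parent-protein log2 change."""
    return (log2_change(phospho_treated, phospho_control)
            - log2_change(protein_treated, protein_control))


# Hypothetical activation-loop site: site and parent protein both fall two-fold.
raw = log2_change(50.0, 100.0)                                   # -1.0
corrected = protein_corrected_change(50.0, 100.0, 100.0, 200.0)  # 0.0
print(f"uncorrected: {raw:+.2f}, protein-corrected: {corrected:+.2f}")
```

      The correction is informative when phosphorylation and protein stability are independent; when they are coupled, as assumed here, it cancels a real change.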

      We clarified that the phosphopeptide abundances and protein-level abundances were derived from different datasets that were each internally normalized (globally centered by total peptide amount). Protein-level abundances were summed from non-phosphopeptide abundances. The calculated log2 changes are based on the globally centered data within each dataset. We analyzed the kinetic profiles of changing phosphopeptide abundances relative to a control using approaches similar to those described for several recent temporally resolved T. gondii phosphoproteomes (e.g. PMID: 37933960, 35976251, 36265000, 29141230) and as described in the Methods. The approach does not first correct for unenriched-sample parent protein abundance—in some applications, unenriched samples are not collected at all; instead, phosphopeptide ratios are median-normalized to non-phosphopeptide ratios (quantified due to inefficient phosphopeptide enrichment) and are individually tested against the null distribution of non-phosphopeptide ratios (e.g. PMID: 36265000, 29141230). We did not use this approach because our phosphopeptide enrichment was 99% efficient (18518 phosphopeptides of 18758 peptides with quantification values). In several cases using our approach, parent protein abundance is not quantified in the unenriched proteome dataset, but phosphopeptides are reliably quantified in the enriched proteome dataset. We note that phosphopeptide abundance changes can be difficult to interpret in such cases, e.g. in the first submission lines 178-186 and 193-194. We have added similar text to the results noting that in the case of PKA and PKG, both unenriched parent protein and enriched phosphopeptide abundances decreased (see below). We have also moved speculation about whether SPARK phosphorylates the activation loop of PKA and PKG, or whether the down-regulation of PKA and PKG arises from indirect effects, to the Discussion.
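      Two quantitative points in this paragraph can be sketched briefly: the enrichment-efficiency arithmetic (18518 of 18758 quantified peptides were phosphopeptides) and the alternative approach that the response describes but did not use, in which phosphopeptide log2 ratios are median-normalized to non-phosphopeptide ratios and compared against the non-phosphopeptide null distribution. The function names and the simple empirical p-value are assumptions for illustration, not the procedure of the cited studies.

```python
import statistics


def enrichment_efficiency(n_phospho, n_total):
    """Fraction of quantified peptides that carry a phosphorylation."""
    return n_phospho / n_total


def median_normalize(phospho_log2, nonphospho_log2):
    """Center phosphopeptide log2 ratios on the median non-phosphopeptide log2 ratio."""
    offset = statistics.median(nonphospho_log2)
    return [x - offset for x in phospho_log2]


def empirical_p(x, null_log2):
    """Two-sided empirical p-value of one log2 ratio against non-phosphopeptide ratios."""
    extreme = sum(1 for n in null_log2 if abs(n) >= abs(x))
    return (1 + extreme) / (1 + len(null_log2))


eff = enrichment_efficiency(18518, 18758)
print(f"enrichment efficiency: {eff:.1%}")
print(f"non-phosphopeptides available for a null: {18758 - 18518}")
```

      With roughly 98.7% efficiency, only 240 non-phosphopeptides remain to form the null distribution, which is consistent with the response's reason for not using the median-normalization approach.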

      We have moved comparisons of protein and phosphopeptide abundances from the Results to the Discussion. We added the following sentences to the result section Clustering of phosphopeptide kinetics identifies seven response signatures: “Because non-phosphopeptide and phosphopeptide abundances were quantified in different mass spectrometry experiments, it is challenging to compare the rates of phosphopeptide and parent protein abundance changes, especially when phosphorylation status and protein stability are interconnected. In general, both PKA C1, PKA R, and PKG protein and phosphosite abundances decreased following SPARK depletion (Figure 3—figure supplement 1), as discussed further below. We also observed down-regulation of phosphosite and protein abundances of a MIF4G domain protein.” Figure 3—figure supplement 1E is a new panel that shows PKA C1, PKA R, and PKG phosphopeptide and parent protein abundances along with global changes in phosphopeptide and parent protein abundances in the cases which both were quantified. We changed lines 278-282 in the first submission to “The SPARK depletion time course phosphoproteome showed a reduction in the abundance of PKA C1 T190 and T341, which are located in the activation loop and C-terminal tail, respectively (Figure 4A). Several phosphosites residing in the N terminus of PKA R (e.g. S17, S27, and S94) also decreased following SPARK depletion (Figure 4B).” We changed lines 313-315 in the first submission to “The SPARK depletion time course phosphoproteome showed a reduction in the abundance of several phosphosites residing in the N terminus of PKG as well as T838, which corresponds to the activation loop (Figure 5A). By contrast, S105 did not greatly decrease, and S40 abundance slightly increased.”

      The description of experiments should be more detailed. For example, the 3, 8, and 24 h treatments were counted backward from the harvest time; thus, they should be emphasized as time points before natural egress. Consequently, it seems that the 3 h treatment should be prioritized, given the SPARK/SPARKEL role in egress/invasion. Unexpectedly, the study draws more attention to the 24-hour treatment. If AID-SPARK/SPARKEL is eliminated within 1 h, parasites undoubtedly accumulate numerous secondary defects during a prolonged 23 h deprivation. Since the SPARK pathway activates kinase/phosphatase cascades, the 24 h data are likely overwhelmed by the consequences of long-term complex degradation, making them a poor source of putative SPARK substrates. Likewise, the downregulation of PKA observed in the 8 hours after SPARK depletion may be an indirect effect of SPARK degradation. The direct effects and immediate substrates should be detectable within 2-3 h of auxin treatment of nearly egressing cultures.

      The first submission described how parasites were harvested at 32 hours post-infection with 0, 3, 8, or 24 hours of IAA treatment (lines 157-160, 1097-1110, and Figure 3B). To reiterate this experimental detail, we have added “harvested 32 hours post-infection” to the sentence “...quantitative proteomics with tandem mass tag multiplexing that included samples with 0, 3, 8, and 24 hours of SPARK or SPARKEL depletion” and similarly in the figure legend. The time points are unrelated to natural egress because the experiment was terminated at 32 hours post-infection, which is earlier than the window typically used to study natural egress under these conditions (40-48 hours post-infection). We chose to terminate the experiment before natural egress to better localize phosphopeptide changes related to SPARK depletion. The phosphoproteome undergoes dramatic reorganization during egress due to the activity of myriad kinases and phosphatases (see PMID: 35976251, 37933960, and 36265000), which would have likely complicated the signal.
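      Because the reviewer's comment turned on how the treatment windows relate to the infection timeline, the arithmetic can be spelled out: with harvest fixed at 32 hours post-infection, each IAA duration implies an IAA addition time, and every harvest precedes the 40-48 hours post-infection natural egress window cited above. The helper name iaa_addition_times is hypothetical.

```python
HARVEST_HPI = 32                 # harvest time, hours post-infection
IAA_DURATIONS_H = [0, 3, 8, 24]  # hours of IAA treatment before harvest
EGRESS_WINDOW_HPI = (40, 48)     # typical natural-egress window under these conditions


def iaa_addition_times(harvest_hpi, durations):
    """Map each IAA duration (h) to the time (hpi) at which IAA would be added."""
    return {d: harvest_hpi - d for d in durations}


times = iaa_addition_times(HARVEST_HPI, IAA_DURATIONS_H)
for d, t in times.items():
    if d == 0:
        print(" 0 h IAA: untreated control, harvested at 32 hpi")
    else:
        print(f"{d:>2} h IAA: added at {t} hpi, harvested at {HARVEST_HPI} hpi")
print(f"harvest precedes the egress window by {EGRESS_WINDOW_HPI[0] - HARVEST_HPI} h")
```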

      A pivotal result motivating time-course experiments and analysis was that SPARK/SPARKEL's role in egress and invasion emerges only after an extended depletion period (Figure 2E–J, first submission lines 126-145). The 24h depletion was used in the experimental system that first identified SPARK as a regulator of egress, which motivated our initial experiments, as stated in the first submission lines 126-144 and 149-151. We draw attention to the observation that SPARK and SPARKEL phenotypes develop over time in the first submission, lines 137-145. The role for SPARK/SPARKEL in egress/invasion does not manifest at 3h depletion; it manifests at 24h depletion. To ensure that this point is not overlooked by the reader, we have created a new heading in the Results section (SPARK and SPARKEL depletion phenotypes develop over time) for the paragraph that was previously lines 137-145. The remainder of the manuscript integrates data from proteomic, genetic, and cell-based assays across temporal dimensions to build a working model of how the phenotypes associated with SPARK depletion develop over time.

      Underpinning this comment is an assumption that phosphopeptides that decrease the most rapidly following a kinase’s depletion are direct substrates, whereas phosphopeptides that decrease with slower kinetics are not. This is not always the case. Consider a kinase that phosphorylates sites on substrate A and substrate B. The site on substrate A is also the target of a phosphatase, whereas the site on substrate B is recalcitrant to phosphatase activity. If the kinase were inhibited, then the site on substrate A would be actively dephosphorylated. As measured by a phosphoproteomics experiment, the abundance of the substrate A phosphopeptide would drop rapidly due to the inactivity of the kinase and activity of the phosphatase. In the text, we called such sites “constitutively regulated” or dynamic—they are actively dephosphorylated and phosphorylated within a short timeframe. The phosphosite on substrate B is comparatively static; once it is phosphorylated by the kinase, it is unaffected by subsequent inhibition of the kinase. Only newly synthesized substrate B molecules would be affected by kinase inhibition. As measured by a phosphoproteomics experiment, the abundance of the substrate B phosphopeptide would drop more gradually after kinase inhibition, as the unphosphorylated peptide is found only on newly synthesized proteins that were not previously exposed to kinase activity. An example of the scenario described for substrate A would be that of yeast Cdk1 T14/Y15, which is phosphorylated by Wee1 and dephosphorylated by Cdc25 (e.g. PMID: 7880537). An example of the scenario described for substrate B would be that of the human PKA C activation loop T197, which is phosphorylated by PDK1 and is phosphatase-resistant under physiological conditions (e.g. PMID: 22493239, 15533936).
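      A toy model makes the substrate A/B argument concrete. This is a minimal sketch, not a model fitted to our data. It assumes the following:

      - Kinase activity drops to zero at t = 0.
      - A first-order phosphatase acts on site A. The 30-minute half-life used here is a hypothetical value.
      - No phosphatase acts on site B.
      - Pre-existing phosphorylated B is diluted only by newly synthesized protein as the parasite population doubles roughly every 8 hours. This approximates the doubling time by the cell-cycle length.

      The function names are illustrative.

```python
import math

def substrate_a_fraction(t_h: float, k_dephos_per_h: float) -> float:
    """Fraction of site A still phosphorylated t hours after kinase loss.

    Assumption: a phosphatase removes the phosphate with first-order
    rate constant k_dephos_per_h, and the kinase no longer re-phosphorylates it.
    """
    return math.exp(-k_dephos_per_h * t_h)

def substrate_b_fraction(t_h: float, doubling_time_h: float) -> float:
    """Fraction of site B still phosphorylated t hours after kinase loss.

    Assumption: the site is never dephosphorylated, so unphosphorylated B
    only arises from new synthesis; old (phosphorylated) molecules are
    diluted as the population doubles every doubling_time_h hours.
    """
    return 2.0 ** (-t_h / doubling_time_h)

if __name__ == "__main__":
    # Hypothetical parameters: 0.5 h phosphatase half-life for site A,
    # ~8 h population doubling time for site B.
    k_a = math.log(2) / 0.5
    for t in (0, 3, 8, 24):
        a = substrate_a_fraction(t, k_a)
        b = substrate_b_fraction(t, 8.0)
        print(f"{t:>2} h  site A = {a:.3f}  site B = {b:.3f}")
```

      Under these assumptions, site A is almost entirely lost by 3h. Site B only halves over one doubling time, even though both are direct substrates of the same kinase.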

      Both substrate A and B may be “direct” and functionally relevant targets of the kinase. Categorizing substrates as “immediate” is comparatively less informative in this context (although it may be relevant when studying fast, synchronized processes with high temporal resolution, such as induced Plasmodium spp. gametocyte activation or stimulation of T. gondii secretion). Furthermore, our earlier experiments had shown that the role for SPARK/SPARKEL in motility manifests after 3h depletion and is complete by 24h depletion. By this logic, we were most interested in the candidates showing differences at these time points. We conducted proximity labeling experiments to identify the overlap of proteins that exhibited SPARK-dependent decreases in the global proteomics and were also proximal to SPARK in space (first submission Figure 3I and lines 260-275), thus revealing a prioritized list of candidates, which included PKG and PKA. When technically feasible, we included a temporal dimension to follow-up experiments, rather than relying on a 24h terminal comparison (e.g. Figure 4E–H, Figure 5D–E, Figure 7D–F, Figure 7I–K; all first submission).

      Fig. 2B and C. What antibodies were used to detect the tagged proteins? There is a concern regarding the use of multiple tags attached to the same protein, to the point that they double the size of the studied protein. The switched mobility of SPARK and SPARKEL on the WB due to the change in MW adds to the confusion. Furthermore, the study did not use all the fused epitopes (e.g., HA). At the same time, the same V5 tag was used to detect two factors in the same parasite. Although the controls are provided, they do not eliminate the possibility that the second band on the WB results from degradation of one protein rather than the presence of two individual proteins. Different tags should be used to confirm the co-expression of two proteins. Panel E is missing the X-axis label.

      Figure 2B was incorrectly labeled; the labels corresponding to SPARK and SPARKEL were switched. We corrected this error in the revised figures. The antibody used was a mouse monoclonal anti-V5, as described in the key resources table of the first submission. We added “V5” to Figure 2A and 2B. Regarding the effect of the tagging payload attached to the proteins, all assays included a comparison to the parental strain (TIR1), which carries no tagging payload. They also included internal controls within tagged strains to assess whether each phenotype depended on IAA treatment. The western blots in Figure 2B and 2C are from two different strains and experiments. The strains and experiments are described in the following places in the first submission:

      - the main text (lines 113-124)
      - the figure legend (lines 1847-1850)
      - the key resources table
      - the methods (lines 650-664, 872-891)

      A description of the SPARK-AID/SPARKEL-mNG strain was included in the key resources table but omitted from the methods. We therefore added the following section to the Methods:

      “SPARKEL-V5-mNG-Ty/SPARK-V5-mAID-HA/RHΔku80Δhxgprt/TIR1

      The HiT vector cutting unit gBlock for SPARKEL (P1) was cloned into the pALH193 HiT empty vector. The vector was linearized with BsaI and co-transfected with the pSS014 Cas9 expression plasmid into SPARK-V5-mAID-HA/RHΔku80Δhxgprt/TIR1 parasites. Clones were selected with 1 µM pyrimethamine and isolated via limiting dilution to generate the SPARKEL-V5-mNG-Ty/SPARK-V5-mAID-HA/RHΔku80Δhxgprt/TIR1 strain. Clones were verified by PCR amplification and sequencing of the junction between the 3′ end of SPARKEL (5′-GGGAGGCCACAACGGCGC-3′) and the 5′ end of the protein tag (5′-gggggtcggtcatgttacgt-3′).”

      To clarify the expected MW of each species, we have added the following text to the Methods:

      “The expected molecular weight of SPARKEL-V5-HaloTag-mAID-Ty is 66 kDa, from the 42.7 kDa tagging payload and 23.3 kDa protein sequence. The expected molecular weight of SPARK-V5-mCherry-HA is 89.7 kDa, from the 31.9 kDa tagging payload and 57.8 kDa protein sequence. The expected molecular weight of SPARK-V5-mAID-HA is 71.3 kDa, from the 13.5 kDa tagging payload and 57.8 kDa protein sequence. The expected molecular weight of SPARKEL-V5-mNG-Ty is 55.2 kDa, from the 31.9 kDa tagging payload and 23.3 kDa protein sequence.”
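      Each expected molecular weight above is the sum of the tagging payload and the protein sequence. The short sketch below makes that bookkeeping explicit, using the values stated in the quoted Methods text. The dictionary and function names are illustrative. These are calculated masses; apparent mobility on a gel can differ.

```python
# (payload kDa, protein kDa) for each construct, as stated in the Methods text.
CONSTRUCTS = {
    "SPARKEL-V5-HaloTag-mAID-Ty": (42.7, 23.3),
    "SPARK-V5-mCherry-HA": (31.9, 57.8),
    "SPARK-V5-mAID-HA": (13.5, 57.8),
    "SPARKEL-V5-mNG-Ty": (31.9, 23.3),
}

def expected_mw(payload_kda: float, protein_kda: float) -> float:
    """Calculated mass of a tagged protein, rounded to 0.1 kDa."""
    return round(payload_kda + protein_kda, 1)

for name, (payload, protein) in CONSTRUCTS.items():
    print(f"{name}: {expected_mw(payload, protein)} kDa")
```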

      SPARK and SPARKEL are lowly expressed, which may have been compounded by basal degradation due to the AID tag (see for example Figure 3—figure supplement 1D of the first submission). We attempted several immunoblot conditions and antibodies, and only the V5 antibody proved effective in recognizing these proteins above the limit of detection. For this reason, we included an additional single-tagged control in each immunoblot experiment. Uncropped images of the blots are included in the first submission as Figure 2—figure supplement 1D and E and as Figure 2 source data. We added the following statement to the results section of the text:

      “However, SPARK and SPARKEL abundances are low and approach the limit of detection. We could only detect each protein by the V5 epitope. Although our experiments included single-tagged controls, we cannot formally eliminate the possibility that SPARK-AID yields degradation products that run at the expected molecular weight of SPARKEL. More sensitive methods, such as targeted mass spectrometry, may be required to measure the absolute abundance and stoichiometries of SPARK and SPARKEL.”

      We added “h +IAA” to the x-axis of panel 2E.

      Fig. 3. There is plentiful proteomic data on the factor-depleted parasites. Can it be used to confirm the co-degradation of the SPARK/SPARKEL complex components? This figure mainly includes quality control data that can be moved to Supplement. Did you detect SPARKEL in the TurboID experiment described in panel I? The plot shows only an AGC kinase.

      SPARK and SPARKEL are lowly expressed, and we often do not detect SPARK or SPARKEL peptides with quantification values in complex samples (such as global depletion proteomes and phosphoproteomes; IPs and streptavidin pull-downs are comparatively less complex, with IPs being the least complex samples). We discussed this caveat in the first submission lines 178-186. To additionally clarify this point, we have added “We were unable to measure SPARK or SPARKEL abundances in this proteome” earlier in the text.

      We consider the figure panels relevant to the discussion in the text.

      SPARKEL was not quantified in the SPARK-TurboID experiment (Supplementary File 2). We have added “SPARKEL was not quantified in this experiment” to the text. “Not quantified” is a different outcome from “quantified but not enriched”. The interaction between SPARK and SPARKEL is supported by five other independent interaction experiments in which SPARKEL was quantified (Figure 1A, 1D, 1E; and Figure 1—figure supplement 1). The added insight from the SPARK proximity labeling experiments comes from integration with the global proteomics, which suggests that AGC kinases are in proximity to SPARK and exhibit SPARK-dependent stability and hence activity. The logic of the proximity labeling experiment is described in lines 258-275 of the first submission.

      Fig. 6G is missing a ∆bfd1 control for unbiased evaluation of the SPARK KD effect.

      The logic of this experiment was to evaluate whether excess differentiation caused by SPARK and PKA C3 depletion (Figure 6A and 6B) was dependent on the BFD1 circuit. The ∆bfd1 phenotype is well-established under these experimental conditions: parasites lacking BFD1 do not differentiate under spontaneous or alkaline conditions (e.g. PMID: 31955846, 37081202, 37770433). Parasites lacking BFD1 do not differentiate when SPARK and PKA C3 are depleted, suggesting that differentiation caused by SPARK or PKA C3 depletion occurs through the BFD1 circuit. If differentiation caused by SPARK or PKA C3 depletion did not depend on the BFD1 circuit, we might have observed differentiation in the SPARK- and PKA C3-AID/∆bfd1 mutants.

      To clarify this point, we have changed the first sentences of the last paragraph in the results section Depletion of SPARK, SPARKEL, or PKA C3 promotes chronic differentiation: “To assess whether excess differentiation caused by SPARK and PKA C3 depletion is dependent on a previously characterized transcriptional regulator of differentiation, BFD1 (Waldman et al., 2020), we knocked out the BFD1 CDS with a sortable dTomato cassette in the SPARK- and PKA C3-AID strains (Figure 6–figure supplement 1). The resulting SPARK- and PKA C3-AID/∆bfd1 mutants failed to undergo differentiation as measured by cyst wall staining (Figure 6G–H), suggesting that differentiation caused by depletion of these kinases depends on the BFD1 circuit.”

      Lines 239-242. The logic behind the categories of "constitutively regulated sites" and "newly synthesized proteins dependent on SPARK activation" is odd. The former (3h treatment) represents the SPARK-specific events (although the treatment should be shortened to 1-2h), while an 8h treatment is already contaminated with secondary effects. Since Toxoplasma divides asynchronously, "newly synthesized" proteins will be present at any time. Also, protein phosphorylation does not always lead to substrate activation; it can be repressive, too.

      We describe the logic in response to a comment above (substrate A vs. substrate B). It is correct that T. gondii divides asynchronously, with a cell cycle of approximately 8 hours and 60% of parasites in G1 at any given time (PMID: 11420103). The proteomics experiments measure peptide and protein abundances at the population level. Newly synthesized proteins will be present at all time points. However, the proportion of proteins synthesized after SPARK depletion, relative to those synthesized before depletion, will increase over time.

      We moved lines 238-243 from the first submission to the Discussion.

      It is accurate that phosphorylation does not always lead to substrate activation; it can also be repressive or not change substrate behavior. However, in the case of protein kinases, activation loop phosphorylation is highly correlated with activation (e.g. PMID: 15350212, 31521607).

      Line 250-252: Because the SPARK degradation did not affect intracellular replication, SPARK is unlikely to affect cell cycle-specific phosphorylation.

      To parallel the prior sentences describing different SPARK-dependent down-regulated clusters, we truncated this sentence to “The final cluster of depleted phosphopeptides, Cluster 4, only exhibits down-regulation at 8h of IAA treatment.”

      SPARKEL depletion did not significantly affect intracellular replication under the time frames investigated here (approximately 25 hours post-invasion; Figure 2D). A prior study reported that SPARK depletion did not affect intracellular replication measured on a similar timescale (PMID: 35484233).

      The opening sentence of the Discussion: Typically, newly discovered proteins are referred to as orthologs of their previously discovered counterparts, not vice versa. Thus, calling Toxoplasma SPARK the ortholog of mammalian PDK1 would be more appropriate.

      We changed the opening sentence of the Discussion to “SPARK is an ortholog of PDK1, which is considered a key regulator of AGC kinases”.

      Reviewer #3 (Recommendations For The Authors):

      (1) Authors should show alignment of SPARKEL with Elongin C. Are key residues conserved?

      We have added an alignment of the SKP1/BTB/POZ domains of Homo sapiens elongin C, S. cerevisiae elongin C, and T. gondii SPARKEL as Figure 1—figure supplement 1B. This panel highlights the elongin B interface, cullin binding sites, and target protein binding sites based on the human elongin C annotation. As discussed below, these interfaces may not be functionally conserved in T. gondii. Ultimately, future mechanistic and structural studies beyond the scope of the current work will be required to determine how SPARK and SPARKEL physically interact. The Discussion states, “further biochemical studies are required to discern the regulatory interactions between SPARK and SPARKEL” (lines 590-591).

      (2) The failure to identify other Elongin B/C complex members should be addressed by direct IP analysis.

      Indeed, elongin C has traditionally been characterized as a component of multisubunit complexes comprising Elongin A/B/C or Elongin BC/cullin/SOCS that regulate transcription or function as ubiquitin ligases, respectively (for a review, PMID: 22649776). We see two major issues when attempting to generalize these results to apicomplexan parasites. First, nearly all studies of the function of elongin C have been conducted in a single eukaryotic supergroup (the opisthokonts, including yeast and metazoans). The majority of eukaryotic diversity exists in other supergroups, including the SAR supergroup to which apicomplexans such as T. gondii belong (PMID: 31606140). Proteins with elongin C domains may serve alternative and unexplored functions in non-opisthokont unicellular eukaryotes. Second (in support of the first), we were unable to find orthologs of many of the opisthokont complex members in T. gondii, as systematically described below.

      By BLAST, the most similar protein to SPARKEL in S. cerevisiae is ELC1 (YPL046C), with a BLAST E-value of 0.003. The next most similar protein was the SCF ubiquitin ligase subunit SKP1 (YDR328C), with an E-value of 0.62. ELC1 is 99 amino acids. The Elongin C (IPR039948) and SKP1/BTB/POZ superfamily domains (IPR011333) span most of this sequence. SPARKEL is 216 amino acids; the Elongin C and SKP1/BTB/POZ superfamily domains occupy the C-terminal half of the protein. The N-terminal domain of SPARKEL may be important for its function; however, future work is required to address this hypothesis.

      Elongin B: Elongin B is not universal even among opisthokonts; fungi and choanoflagellates lack obvious orthologs. The most similar T. gondii protein to human Elongin B (Q15370) by BLAST is TGME49_223125 (E = 0.017), an apicoplast ubiquitin-like protein PUBL (PMID: 28655825, 33053376). TGME49_223125 has a C-terminal ubiquitin-like domain (IPR000626) but no ELOB domain (IPR039049); indeed, no T. gondii protein has an ELOB domain that can be identified by sequence searching. Given the lack of similarity between EloB and TGME49_223125, as well as this protein’s possible red algal endosymbiont origin, we consider it an unlikely ortholog of EloB. We also consider it topologically unlikely to interact with the SPARK/SPARKEL complex. We did not detect TGME49_223125 in SPARK or SPARKEL IPs (Supplementary File 1).

      Elongin A: T. gondii appears to lack a human elongin A ortholog (Q14241) on the basis of sequence similarity. The most similar T. gondii protein to yeast Elongin A (O59671) by BLAST is TGME49_299230 (E = 0.022). Yeast EloA is 263 amino acids. TGME49_299230 is 1101 amino acids and does not have an EloA domain (IPR010684), suggesting it is not a true EloA ortholog.

      Suppressor of cytokine signaling (SOCS): T. gondii appears to lack human SOCS1 or SOCS2 orthologs (O15524 and O14508) on the basis of sequence similarity. We were unable to identify T. gondii proteins with SOCS domains (PF07525, SM00253, SM00969, and SSF158235).

      Von Hippel-Lindau tumor suppressor (VHL): T. gondii appears to lack a human VHL ortholog (P40337) on the basis of sequence similarity. We were unable to identify T. gondii proteins with VHL domains (IPR024048, IPR024053, PF01847, and SSF49468).

      Cul-2/5: Cullins appeared early in the eukaryotic radiation (PMID: 21554755), and thus T. gondii possesses several. Since the ELC complex has been best characterized with human cullin-2 (Q13617) and cullin-5 (Q93034), we searched for orthologs of these proteins and identified TGME49_289310, TGME49_289310, and TGME49_316660. TGME49_289310 functionally resembles cullin-1 of the SCF complex (PMID: 31348812). None of these proteins were enriched in the SPARK or SPARKEL IPs (Supplementary File 1).

      Rbx1: We searched for human Rbx1 orthologs (P62877) and identified TGME49_213690, which functionally resembles Rbx1 of the SCF complex (PMID: 31348812); as well as several other RING proteins (TGME49_267520, TGME49_277740, TGME49_261990, and TGME49_232160) that were not found in the SPARK or SPARKEL IPs (Supplementary File 1).

      Rbx2: We searched for human Rbx2 orthologs (Q9UBF6) and identified several RING proteins (TGME49_285190, TGME49_254700, TGME49_292340, TGME49_226740, TGME49_244610, and TGME49_304460) that were not found in the SPARK or SPARKEL IPs (Supplementary File 1). No T. gondii protein has an Rbx2 domain (cd16466) that can be identified by sequence searching.

      In conclusion, we conducted “direct IP analysis” (Figure 1A, 1D; Figure 1-supplement 1A) of the SPARK and SPARKEL complex in the first submission of the manuscript. The strong interaction between SPARK and SPARKEL was validated in cellulo by proximity labeling (Figure 1E; Figure 1-supplement 1B), also in the first submission. These results are described together in the Results section SPARK complexes with an elongin-like protein, SPARKEL (lines 75-110 of the first submission). The failure to identify interactions between SPARKEL and Elongin B/C complex members in T. gondii may be explained by the absence of Elongin B and several other ELC complex members from most eukaryotes, including T. gondii. We added the sentences “The function of proteins with Elongin C-like domains has not been widely investigated in unicellular eukaryotes” to the Results and “However, the SPARK and SPARKEL IPs and proximity experiments failed to identify obvious components of ubiquitin ligase complexes” to the Discussion.

      (3) PKA and PKG half-lives should be measured as well as their transcript abundances.

      The finding that PKA C1 and PKG protein abundances decreased upon SPARK/SPARKEL depletion was internally consistent across experiments. This down-regulation may be due to transcriptional, translational, or post-translational mechanisms. We measured PKG and PKA C1 transcript abundances in SPARK-AID and TIR1 parasites after 24 hours of IAA treatment using RT-qPCR. We did not detect significant differences in transcript levels of the queried kinases. These findings suggest that SPARK depletion leads to PKG and PKA down-regulation through post-transcriptional mechanisms. Translational control is normally enacted globally, for example through regulation of eukaryotic translation factors (PMID: 15459663). The rapid and specific down-regulation of PKG and PKA C1 would suggest that the kinase abundance levels are regulated by non-global translational mechanisms (e.g. mRNA-specific) or rather post-translational mechanisms.
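      The response does not state how the RT-qPCR data were quantified. One common approach is the 2^-ΔΔCt (Livak) method, which normalizes each target to a reference gene and then to a control strain. The sketch below assumes that method, about 100% amplification efficiency, and a hypothetical housekeeping reference gene. The Ct values are hypothetical and are not our measurements; significance testing is not shown.

```python
from statistics import mean

def fold_change_ddct(target_test, ref_test, target_ctrl, ref_ctrl):
    """Relative transcript level of a target gene by the 2^-ddCt method.

    Each argument is a list of replicate Ct values. Assumes roughly equal
    (~100%) amplification efficiency for target and reference genes.
    """
    d_ct_test = mean(target_test) - mean(ref_test)  # e.g. SPARK-AID + IAA
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)  # e.g. TIR1 + IAA
    dd_ct = d_ct_test - d_ct_ctrl
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values for one kinase transcript and a reference gene.
fc = fold_change_ddct(
    target_test=[24.1, 24.3, 24.2],
    ref_test=[18.0, 18.1, 17.9],
    target_ctrl=[24.0, 24.2, 24.1],
    ref_ctrl=[18.0, 18.0, 18.0],
)
print(f"fold change (test / control) = {fc:.2f}")
```

      A fold change near 1 corresponds to no difference in transcript level between strains.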

      Substantial additional work is required to determine protein half-lives in eukaryotic parasites. In our discussion of possible mechanisms and models, we were agnostic as to the cause of reduced PKG and PKA abundances upon SPARK depletion. We note in the discussion, “The cause for reduction of PKA C1 and PKG levels requires further study” (lines 541-542).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2:

      (1) P-values should be reported adjusted for multiple tests or, at the very least, note that they are unadjusted to alert the reader that they may be biased by winner's curse.

      Throughout the manuscript, we applied a false discovery rate (FDR) threshold to determine which results were statistically relevant for discussion. In the abstract, however, we believe raw p-values are the most straightforward to report, because we reported only the most important and robust results. In addition:

      1) multiple testing correction does not change the ranking of p-values;
      2) the adjusted p-value depends on both the correction method and the number of hypotheses tested;
      3) reports of the most significant discovery results are all prone to winner’s curse, but in the context of our study the GFI1 finding was confirmatory in nature, so the raw p-value allows direct comparison with existing studies.

      We have followed the suggestion and now quote FDR-adjusted p-values for meta-analyzed results throughout the manuscript. We also discuss how the impact of FDR correction differed between the EWAS and the MRS association analyses because of the number of hypotheses tested in each context:

      “For each EWAS or meta-analysis, the false discovery rate (FDR) adjustment was used to control multiple testing and we considered CpGs that passed an FDR-adjusted p-value < 0.05 to be relevant for maternal smoking.”

      “An FDR adjustment was used to control the multiple testing of meta-analyzed associations between the MRS and 25 (or 23, depending on the number of phenotypes available in the cohort) outcomes, and we considered associations that passed an FDR-adjusted p-value < 0.05 to be relevant.”
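      The quoted text does not name the FDR procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up procedure, and the p-values are hypothetical. The adjustment is monotone: it never reverses the order of the raw p-values, although it can create ties. This is the sense of point 1) above.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values never decrease with rank
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```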

      (2) The odds ratios and p-values reported in the abstract for associations of the MRS with smoking status and smoking exposure per week appear to be missing from the results section of the manuscript or (supplementary) tables.

      The results for smoking status during pregnancy were added to the Results:

      “As a result, the epigenetic maternal smoking score was strongly associated with smoking status during pregnancy (OR=1.09 [1.07,1.10], p=5.5×10-33) in the combined European cohorts.”

      The exposure association was reported in the Results section and Supplementary Table 8. We also noted a typo in the cohort-specific p-values, which has now been corrected.

      (3) It is misleading to report a lack of MRS associations with maternal smoking in South Asians without also stating that there were only two smokers.

      We agree with the reviewer that an association test would not be justified given the lack of smoking in the present South Asian cohort. We also removed the p-value of association for the START cohort in Figure 3, based on this and comment #4 from reviewer #3. The relevant results have been revised as follows:

      “The HM450 MRS was significantly associated with maternal smoking history in CHILD and FAMILY (n = 397), but we failed to meaningfully validate the association in START (n = 503; Figure 3) – not surprisingly – due to the low number of ever-smokers (n = 2).”

      (4) It is potentially confusing to report MRS associations with maternal smoking by ethnicity but then report associations with birth size and length combined without any explanation. The most novel result of this study is that there is virtually no maternal smoking among the South Asians and yet the MRS is associated with birth weight and size and with height at age 2. This result is buried in the combined analysis. I would suggest reporting the MRS associations with height and weight separately as has been done for maternal smoking behavior.

      We thank the reviewer for this suggestion. The cohort-specific and meta-analyzed effect sizes are now shown in the new Table 3. In the revision, we highlight both ethnicity-specific and more homogeneous MRS associations:

      - Ethnicity-specific associations include those with smoking exposure at 1 and 3 years of age and with skinfold thickness. These were observed in the European cohorts but not in the South Asian cohort.
      - More homogeneous associations include those with birth weight and body size in the combined cohorts.

      In particular, the MRS in the South Asian cohort was consistently associated with body size at several time points (at birth and at 1, 2, and 5 years), with similar effect sizes. The following was added to the Results:

      “A higher maternal smoking MRS was significantly associated with smaller birth size (-0.37±0.12, p = 0.0023; Table 3) and height at the 1-, 2-, and 5-year visits in the South Asian cohort (Table 3). We observed similar associations with body size in the white European cohorts (heterogeneity p-values > 0.2); collectively, the MRS was associated with a smaller birth size (-0.22±0.07, p=0.0016; FDR adjusted p = 0.019) in the combined European and South Asian cohorts (Table 3). Meanwhile, a higher maternal smoking MRS was also associated with a lower birth weight (-0.043±0.013, p = 0.001; FDR adjusted p = 0.011) in the combined sample, though the effect was weaker in START (-0.03±0.02; p = 0.094) than in the white European cohorts.

      The meta-analysis revealed no heterogeneity in the direction nor the effect size of associations for body size and weight between populations at birth or at later visits (heterogeneity p-values = 0.16–1; Supplementary Table 8).”
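      The combined effects and heterogeneity p-values above are the kind of output an inverse-variance meta-analysis produces. The response does not state the exact model used. The sketch below assumes a fixed-effect inverse-variance model with Cochran's Q as the heterogeneity test. The cohort estimates in the example are hypothetical, and the function names are illustrative.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from the closed forms for df = 1 or df = 2, then step up by 2:
    # sf(x; k + 2) = sf(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        sf, k = math.exp(-x / 2), 2
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis with Cochran's Q.

    Returns (pooled beta, pooled SE, two-sided p, heterogeneity p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p-value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    p_het = chi2_sf(q, len(betas) - 1)  # Cochran's Q on k - 1 df
    return beta, se, p, p_het

# Hypothetical cohort-specific estimates (beta, SE) for three cohorts.
beta, se, p, p_het = ivw_meta([-0.20, -0.15, -0.30], [0.10, 0.12, 0.15])
print(f"beta = {beta:.3f}, SE = {se:.3f}, p = {p:.3g}, heterogeneity p = {p_het:.2f}")
```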

      Reviewer #3:

      (1) You mention that the 450K Score performs best even though only 10/143 are included for some populations. Did you explore recalibration of the MRS using only those 10 CpGs?

      We thank the reviewer for this comment. Because of an error when transferring results, the number of CpGs overlapping between the 450K score and the targeted array was misreported; the correct number is 26. This error only affected results for the FAMILY study using the HM450K score and did not materially change our results or conclusions. We have updated the following accordingly:

      - Table 3
      - Suppl. Tables 5, 8, and 9
      - Figure 3-B
      - Suppl. Figures 5, 6-B, 7-B, and 7-D
      - the meta-analyzed MRS associations throughout the manuscript

      The subset of 26 CpGs with the originally derived weights was expected to perform worse than the original HM450K score using the full 143 CpGs. When we restricted the methylation score construction to these 26 CpGs, the performance in CHILD was worse than that of the original score but comparable to that in FAMILY (updated Suppl. Table 5). These 26 CpGs overlapped with the targeted scores derived in CHILD (13 out of 15 present) and in FAMILY (19 out of 63 present). This suggests moderate agreement between array platforms as well as across studies.

      In other words, while the subset of 26 CpGs had reasonable performance in both CHILD and FAMILY, both studies could benefit from including the additional CpGs in the original score. We have added a sentence discussing the choice of validation study and the trade-off between sample size and number of CpGs (see our response to Reviewer 3, comment #2).
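      A linear methylation risk score is a weighted sum of CpG methylation values. Under that assumption, restricting it to the CpGs shared with another platform while keeping the original weights simply drops the missing terms. The sketch below illustrates this. The function name, CpG identifiers, weights, and methylation values are hypothetical.

```python
def methylation_risk_score(betas_by_cpg, weights_by_cpg, restrict_to=None):
    """Weighted sum of CpG methylation values for one sample.

    betas_by_cpg: {cpg_id: methylation beta value} measured on the array.
    weights_by_cpg: {cpg_id: weight} from the original score.
    restrict_to: optional set of CpG IDs; if given, only these CpGs are used,
        keeping their original weights (no re-estimation).
    Returns (score, number of CpGs that contributed).
    """
    cpgs = set(weights_by_cpg) & set(betas_by_cpg)  # CpGs available on this array
    if restrict_to is not None:
        cpgs &= set(restrict_to)
    score = sum(weights_by_cpg[c] * betas_by_cpg[c] for c in sorted(cpgs))
    return score, len(cpgs)

# Hypothetical weights for a 3-CpG score and one sample missing cpg_c.
weights = {"cpg_a": 1.0, "cpg_b": 2.0, "cpg_c": -1.0}
sample = {"cpg_a": 0.5, "cpg_b": 0.25}
print(methylation_risk_score(sample, weights))
print(methylation_risk_score(sample, weights, restrict_to={"cpg_a"}))
```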

      (2) Could the internal validation performance be driven by sample size of the training, providing support for the need for larger training sizes? Should this be discussed in the study?

      The validation study, CHILD, has the smaller sample size of the two European cohorts. Although both candidate validation cohorts were small, we chose CHILD (n = 347) rather than FAMILY (n = 397). CHILD had better coverage of the CpGs from the discovery EWAS, i.e. the training data (number of associated CpGs = 3,092, n = 5,647). Beyond the strength of the association signals, validation performance also depends on the overall sample size and the proportion of current smokers. The effective sample size for a direct comparison is the equivalently powered sample size of a balanced design (50% cases, 50% controls). Given the proportion of current smokers, these effective sample sizes are 41.7 for CHILD and 104.7 for FAMILY. We cannot directly test whether a larger effective sample size produced a better-performing score, but we believe this to be the case. A larger validation study would therefore likely boost the performance of the methylation score. We have added the following to the Discussion:

      “Given the proportion of current smokers, the effective sample sizes for a direct comparison between CHILD and FAMILY, i.e. the equivalently powered sample size of a balanced (50% cases, 50% controls) design, were 41.7 and 104.7, respectively. While CHILD had a lower effective sample size, we ultimately chose it for validating the methylation score to better cover the CpGs that were significant in the discovery EWAS. A larger validation study will likely further boost the performance of the methylation score and should be considered in future research.”
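      The response does not give the formula behind the effective sample sizes of 41.7 and 104.7. One common definition for a two-group (case/control) comparison is N_eff = 4 / (1/n_cases + 1/n_controls). This equals the total N when the groups are balanced and shrinks as they become unbalanced. The sketch below uses that definition as an assumption. We have not checked that it reproduces the reported values, and the counts in the example are hypothetical.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Size of a balanced (50:50) design with approximately the same power.

    Assumes N_eff = 4 / (1/n_cases + 1/n_controls), one common definition.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

# Hypothetical cohort: 20 current smokers among 400 participants.
print(round(effective_sample_size(20, 380), 1))
```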

      (3) Figure 1: It is very helpful to have an overview diagram, but this should then follow the flow of the manuscript to aid the reader. Currently, the diagram does not follow the flow of the manuscript and thus is rather confusing - for instance, the figure starts with the MRS but initially an EWAS is conducted in the manuscript itself. I suggest to adapt the overview figure accordingly. Moreover, a description for (A), (B), (C) is not provided in the figure legends. Figure 1 could thus be improved further.

      We thank the reviewer for the suggestion to improve the key figure that summarizes the manuscript. The EWAS workflow for the primary, secondary and tertiary outcomes, as well as the European cohorts meta-analysis, has been added to the updated sub-figure A). A description of each subfigure has also been added to the figure legend as follows:

      “Figure 1-A) shows the epigenome-wide association studies conducted in the European cohorts (CHILD and FAMILY). Figure 1-B) illustrates the workflow for methylation risk score (MRS) construction, using an external EWAS (Joubert et al., 2016) as the discovery sample and the CHILD study as the external validation study. Figure 1-C) demonstrates the evaluation of the MRS in two independent cohorts of white European (i.e. FAMILY) and South Asian (i.e. START) participants. The validated MRS was then tested for association with smoking-specific, maternal, and child phenotypes in CHILD, FAMILY, and START, as shown in Figure 1-D).”

      (4) Figure 3: The readability and information content in this figure, and other figures containing boxplots (e.g., Supplementary Figure 5), could be improved. I would suggest to justify X axis labels to the axis rather than overlapping, and importantly, show individual data points wherever possible (e.g., overlaying the box plots). In c), the ANOVA is not justified given the sample size in START. In general, it is worth excluding the START cohorts from this analysis on the justification of a too small sample size for maternal smokers.

      We thank the reviewer for their thoughtful points for improvement. The axis labels have been wrapped to avoid overlapping, and the data points added to the boxplots. ANOVA p-value for START was removed due to the low counts of smokers in the figure and manuscript throughout. However, we retained START in Figure 3 and other boxplots to show the distribution of the score for non-smokers to benchmark with the European cohorts.

      (5) In addition to boxplots, it may be helpful to show ROC curves with their AUCs (e.g., in Figure 3). AUCs are reported in the Tables but not shown. Additionally, all AUC results should include 95% confidence intervals.

      This is a great suggestion; we have added the corresponding ROC curves, annotated with the AUC (95% CI), to Figure 3. The 95% CIs for all AUC results were added to the Tables and the main text. The following was added to the Methods:

      “The reported 95% confidence interval for each estimated AUC was derived using 2,000 bootstrap samples.”
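      To illustrate what a bootstrap confidence interval for an AUC involves, the sketch below resamples subjects with replacement 2,000 times and takes the 2.5th and 97.5th percentiles of the resampled AUCs. It is a hypothetical, minimal sketch and not the authors' code: the function names, the non-stratified resampling, the percentile method, and the fixed seed are all assumptions, and the manuscript does not state which software or bootstrap variant was used. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as P(case score > control score), counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=1):
    """Point AUC plus a percentile bootstrap CI (hypothetical helper).

    labels: 1 = case (e.g. exposed), 0 = control. Subjects are resampled
    with replacement; resamples lacking either class are redrawn.
    """
    data = list(zip(scores, labels))
    if not any(y == 1 for _, y in data) or not any(y == 0 for _, y in data):
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_boot:
        sample = [rng.choice(data) for _ in data]
        pos = [s for s, y in sample if y == 1]
        neg = [s for s, y in sample if y == 0]
        if not pos or not neg:  # AUC undefined without both classes
            continue
        estimates.append(auc(pos, neg))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    point = auc([s for s, y in data if y == 1], [s for s, y in data if y == 0])
    return point, (lo, hi)


# Usage with made-up scores (not study data):
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
point, (lo, hi) = bootstrap_auc_ci(scores, labels)
print(f"AUC = {point:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

      Many packages instead resample cases and controls separately (a stratified bootstrap) so that every replicate keeps the original class balance. That choice can shift the interval slightly for small groups.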

      (6) Supplementary Figure 6: It could be helpful to discuss the amount of overlap between the different MRS.

      Most of the scores, including ours, were derived using the Joubert et al. (2016) EWAS as the discovery sample, so some overlap between the scores is expected. The exception was the GondaliaScore, which contained only 3 CpGs, none of which overlap with any other score.

      Although different scores may not have selected identical sets of CpGs, the mapped genes are highly consistent across scores. We have added a description of the extent of overlap between the top scores to the Results and Discussion:

      “In particular, scores that were derived using the Joubert EWAS as the discovery sample, including ours, had higher pairwise correlation coefficients across the birth cohorts, with many of the CpGs mapping to the same genes, such as AHRR, MYO1G, GFI1, CYP1A1, and RUNX3.”

      (7) Supplementary Figure 7: This figure is never referenced in the text and from the legend itself it is not too clear what it is trying to show. Please refer to it in the main text with some additional context.

      Supplementary Figure 7 is referenced in the Results subsection “Methylation Risk Score (MRS) Captures Maternal Smoking and Smoking Exposure”, following the Methods subsection “Statistical analysis”, in which we describe the test for a systematic difference between scores. We revised the main text to clarify the analysis:

      “For the derived MRS, we empirically assessed whether a systematic difference existed in the resulting score with respect to all other derived scores. This was examined via pairwise mean differences between the HM450 score and each other score using a two-sample t-test, and via an overall test of mean difference using an ANOVA F-test, among all samples and within the subset of never smokers.”
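      For readers less familiar with these two tests, the sketch below computes the two test statistics from lists of score values: a two-sample t statistic for one pairwise comparison and a one-way ANOVA F statistic across all scores. It is a hypothetical illustration, not the authors' code. It assumes Student's equal-variance (pooled) t-test, which the quoted text does not specify (a Welch test is an equally plausible choice), and it returns statistics only. In practice, `scipy.stats.ttest_ind(a, b)` and `scipy.stats.f_oneway(*groups)` return the same statistics together with p-values.

```python
from statistics import mean


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    # pooled within-group sum of squares and variance
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    grand = mean([x for g in groups for x in g])
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Made-up values standing in for two scores (not study data):
hm450_like = [1, 2, 3]
other_score = [4, 5, 6]
t = pooled_t(hm450_like, other_score)
f = anova_f(hm450_like, other_score)
print(t, f)  # with exactly two groups, t squared equals F
```

      The two-group identity t² = F is a convenient sanity check. With more than two scores, the F-test asks whether any mean differs, and the pairwise t-tests locate which pairs differ.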

      (8) Tables: Tables are currently challenging to read, and perhaps more formatting could be done to improve readability.

      We thank the reviewer for the suggestion. The main tables have been reformatted to a landscape layout, and the numeric cells have been centred to improve readability.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      (1) In Figure 1, the authors show that TF3C binds to the amino terminus of MYCN (Myc box I region), as shown previously. The data in Figure 1 B-D support, but do not rigorously confirm a 'direct' interaction because it has not been ruled out that accessory proteins mediating the association may be present in the mixture.

      In Figure 1B-D, we purified MYCN and the TFIIIC/TauA complex separately and then mixed the purified preparations, demonstrating that the purified proteins interact. We have additionally performed mass spectrometry, which shows that the TauA/MYCN complex forms without further accessory proteins, as the molecular weight would otherwise be higher. Based on the Coomassie-stained SDS-PAGE gels, there is no plausible contaminating band that could mediate the interaction between MYCN and TauA, either in the purified complex (Figure 1C) or in the purified proteins used to reconstitute the complex (Figure S1A and S1B).

      (2) The authors indicate in Figure 2 that TF3C has essentially no effect on MYCN-dependent gene expression and/or transcription elongation. Yet a previous study (PMID: 29262328) associated with several of the same authors concluded that TF3C positively affects transcription elongation. The authors make no attempt to reconcile these disparate results and need to clarify this point.

      We agree that the data in this manuscript do not support a role in transcription elongation. This point was also raised by Reviewer #3. Comparing our new results with the previously published data, the two studies share three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, performing ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results from RNAPII (A10). Together with our new results using the RNAPII (8WG16) antibody, this shows that the traveling ratio reflects not only transcription elongation but also the removal of RNAPII from chromatin at the start site.

      (3) Figures 2B and C show that unphosphorylated pol2 is TSS-centered, and that Ser2-P pol2 occupancy is centered beyond the TES. From these data, however, the reader cannot tell how much of the phospho-Ser2 pol2 is centered on the TSS. The authors should include overall plots over the TSS and TES, and perhaps also the gene body, for both antibodies over the collected gene sets to allow a better comparison.

      We focused on the TSS for unphosphorylated RNAPII and the TES for pSer2-RNAPII, as these are the regions with specific enrichment of the respective antibodies. As requested for comparison, we now include metagenes showing TSS, gene-body, and TES for both antibodies as new Figure S2A and B. Additionally, we included density plots for unphosphorylated RNAPII at the TES as well as for pSer2-RNAPII at the TSS as a Figure for the Reviewers (Figure 1).

      (4) The authors see more TF3C at promoters in cells with MYCN (Figure 2F). What are the levels of TF3C in the absence and presence of MYCN?

      As shown in the immunoblot in Figure S1E, TFIIIC5 levels do not change upon induction of MYCN. We therefore think that MYCN helps to recruit TFIIIC5 to RNAPII promoter sites. This is also in accordance with what we previously reported (1).

      (5) The finding that TF3C is increased at TSS (Figure 2F) doesn't necessarily indicate that 1) MYCN is recruiting TF3C there, and 2) that this is due to the phosphorylation status of pol2. It could mean many other things. The logic of conflating these 3 points based on the data shown is questionable.

      We showed previously that knock-down of MYCN affects TFIIIC5 binding, indicating that MYCN is required for binding of TFIIIC5 at promoter sites (1).

      Additionally, we included data from DRB-treated cells (Figure 2F); DRB prevents RNAPII loading by preventing downstream de novo elongation. These data show that TFIIIC5 binding at the TSS is massively increased upon induction of MYCN and is increased further by DRB treatment. Conversely, we observed that the major effect of TFIIIC knock-down upon MYCN induction was on non-phosphorylated RNAPII at the TSS (Figure 2B). We would therefore argue that our interpretation fits the data presented in the manuscript well.

      (6) Figure 3A doesn't add much to the paper, as it is overplotted and no relationship is clear, except that Pol2 and MYCN occupy many of the same sites. Perhaps a less complex or different type of plot would allow the interactions to be better visible.

      We agree with this comment. Because another comment asked us to show the same window in all HiChIP data plots, we have changed Figure 3A accordingly.

      (7) That depletion of TF3C leads to increased promoter hubs may or may not have anything to do with its association with MYCN (Figure 4E). This could be a direct consequence of its known structural function in cohesin complexes, and the MYCN changes as a secondary consequence of this (also see point 4, above).

      As shown in Büchel et al. (2017) (1), MYCN is needed to recruit RAD21, whereas depletion of RAD21 has no impact on the recruitment of MYCN. Since RAD21 is part of the cohesin complex, we would exclude the possibility that the MYCN changes are a secondary consequence.

      (8) Depletion of TF3C5 results in a loss of EXOSC5 (exosome) at TSS in the presence and absence of MYCN (Figure 5B). As TF3C5 is a cohesin, could this simply be a consequence of genomic structure changes?

      We agree that the observed changes in EXOSC5 may be due to depletion of TFIIIC5. TFIIIC has been shown to recruit cohesin (1) and condensin complexes (2), as well as to induce changes in chromatin architecture (3). However, MYCN is needed to recruit TFIIIC, and depletion of TFIIIC had no impact on MYCN recruitment (1). Furthermore, MYCN has been shown to recruit the exosome (4). We would therefore argue that MYCN can play a role either directly or through changes in chromatin architecture.

      (9) The authors suggest that RNA dynamics are affected by changes in exosome function (RNA degradation, etc). What effect, if any does TF3C depletion have on the overall gene expression profile?

      We show in the manuscript that TFIIIC depletion in unperturbed cells has no effect on the global gene expression profile in the time frame analyzed (Figure 2E and S2B).

      Reviewer #2 (Public Review):

      (1) Dynamic inferences are made without kinetic experiments.

      While we agree that we did not collect kinetic data on the dynamics of RNA polymerase, we would argue that integrating our different data sets makes it possible to draw inferences about its dynamics. The transcription cycle and its sequential steps have been well described. In this sense, we use the non-phosphorylated RNAPII data, which capture the step between RNAPII recruitment and initiation, and the RNAPII-pSer2 data, which capture pause release into elongation, to draw conclusions about the dynamics. We also made use of our previously published datasets.

      Reviewer #2 (Recommendations For The Authors):  

      (1) A number of changes are reported in hub size, expression, etc. upon treatment with tamoxifen to activate MYCN-ER. But MYC is already present in the SHEP cells, so why doesn't MYC support these same phenomena? It would seem that either the ability to cooperate with TFIIIC to clear non-productive polymerase complexes from promoters is particular to MYCN, or the effect reflects a quantitative increase in total MYC proteins due to the entry of MYCN-ER into the nucleus upon tamoxifen treatment. The authors should address or discuss this issue.

      It is possible that protein levels account for the differences between the effects of MYC and MYCN observed in this system. This interpretation would be consistent with the results of Lorenzin et al. (5), who reported that different levels of MYC have different targets, depending on E-box affinity and protein level. A similar relationship between MYC levels and function has also been reported for SPT5 (6). Such high protein levels mimic those found in certain tumors rather than physiological levels. In this sense, the observed differences may also reflect the difference between physiological and oncogenic levels of MYC proteins.

      On the other hand, both a core MYC signature and isoform-specific signatures of target genes have been described. MYCN has been described as being involved in gene expression during the S phase of the cell cycle (7). This suggests that MYC and MYCN differ beyond the gene sets they regulate, and the interaction with TFIIIC appears to be one of these differences. We have found multiple TFIIIC subunits in the MYCN interactome, but the interaction of TFIIIC with MYC is weaker, and we are uncertain how relevant it is (7, 8). We show here that depletion of different subunits of the TFIIIC complex causes a MYCN-dependent growth defect (Figure 1E). Similarly, the nuclear exosome is a MYCN-specific dependency (4), and we show here that MYCN-dependent recruitment of the exosome requires TFIIIC5. We take this as an indication that there is an intrinsic difference between MYC and MYCN and that MYCN engages TFIIIC in this pathway.

      (2) Reciprocal to TFIIIC recruitment to MYCN targets would be MYCN association with tRNA genes, 5S rRNA, and other RNAPIII genes. Does this happen, and if so, is this association TFIIIC dependent? What happens to the expression of these genes?

      We did observe MYCN in interactions involving tRNA genes and other RNAPIII sites, such as SINE elements (Figure 4B, 4D, S3F, and S4B). Too few interactions involved 5S rRNA genes to be informative, either because these repetitive regions are difficult to map properly or for biological reasons. In any case, none of these regions appeared to be specifically dependent on TFIIIC, as the overall number of interactions increased upon TFIIIC depletion regardless of genomic annotation (Figure S4B). Regarding the expression of RNAPIII genes, the technical limitations of poly(A)-enriched RNA-seq prevent us from analyzing it globally in an unbiased way. However, we addressed this point for tRNA expression in earlier work (1) and found that tRNA levels do not change upon TFIIIC depletion. We think this is because tRNAs are stable transcripts and RNAPIII recycling can occur in a TFIIIC-independent manner (9). In addition, in this work we report no significant expression changes in RNAPII genes upon TFIIIC depletion.

      (3) The authors show that TFIIIC depletion does not alter the RNA-expression profile; how do they account for this? Can they comment on "background" transcription that it would seem should be suppressed by TFIIIC-dependent removal of various hypofunctional polymerases?

      Since TFIIIC is important for the removal of non-functional RNAPII, we would not expect changes in the gene expression profile upon depletion of TFIIIC in the time frame analyzed. Indeed, monitoring the elongating form of RNAPII via pSer2 shows that transcription elongation is not affected.

      (4) Global changes in expression are difficult to assess with DESEQ2. This hypernormalizing algorithm is not really suited to distinguish differential, but universal upregulation from some targets being truly upregulated while others are downregulated. The authors should comment.

      We acknowledge that DESeq2 relies on the assumption that gene-wise estimates of dispersion are generally unchanged among samples. We addressed this comment in two ways, both shown in the Figure for the Reviewers (Figure 2). First, we sequenced the samples more deeply to avoid bias from random effects of lower coverage; the total number of reads per sample increased from 6.8-9.3 million to 16.5-20.7 million. Second, we compared bin dot plots of the average fold change for RNA-seq of SH-EP-MYCN-ER cells, showing mRNA expression normalized to control per bin, obtained with DESeq2 normalization (Figure 2A), TMM normalization in edgeR (Figure 2B), and quantile normalization (Figure 2C). We found no major differences relative to the original data or between the normalization methods. We nevertheless updated Figure 2E in the manuscript to include the deeper sequencing dataset; we also adjusted it to show -/+ MYCN and log2-transformed the values to make it more intuitive. Overall, this supports our original conclusion that gene expression remains largely unaffected by TFIIIC5 knockdown.
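      Of the three normalizations compared above, quantile normalization is the simplest to state: every sample is forced onto the same distribution, the mean of the sorted values across samples. The sketch below is a hypothetical, minimal illustration of that standard procedure and not the pipeline used for Author response image 2. The function name, the one-list-per-sample layout, and the tie handling (ties broken by original order rather than averaged) are assumptions. DESeq2 and TMM normalization work differently and are not shown.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so all samples end up with an identical distribution.
    Ties are broken by original order (a simplification).
    """
    if not columns:
        raise ValueError("need at least one sample")
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all samples must have the same number of genes")
    # For each sample, the gene indices in ascending order of value.
    orders = [sorted(range(n), key=c.__getitem__) for c in columns]
    # Reference distribution: mean of the r-th smallest value across samples.
    reference = [
        sum(c[order[r]] for c, order in zip(columns, orders)) / len(columns)
        for r in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy samples with three genes each (made-up counts):
print(quantile_normalize([[2, 1, 3], [4, 6, 5]]))
```

      Because every sample is mapped onto a shared distribution, a uniform shift in expression across all genes would also be removed by this method, just as with size-factor approaches.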

      (5) On page 7, the authors claim that MYCN-ER increased Ser-2 can reflect MYCN-stimulated transcription elongation. In fact, without kinetic studies, this is not fully supported. Accumulation of Ser-2 RNAPII along a gene can reflect increased initiation of full-speed RNAPs or a pile-up of RNAPs slowing down. This should be resolved or qualified.

      While we agree that we did not collect kinetic data on the dynamics of RNA polymerase, we would argue that integrating our different data sets makes it possible to draw inferences about its dynamics. We showed, on the one hand, that pSer2-RNAPII accumulates at the TES and, on the other, that induction of MYCN-ER up-regulates gene expression, which indicates productive transcription elongation.

      (6) pLHiChIP needs to be better described, the Mumbach reference is not sufficient.

      We have rewritten the description of pLHiChIP in the Methods section and hope that it now describes the method more clearly.

      (7) Can the authors recheck all the labels in Figure 2D? I believe there is an error involving + or - MYCN.

      We carefully rechecked all the labels in Figure 2, and they were correct. We understand that comparing Figure 2D and Figure 2E may have caused confusion. To avoid this, we updated Figure 2E to show the comparison in the same direction as Figure 2D. We also log2-transformed the y-axis of Figure 2E to make it more intuitive to read.

      (8) Why are there different scales for the regions of chromosome 17 shown in Figures 3 and 4? It would be easier to compare if the examples were all shown at the same scale (about 2 MB is shown in another Figure).

      We now show the same region of chromosome 17 in Figure 3 and 4.

      Reviewer #3 (Public Review):

      (1) The connection between the three major findings presented in this study regarding the role of TFIIIC in the regulation of MYCN function remains unclear. Specifically, how the TFIIIC-dependent restriction of MYCN localization to promoter hubs enhances the association of factors involved in nascent RNA degradation to prevent the accumulation of inactive RNA polymerase II at promoters is not apparent. As they are currently presented, these findings appear as independent observations. Cross-comparison of the different datasets obtained may provide some insight into addressing this question.

      We previously observed that TFIIIC does not affect MYCN recruitment, whereas MYCN affects TFIIIC binding (1). Moreover, our group reported that MYCN recruits the exosome (4) and BRCA1 (10) to promoter-proximal regions to clear out non-functional RNAPII. Here we report that MYCN-TFIIIC complexes exclude non-functional RNAPII. However, MYCN-active promoters within hubs have more RNAPII and higher transcription than MYCN-active promoters outside hubs. Furthermore, TFIIIC binding acts upstream of BRCA1 and exosome recruitment, as depletion of TFIIIC reduces the recruitment of both factors. We therefore argue that TFIIIC is required for the proper function of these MYCN-active promoter hubs.

      (2) Another concern involves the disparities in RNA polymerase II ChIP-seq results between this study and earlier ones conducted by the same group. In Figure 2, the authors demonstrate that activation of MYCN results in a reduction of non-phosphorylated RNA polymerase II across all expressed genes. This discovery contradicts prior findings obtained using the same methodology, where it was concluded that the expression of MYCN had no significant effect on the chromatin association of hypo-phosphorylated RNA polymerase II (Buchel et al, 2017). In this regard, the choice of the 8WG16 antibody raises concern, as fluctuations in the signal may be attributed to changes in the phosphorylation levels of the C-terminal domain. It remains unclear why the authors decided against using antibodies targeting the N-terminal domain of RNA polymerase II, which are unaffected by phosphorylation and consistently demonstrated a significant signal reduction upon MYCN activation in their previous studies (Buchel et al, 2017) (Herold et al, 2019). Similarly, the authors previously proposed that depletion of TFIIIC5 abrogates the MYCN-dependent increase of Ser2-phosphorylated RNA polymerase II (Buchel et al, 2017), whereas they now show that it has no obvious impact. These aspects need clarification.

      We respectfully disagree that our findings contradict each other. Comparing our new results with the previously published data, the two studies share three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, performing ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results from RNAPII (A10). Together with our new results using the RNAPII (8WG16) antibody, this shows that the traveling ratio reflects not only transcription elongation but also the removal of RNAPII from chromatin at the start site.

      In the previous study, we performed only manual ChIP experiments for RNAPII (8WG16) and pSer2. We have now performed a global analysis, which is more meaningful and is also reflected in the RNA sequencing data.

      (3) Finally, the varied techniques employed to explore the role of TFIIIC in MYCN-dependent recruitment of nascent RNA degradation factors make it challenging to draw definitive conclusions about which factor is affected and which one is not. While conducting ChIP-seq experiments for all factors may be beyond the scope of this manuscript, incorporating proximity ligation assays (PLA) or ChIP-qPCR assays with each factor would have enabled a more direct and comprehensive comparison.

      We understand the criticism that we are comparing different assays. We have performed PLAs with different antibodies, but because the controls for these PLAs did not meet our standards, we refrained from using them. ChIP-qPCR experiments are much more challenging to perform side by side than PLAs, which is why we decided against analyzing all factors with this method.

      Recommendations For The Authors:

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 2: Why did the authors choose the 8WG16 antibody? Does TFIIIC5 depletion suppress the MYCN-dependent reduction of total RNA polymerase II binding to promoters that they consistently showed in previous studies? Given that phosphorylation of the CTD impacts 8WG16 recognition, including Ser5-phosphorylated RNA polymerase II ChIP-seq experiments might clarify this issue.

      We used the RNAPII (8WG16) antibody specifically to map non-phosphorylated RNAPII, which reflects the binding of non-functional RNAPII.

      (2) Figures 3 and 4: As it stands, the manuscript does not convincingly establish a functional connection between the results in Figures 2, 3, and 4 or elucidate potential mechanisms. Are changes in RNA polymerase II levels upon MYCN activation more pronounced at promoters located at MYCN hubs? Do changes in MYCN-enriched chromatin contacts upon TFIIIC5 depletion somehow correlate with alterations in RNA polymerase II levels? Performing similar cross-comparisons as in Figure 3C may help address this issue. Furthermore, it is not clear how the authors concluded that MYCN/TFIIIC5-bound genes are not part of these so-called promoter hubs.

      In Figure 3C, we show that RNAPII levels upon MYCN activation are higher at promoters located at MYCN hubs. Additionally, the Figure for the Reviewers (Figure 3) shows density plots of non-phosphorylated RNAPII ChIP-seq at the TSS and RNAPII-pSer2 ChIP-seq at the TES for promoters with MYCN interactions. Apart from the level of binding, we found no differences compared with the global analysis of all expressed genes shown in Figure 2B and Figure 2C. This is consistent with the high expression of genes involved in MYCN interactions observed in Figure 3C.

      The changes observed in Figures 2B and 2C are global and include the promoters with MYCN interactions. At the same time, a higher number of replicates would be required to statistically distinguish differences in MYCN interactions between TFIIIC5-proficient and TFIIIC5-depleted cells. We acknowledge this limitation and therefore refrain from drawing conclusions on this point. We base our conclusions on the other parts of the manuscript and on our previous studies showing that MYCN recruits TFIIIC, BRCA1, and the exosome to promoter-proximal regions (1, 4, 10).

      (3) Figure 5: According to the PLA results, activation of MYCN could enhance RNA polymerase II-NELFE interaction in a TFIIIC5-dependent manner. Considering the raised issues regarding the use of the 8WG16 antibody, this result might be of relevance.

      Nevertheless, PLA does not seem to be the optimal technique to address these questions, and I would rather suggest performing ChIP-qPCR experiments for all the factors to be compared. Finally, do the authors conclude that the TFIIIC5 effect on MYCN-dependent changes in RNA polymerase II depends upon the recruitment of EXOSC5 and BRCA1? If so, it would be interesting to determine whether depletion of these factors phenocopies the effects observed with TFIIIC5.

      We understand the criticism that we are comparing different assays. We have performed PLAs with different antibodies, but because the controls for these PLAs did not meet our standards, we refrained from using them.

      (4) In Figure S2 the labels should be EtOH, 4-OHT, and Input.

      We changed this accordingly.

      (5) On page 7, the sentence "We have shown previously that TFIIIC5 depletion does not cause significant changes in expression of multiple tRNA genes that are transcribed by RNAPIII (Buchel et al., 2017)" appears to lack a connection.

      We agree with the reviewer and we deleted this sentence from the manuscript.

      Author response image 1.

      (A) Density plot of ChIP-Rx signal for non-phosphorylated RNAPII. Data show the mean (line) ± standard error of the mean (SEM, shaded area) for different gene sets based on RNA-seq of SH-EP-MYCN-ER cells ± 4-OHT. The y-axis shows the number of spike-in-normalized reads, and the plot is centered on the TES ± 2 kb. N = number of genes in each gene set, as defined in the Methods. (B) Density plot of ChIP-Rx signal for RNAPII pSer2, as described for panel A, centered on the TSS ± 2 kb.

      Author response image 2.

      Bin dot plot for RNA-seq of SH-EP-MYCN-ER cells showing mRNA expression normalized to control per bin, comparing the average fold change obtained with DESeq2 (A), TMM normalization in edgeR (B), and quantile normalization (C).

      Author response image 3.

      Average density plot of ChIP-Rx signal for non-phosphorylated RNAPII (A) or RNAPII pSer2 (B) at promoters with MYCN interactions.

      References

      (1) Büchel, G., Carstensen, A., Mak, K.-Y., Roeschert, I., Leen, E., Sumara, O., Hofstetter, J., Herold, S., Kalb, J., Baluapuri, A., et al. (2017). Association with Aurora-A controls N-MYC-dependent promoter escape and pause release of RNA polymerase II during the cell cycle. Cell Rep 21, 3483-3497.

      (2) Yuen, K.C., Slaughter, B.D., and Gerton, J.L. (2017). Condensin II is anchored by TFIIIC and H3K4me3 in the mammalian genome and supports the expression of active dense gene clusters. Sci Adv 3, e1700191. 10.1126/sciadv.1700191.

      (3) Ferrari, R., de Llobet Cucalon, L.I., Di Vona, C., Le Dilly, F., Vidal, E., Lioutas, A., Oliete, J.Q., Jochem, L., Cutts, E., Dieci, G., et al. (2020). TFIIIC Binding to Alu Elements Controls Gene Expression via Chromatin Looping and Histone Acetylation. Mol Cell 77, 475-487 e411. 10.1016/j.molcel.2019.10.020.

      (4) Papadopoulos, D., Solvie, D., Baluapuri, A., Endres, T., Ha, S.A., Herold, S., Kalb, J., Giansanti, C., Schulein-Volk, C., Ade, C.P., et al. (2021). MYCN recruits the nuclear exosome complex to RNA polymerase II to prevent transcription-replication conflicts. Mol Cell. 10.1016/j.molcel.2021.11.002.

      (5) Lorenzin, F., Benary, U., Baluapuri, A., Walz, S., Jung, L.A., von Eyss, B., Kisker, C., Wolf, J., Eilers, M., and Wolf, E. (2016). Different promoter affinities account for specificity in MYC-dependent gene regulation. Elife 5. 10.7554/eLife.15161.

      (6) Baluapuri, A., Hofstetter, J., Dudvarski Stankovic, N., Endres, T., Bhandare, P., Vos, S.M., Adhikari, B., Schwarz, J.D., Narain, A., Vogt, M., et al. (2019). MYC Recruits SPT5 to RNA Polymerase II to Promote Processive Transcription Elongation. Mol Cell 74, 674-687 e611. 10.1016/j.molcel.2019.02.031.

      (7) Baluapuri, A., Wolf, E., and Eilers, M. (2020). Target gene-independent functions of MYC oncoproteins. Nat Rev Mol Cell Biol. 10.1038/s41580-020-0215-2.

      (8) Koch, H.B., Zhang, R., Verdoodt, B., Bailey, A., Zhang, C.D., Yates, J.R., 3rd, Menssen, A., and Hermeking, H. (2007). Large-scale identification of c-MYC-associated proteins using a combined TAP/MudPIT approach. Cell Cycle 6, 205-217. 10.4161/cc.6.2.3742.

      (9) Ferrari, R., Rivetti, C., Acker, J., and Dieci, G. (2004). Distinct roles of transcription factors TFIIIB and TFIIIC in RNA polymerase III transcription reinitiation. Proc Natl Acad Sci U S A 101, 13442-13447. 10.1073/pnas.0403851101.

      (10) Herold, S., Kalb, J., Büchel, G., Ade, C.P., Baluapuri, A., Xu, J., Koster, J., Solvie, D., Carstensen, A., Klotz, C., et al. (2019). Recruitment of BRCA1 limits MYCN-driven accumulation of stalled RNA polymerase. Nature 567, 545-549.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Data on SSCs are published from a previous report (Fig. 1C). These should be deleted or marked as such.

      We acknowledge the need for clarification regarding our study population for the germ cell stainings. As stated in our Materials and Methods section, our current study population includes the cohort from our previous publication (Vereecke et al., 2020), supplemented by nine additional participants, totaling n=106 trans women. Fig. 1C incorporates both previous and new data on germ cells, and this was further clarified in the Materials and Methods section.

      (2) Many micrographs are suboptimal and need to be replaced by better photos presenting cellular details more clearly. 

      The figures were remade to improve their resolution.

      (3) Table 2 would benefit from a column indicating the target cell or organelle.

      This column was added to Table 2.

      (4) The pubertal status is poorly defined by pre- and peripubertal terms. The authors should add more informative clinical scores. 

      We included information on the Tanner stages of the trans women in our cohort (all G5), as well as details on the selection criteria for our controls and their pubertal status.

      (5) The characterization of Leydig cells is incomplete. Several better markers would validate the findings. 

      As briefly touched upon in the discussion, the marker delta-like homolog 1 would indeed be valuable to assess the presence of truly immature Leydig cells. Unfortunately, our attempts to optimize the immunofluorescence protocol for this marker were unsuccessful, resulting in a double staining instead of a triple staining for the Leydig cells. This statement was also added to the Discussion.  

      (6) The selection bias for datasets is obvious. It seems that the authors try to create nice stories but do not always refer to less compelling datasets. Here a more critical view may be necessary to gain a more realistic view and may open alternative explanations. 

      We would appreciate clarification on which datasets may have been insufficiently reviewed and how our selection of highlights may have introduced bias to the interpretation and conclusion of the study. It is important to note that we did not select any patients/ data; all patient data were incorporated into our results section.

      (7) The term rejuvenation for the stem cell niche/germ cell complement is misleading in the title and text. Could the authors consider another term, e.g., restoration or (de)differentiation? Alternatively, define the term rejuvenation in a more substantial manner.

      We did not change the term “partial rejuvenation” as we believe it best describes our findings. We did however introduce the term in a more substantial manner in our Abstract and Discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors provided a lot of scattered data, but it would be useful to formulate clear criteria (hormonal therapy, age, end points, etc.) that the material must meet so that it can be used for research into prepubertal processes. 

      We have added these criteria to our Discussion. However, our current results do not yet reveal how these tissues behave in vitro. Ongoing research is addressing this question and will be presented in a future paper.

      (2) Is there any research on the preservation of functions of testicular cells from trans women? These data would be very useful, for example, as models for drug testing.

      Yes: a reference to this paper was added to our Discussion.

      (3) It is recommended to present the data in a table reflecting the correlations found by the authors and the correlations from the literature between cellular changes and hormone levels and age. 

      After careful consideration, we have decided to proceed without incorporating these suggested changes. Our paper focuses on original findings rather than synthesizing existing literature. As such, we have chosen to emphasize our novel results and to compare them to the existing literature in the discussion section.

      (4) The authors can also provide data on clinical standards for hormone levels depending on gender and age. 

      This was added as Supplementary Tables 1-6.

      (5) It is recommended to add links to sources from which information about cellular prepubertal, pubertal and adult markers was taken. 

      This information was added throughout the manuscript.  

      (6) Is it known which cells within the wall of the seminiferous tubules in adults express AMH? Please clarify. 

      It has been shown that AMH receptor type 2 starts to be expressed in peritubular mesenchymal cells within the tubular walls during puberty and it remains so throughout adulthood (Sansone et al., 2020). AMH bound to this receptor may help explain the observed AMH signal in the tubular wall of peripubertal and adult controls. This information was added to our Discussion.

      (7) How was the degree of hyalinization assessed? It's not obvious from the pictures.

      This was further clarified in the Materials & Methods section.

      (8) Why were inhibin B and AMH not measured in all patients? 

      Inhibin B and AMH levels were not available for all patients due to the retrospective nature of these analyses. The measurements were not consistently recorded for all individuals within the historical dataset upon which our research relies.

      (9) Why does picture 3A present few SOX9 on adult Sertoli cells, although this is their typical marker?

      SOX9 was present in the adult Sertoli cells. However, this signal appears to be more "diluted" in adults due to their ongoing spermatogenesis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult, and a better understanding of how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-α4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediating EWS-FLI cis activity (at msat) and to generating the formation of specific TADs. Furthermore, cells expressing DBD-deficient EWS-FLI display very poor colony-forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      We thank Reviewer 1 for their strong summary of Ewing sarcoma background and accurate description of our experimental approaches and findings.

      Strengths:

      The group has strong expertise in Ewing sarcoma genetics and epigenetics and also in using and analyzing this model (Theisen et al., 2019; Boone et al., 2021; Showpnil et al., 2022).

      We thank the reviewer.  

      They aim at better understanding how EWS-FLI mediates its oncogenic activity, which is critical to eventually identifying novel therapies against this aggressive cancer.

      We are happy to see that our overall aim was also appreciated by Reviewer 1.

      They use the most recent state-of-the-art omics methods to investigate the transcriptome, epigenetics, and genome conformation. In particular, Micro-C enables achieving up to 1kb resolved 3D chromatin structures, making it possible to investigate a large number of TADs and sub-TADs structures where EWS-FLI1 mediates its oncogenic activity.

      We thank Reviewer 1 for their acknowledgement of our approaches and the resolution achieved with our Micro-C experiments.  

      They performed all their experiments in an Ewing sarcoma genetic background (A673 cells) which circumvents bias from previously reported approaches when working in non-orthologous cell models using similar approaches.

      We agree with the reviewer about the importance of using model systems that accurately capture features of the disease being studied. Because we have added an additional cell line in the revision, we note that this second model also represents a Ewing sarcoma genetic background, while representing tumors that express another oncogenic fusion found in this disease. 

      Weaknesses:

      The main weakness comes from the poor reproducibility of Micro-C data. Indeed, it appears that the distances/clustering observed between replicates are typically similar or even larger than between biological conditions. For instance, in Figure 1B, I do not see any clustering when considering DBD1, DBD2, DBD+1, DBD+2.

      Lines 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates. These observations suggest that the global chromatin structure of DBD replicates is more similar to KD than DBD+ replicates."

      When replacing DBD replicate 1 with DBD replicate 2, their statement would not be true anymore.

      Additional replicates to clarify this aspect seem absolutely necessary since those data are paving the way for the entire manuscript.

      These are valid concerns, and we thank the reviewers for highlighting this limitation: the poor clustering of Micro-C replicates on the MDS plot. We account for the variability between replicates when identifying differentially interacting regions. By requiring an adjusted p-value < 0.05, we limit the expected proportion of false positives among the identified differentially interacting regions to 5% (i.e., a false discovery rate of 5%).

      We would also like to note that the replicates cluster much more closely on the PCA plot of RNA-seq data (Supplementary Figure 1C), as well as on the PCA plot of H3K27ac CUT&Tag data (Figure 4A). Notably, the RNA-seq result has now been reproduced when performed by different sets of hands across multiple studies (Boone, et. al., 2021 and this report), as well as in a second cell line (as reported in this manuscript revision). These observations suggest that the cells of these replicates are functionally similar to each other at a population level. Chromatin organization detected by Micro-C is highly heterogeneous among cells of a population (Misteli, et. al., 2020). Moreover, despite the increased resolution of Micro-C over Hi-C, the conventional sequencing depth at which Micro-C is performed makes resolving finer-scale 3D interactions, particularly between enhancers and promoters, challenging (Goel, et. al., 2023). Thus, biologically relevant interactions driving EWSR1::ETS transcriptional regulation through de novo enhancers may have relatively weak signal in Micro-C. Both the strength of the signal and the heterogeneous chromatin state present in bulk samples could affect the average signal, leading to poor clustering of replicates (Hafner and Boettiger, 2022). 

      Importantly, rather than add an additional replicate of a single cell line, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. Specific limitations of the TTC466 study are addressed in the Discussion section (lines 392-420). The reproduction of weak/moderate clustering in the MDS plot in both A673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, higher-resolution analyses focused on specific EWSR1::ETS-bound loci are likely an important area of future study required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma.

      Similarly:

      - In Figure 1C, how would the result look when comparing DBD2/KD2/DBD+2? Same when comparing DBD 1 with KD1 and DBD+1. Would the difference go in the same direction?

      This is a great point. We added distance decay plots of individual replicates in Supplementary Figure 2 and added discussion of these results in lines 88-89 of the text.

      - Figure 1D-E. How would these plots look when comparing each replicate to each other? How much difference would be observed when comparing, for instance, DBD1/DBD2? Or DBD1/DBD+1?

      Unfortunately, differentially interacting region analysis requires multiple replicates per condition, because it tests interactions for statistical significance. Therefore, we are unable to plot these analyses for individual replicates. 

      - Figure 2: again, how would these analyses look when performing the analysis with only DBD1/DBD+1/KD1 or DBD2/DBD+2/KD?

      This is a good suggestion. It is possible to do such an analysis. However, analyzing replicates separately would reduce resolution such that we might not accurately detect TADs, especially smaller TADs. Therefore, we decided to combine the biological replicates.   

      Another major question is the stability of EWS-FLI DBD vs EWS-FLI DBD+ proteins. In the WB, FLAG intensities seem also higher (2/3 replicates) in DBD+ condition compared to the DBD condition (Figure S1B).

      This is a valid concern with the shRNA knockdown/rescue system, and we regularly validate new constructs to ensure that they have similar expression levels as rescue with the wildtype fusion before proceeding to more exhaustive experimental workups. We would note that while we have not tested for differences in protein stability, for these constructs we largely see similar expression levels across multiple experiments, multiple cell lines, and multiple sets of hands. There may be some variation in expression level from experiment to experiment, but western blotting is a semiquantitative assay, and it is also not possible to rule out that slight differences in band intensity result from error in gel loading. For this reason, alongside western blotting for construct expression, we also validate construct function using RNA-seq and colony formation assays (as reported in this manuscript), and these show good agreement across biological replicates.  

      Indeed, it seems that they have more FLAG (i.e., EWS-FLI) peaks in the DBD+ condition compared to the DBD condition (Figure 2B). 

      We appreciate this comment, as the legend of Figure 2B led to a misunderstanding. Figure 2B depicts the number of TADs detected in DBD and DBD+ conditions (height of the bar graphs) and the proportion of those TADs overlapping FLAG peaks, CTCF peaks, both, or neither (y-axis). The number of FLAG peaks is actually lower in DBD+ than in DBD, as shown in Figure 5A-B. We clarified our Figure 2 legend to accurately describe the various proportions (color-coded sections) of TADs bound by DBD/DBD+ FLAG and CTCF.

      Would it be possible that DBD+ is just more expressed or more stable than DBD? The higher stability of the re-expressed DBD+ could also partially explain their results independently of the 3D conformational change. In other words, can they exclude the possibility that differences in DBD+ and DBD binding are related to their respective protein stability or their global re-expression levels?

      It is possible that DBD+ protein is overexpressed or more stable than DBD. With our current set of data, we cannot conclusively exclude the possibility that differences in DBD and DBD+ binding are related to their expression levels or stability. We would note, as above, that western blots, RNA-seq, and agar assays have largely reproduced across experiments, hands, and cell lines, and that western blotting is an imperfect assay for assessing protein stability.

      Surprisingly, WB FLI bands in DBD+ conditions are systematically (3/3 replicates) fainter than in DBD conditions (Figure S1B). How do the authors explain these opposite results between FLI and FLAG in the WB?

      This is an excellent observation that highlights one of the intricacies of studying EWSR1::FLI1 in our KD/rescue system. Often the limiting factor for an experiment is whether or not the KD condition maintains KD through a second viral transduction for rescue and selection. We have observed over many years of working with this system that rescue conditions which are fully functional (i.e. wildtype EWSR1::FLI1, DBD+, etc.) tend to maintain better KD of endogenous EWSR1::FLI1. Constructs that don’t rescue EWSR1::FLI1 function sometimes maintain KD to a lesser degree, though frequently to a functional degree (i.e. cells are not transformed and EWSR1::FLI1 transcriptional regulation is not rescued). We suspect that this observation, also raised by Reviewer 1, results from selection, in DBD conditions, of cells in which more endogenous EWSR1::FLI1 escapes KD, owing to selective pressures during expansion in tissue culture.

      We should note that the antibody used for detecting FLI recognizes residues that are deleted in DBD and DBD+ constructs, such that the FLI1 blot in Supplementary Figure 1B does not detect either construct. It only detects endogenous EWSR1::FLI1 and the 3X-FLAG-EWSR1::FLI1 construct in the middle lane that runs at a slightly higher molecular weight. The FLAG antibody is the only antibody that detects all three rescue constructs.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bayanjargal et al. entitled "The DBD-alpha4 helix of EWS::FLI is required for GGAA microsatellite binding that underlies genome regulation in Ewing sarcoma" reports on the critical role of a small alpha helix in the DNA binding domain (DBD) of the FLI1 portion of EWS::FLI1 that is critical for binding to repetitive stretches of GGAA-motifs, i.e. GGAA microsatellites, which serve as potent neoenhancers in Ewing sarcoma.

      We thank Reviewer 2 for their succinct and accurate summary of our manuscript. 

      Strengths:

      The paper is generally well written and easy to follow, and the data presented are of high quality, well described, and underpin the conclusions of the authors. The report sheds new light on how EWS::FLI1 mechanistically binds to and activates GGAA microsatellite enhancers, which is of importance to the field.

      We appreciate the reviewer’s assessment of our work. 

      Weaknesses:

      While there are no major weaknesses in this paper, there are a few minor issues that the authors may wish to address before publication:

      (1) While the official protein symbol for the gene EWSR1 is indeed EWS, the protein symbol for the gene FLI1 is identical, i.e. FLI1. The authors name the fusion oncoprotein EWS::FLI (even in the title), but it appears more adequate to use EWS::FLI1.

      We thank the reviewer for bringing this to our attention. Indeed, the most recent guideline for fusion protein nomenclature is to use the full gene symbols separated by double colons. Therefore, the accurate nomenclature is EWSR1::FLI1. We replaced instances of EWS::FLI with EWSR1::FLI1 and have used the EWSR1::ERG nomenclature in our revised manuscript.  

      (2) The used cell lines should be spelled according to their official nomenclature (e.g. A-673 instead of A673).

      Corrected, thanks!

      (3) It appears as if the vast majority of results were generated in a single Ewing sarcoma cell line (A-673) which is an atypical Ewing sarcoma cell line harboring an activating BRAF mutation and may be genomically quite unstable as compared to other Ewing sarcoma cell lines (Kasan et al. 2023 preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2023.11.20.567802v1). Hence, it may be supportive for the paper to recapitulate/cross-validate a few key results in other Ewing sarcoma cell lines, e.g. by using EWS::ERG-positive cell lines. Perhaps the authors could make use of available published data.

      We thank Reviewer 2 for this helpful comment. We replicated the experiments in TTC-466 cells containing EWSR1::ERG fusion and found that as for A-673 cells the DBD-α4 helix is important for transcriptional, enhancer, and 3D chromatin regulation (Supplementary Figures 9-18).  

      (4) Figure 6 and Supplementary Figure 5 are very interesting but focus on two selected target genes of the fusion (FCGRT and CCND1). It would be interesting to see whether these findings also extend to common EWS::ETS transcriptional signatures that have been reported. The authors could explore their data and map established consensus EWS::ETS signatures to investigate which other hubs might be affected at relevant target genes.

      We expanded our analysis to other genes demonstrated to be regulated by EWSR1::FLI1-nucleated transcriptional hubs (Chong, et. al., 2018) and included the NKX2-2 and GSTM4 gene regions in Supplementary Figures 7-8 for A-673 cells. We also investigated the same gene regions of FCGRT, CCND1, NKX2-2, and GSTM4 in TTC466 cells and report them in Supplementary Figures 14-17. For brevity, we decided to include only the above examples. We may need to develop different tools to conduct further analysis to understand the gene regulatory networks driven by DBD and DBD+ in relation to hub formation. Although mapping such a network is a great suggestion, this may be outside the scope of this manuscript. We thank the reviewer for bringing such a good point to our attention.  

      (5) Table 1 is a bit hard to read. In my opinion, it is not necessary to display P-values with up to 8 decimal positions. The gene symbols should be displayed in italic font.

      Suggestions were adopted, thanks!

      Reviewing Editor (Recommendations For The Authors):

      We would draw the authors' attention to the following issues that would best benefit from additional revision.

      As indicated by Referee 1, an important issue concerns the apparent poor reproducibility of Micro-C data. In Figure 1B, the clustering of the DBD1, DBD2, DBD+1, and DBD+2 is poor.

      It appears that the distances/clustering observed between replicates are typically similar or even larger than between biological conditions. Lines 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates." If one replaced DBD replicate 1 with DBD replicate 2, this statement would no longer be true. The referees believe that it is important to fully account for these potential discrepancies. Most of the study is based on analyses of these data sets, so if there are issues with them it has repercussions on the entire study. We note however that in Figure 4A the clustering of the H3K27ac data is much more convincing. The referees also feel that it is important to show immunoblots of the expression of DBD and DBD+ levels in the experiments performed here. While this was previously shown in the Boone et al publication in 2021, it could be illustrated again here.

      We thank the editors for concisely summarizing the main weaknesses of the paper and underscoring the importance of the Micro-C data in the rest of the paper. While the Editors note tighter clustering of the H3K27ac data (Figure 4A), we would like to note that the replicates cluster much more closely on the PCA plot of RNA-seq data (Supplementary Figure 1C). Notably, the RNA-seq result has now been reproduced when performed by different sets of hands across multiple studies (Boone, et. al., 2021 and this report), as well as in a second cell line (as reported in this manuscript revision). Though not as tight, the H3K27ac CUT&Tag also reproduces in TTC466 cells. Thus, we interpret these findings to indicate that our replicates are functionally similar to each other. As discussed above in the response to Reviewer 1 in more detail, there are several factors that could affect how these functional similarities are represented in Micro-C data. Micro-C is ultimately a readout of the chromatin organization in a heterogeneous population of cells (Misteli et al., 2020). Additionally, sequencing depth limitations in conventional Micro-C experiments limit the ability to faithfully assess the enhancer-promoter interactions that may be relevant for our model system (Goel, et. al., 2023). Thus, both the strength of the biologically relevant signal and the heterogeneous chromatin state present in bulk samples could affect the average signal and lead to poorly clustering replicates (Hafner and Boettiger, 2022). 

      To address these important concerns about rigor and reproducibility of the analyses, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. These additional studies were not without their own limitations, and these are addressed in the Discussion section (lines 392-420). The reproduction of weak/moderate clustering in the MDS plot in both A673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, additional genomic analyses geared toward higher resolution at specific EWSR1::ETS-bound loci are likely an important area of future study required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma. Live cell imaging, as performed by Chong, et. al., 2018, and additional biochemical techniques may also be informative, but are beyond the scope of this report.

      With regards to concerns about construct expression, we have included immunoblots of the rescue constructs in both cell lines (Supplementary Figure 1B and 9A) and discussed Reviewer 1’s specific concerns in detail above.  

      The referees also raise the issue of using an additional cell line to make a more general message. Although it would perhaps be asking too much to repeat the MicroC experiments, consolidation of the observations could be performed by focusing on specific loci such as FCGRT and CCND1 that were analyzed in this study. Could the authors use 4C-type experiments to reproduce the conclusions in an additional cell line? It would also be pertinent to consolidate the findings at these loci by 4C-type approaches even in the cell line used here. For the moment, all conclusions are based on the same set of data and a single technical approach.

      We repeated the experiments in TTC466 cells and analyzed the data using the same cut-offs used in A-673 cells. This allows us to compare the two cell lines. We hope this new set of experiments and analyses addresses the reviewers’ concerns.  

      Reviewer #1 (Recommendations For The Authors):

      All the data are performed in A673 cells. Knowing the transcriptomic and epigenetic heterogeneity of Ewing sarcoma cells, some of the experiments supporting their findings should be replicated in at least another Ewing sarcoma model.

      Per our discussion above, we have replicated our experiments in an additional cell line model of Ewing sarcoma. Importantly, the TTC466 cell line used expresses the EWSR1::ERG fusion found in 10-15% of Ewing sarcoma cases.  

      Supplementary Figure 2B. Proportion of TAD boundaries bound by FLAG (i.e., EWS-FLI1) and CTCF. The number/proportion of FLAG (i.e., EWS-FLI) peaks observed at CTCF peak/TAD boundaries seems unexpectedly high. How do they explain this result since EWS-FLI peaks are rather intra-TAD to mediate their enhancer function?

      In our previous study, we showed that EWSR1::FLI1 binding can be detected at boundaries of TADs (Showpnil, et. al., 2022). We therefore think it likely that EWSR1::FLI1 binding is able to mediate enhancer function both inside TADs and at the borders of TADs and may, in some cases, function as an insulator between TADs.  

      For the >50kb loop analysis, what was the low-range threshold? Up to 15-20 kb, contact frequency interactions may be caused by PFA crosslinking (did they use a 5kb threshold?). Were those excluded from that analysis?

      We acknowledge that we did not use a lower threshold to exclude those short-range loop interactions. In our previous study, we observed that EWSR1::FLI1 binding reduces long-range interactions in favor of short-range interactions (Showpnil, et. al., 2022) and wanted to be able to capture short-range loops in our analysis.  

      In Figure 2D, they observed that within TADs containing FLAG peaks at GGAA microsatellites, the intensity of the DBD+ FLAG peaks was higher compared to DBD FLAG peaks. How would this analysis look when considering the ETS FLAG peaks (i.e., EWS-FLI rather repressive peaks)? Could they compare TAD with GGAA msat vs TAD with ETS peaks?

      We agree that this is an interesting observation. In our prior analyses we found no discernible relationship between EWSR1::FLI1 binding and changes in 3D chromatin associated with repression (Showpnil, et. al., Nucleic Acids Research, 2022). In contrast, EWSR1::FLI1-bound superenhancers had greater H3K27ac deposition when overlapping both a bound GGAA repeat and a non-microsatellite site. While there have been several additional reports about the relevance of EWSR1::FLI1 binding at nonmicrosatellite peaks, motifs at these loci have not yet been rigorously defined as GGAA repeats were by Johnson, et. al. in PLoS One, 2017. Each ETS factor binds different motifs containing the core 5’-GGAA-3’ with varying affinities depending on the flanking residues. There may be >100-fold difference in sequence-specific binding affinity for “high” vs. “low” affinity motifs. Better defining the types of ETS motifs bound by EWSR1::FLI1 and the functional changes associated with them thus represents an interesting area of future study.

      Figure 1F: What is the biological meaning of these results (29.7, 39.5, and 54Mbp)? These distances are typically the size of a chromosome arm and clearly beyond classical chromatin loop/TAD structures in which EWS-FLI mediates its cis-activity.

      We agree with the referee here. This panel has been removed in our revised manuscript.  

      How do DBD, KD, and DBD+ conditions compare with WT parental cells in the omics data? (Figures 1B, 4A). Do DBD+ conditions overlap with WT conditions? It would be nice to have these analyses also for Micro-C and Cut&Tag data. To be acknowledged here, the transcriptome data showing this aspect in Figure S1C are very convincing.

      This is a fair point. We were not able to obtain sequencing depth for wtEF Micro-C libraries similar to that of KD, DBD, and DBD+ due to disproportionate use of wtEF libraries in troubleshooting. Therefore, we decided to exclude the wtEF condition from these analyses. 

      EWS-FLI cis-regulation at CCND1 also occurs through a much closer EWS-FLI peak (~-20kb msat upstream of CCND1 TSS) which was not taken into consideration. EWS-FLI peak intensity in both DBD and DBD+ at this msat seems similar. How would this fit into their model?

      The referee is correct. The closest peak upstream of the CCND1 TSS is ~19kb away. We highlighted this peak with the dashed boxes near the CCND1 TSS (Supplementary Figure 6). Peak intensity of DBD+ FLAG is slightly higher compared to DBD. Nonetheless, we acknowledge that the difference is small. We suspect that the DBD-α4 helix affects binding dynamics at GGAA repeats, but these genomics approaches are not well suited to detect small, but significant, changes in binding affinity or dynamics. In this case a more biochemical approach may be needed. Even though both proteins can still bind the same microsatellites, they might differ in the stability of binding or in the recruitment of additional proteins. These possibilities are discussed in the Discussion section (lines 444-463).  

      For the Micro-C, they sequenced only 7 to 8 million reads per condition. This coverage seems particularly low, especially for their analyses using 1-5kb bins. How does this compare with other published Micro-C data? Can this explain the variability observed between replicates?

      We apologize for the inconsistent verbiage of sequencing coverage that may have caused confusion. 7 to 8 million reads were used for shallow sequencing and QC analysis. Once a sample passed QC, we then sequenced 300 million reads per sample. 300M is now changed to 300 million to prevent a misunderstanding at line 598.  

      They mention:

      "In our recent studies of EWS::FLI, we found a small alpha helix in the DNA binding domain DBD-α4, to be required for transcription and regulation by the fusion protein (Boone et al., 2021). Interestingly, this study did not find any change in chromatin accessibility (ATAC-Seq) and genome localization of EWS::FLI constructs (CUT&RUN) when DBD-α4 helix was deleted leaving the mechanistic basis for the requirement of DBD-α4 in transcription regulation unclear. "

      And

      "To assay the enhancer landscape, we collected H3K27ac CUT&Tag data from KD, DBD, and DBD+ cells. Principal component analysis of H3K27ac localization shows that the DBD replicates were clustered closer to the KD replicates while being in between the KD and the DBD+ replicates (Figure 4A), suggesting that DBD-α4 helix is required to reshape the enhancer landscape."

      But now H3K27ac CUT&Tag show strong differences which were not observed in ATAC seq. How to explain this discrepancy?

      Though both H3K27ac and ATAC signal are associated with enhancers and promoters in euchromatin, they do not measure exactly the same thing. H3K4me2 is a mark more closely associated with ATAC signal than H3K27ac (Henikoff, et. al., 2020). Nonetheless, there are clear differences between the prior publication (Boone, et. al., 2021) and this work with regard to similar ATAC signal for each replicate and differences in H3K27ac. We suspect this may reflect a tighter association of H3K27ac, compared with ATAC signal, with EWSR1::FLI1-mediated genome regulation. Notably, there were very few differentially accessible regions between EWSR1::FLI1-depleted cells and conditions with EWSR1::FLI1 expression (either endogenous or wildtype rescue) using the A673 KD/Rescue system in Boone, et. al., 2021. In contrast, other A673 KD-rescue studies have reported differences in H3K27ac in EWSR1::FLI1-expressing conditions relative to EWSR1::FLI1-depleted conditions (Theisen, et. al., 2021).

      The authors mention:

      "Our study thus uncovered a surprising role for FLI DBD in the process of hub formation which is usually attributed to the EWS low complexity domain."

      Not sure this can be claimed, hubs are composed of many other factors that are not investigated here. Furthermore, promoter enhancer hubs/loops often include combined ETS and mSat chains to generate transcriptional hubs which have not been considered here. None of these points were discussed here.

      We replaced “uncovered” with “suggest” in our revised manuscript at line 476.  

      What are the barcode patterns in Supp 5? Are those frequently observed in their Micro-C data, are they likely mapping artifacts, and do they have any impact on their analyses?

      The barcode patterns in what is now Supplementary Figure 6 are blind spots in the hg19 genome assembly. Since they are few in number, we don't expect these blind spots to impact our analysis.

    1. Author response:

      Reviewer #1 (Public Review): 

      Summary: 

      The authors use fluorescence lifetime imaging (FLIM) and tmFRET to resolve resting vs. active conformational heterogeneity and free energy differences driven by cGMP and cAMP in a tetrameric arrangement of CNBDs from a prokaryotic CNG channel. 

      Strengths: 

      The excellent data provide detailed measures of the probability of adopting resting vs. activated conformations with and without bound ligands. 

      Weaknesses: 

      Limitations are that only the cytosolic fragments of the channel were studied, and the current manuscript does not do a good job of placing the results in the context of what is already known about CNBDs from other methods that yield similar information. 

      In the revision, we will put our results into the context of the previous work on CNBD channels where possible.

      Reviewer #2 (Public Review): 

      The authors investigated the conformational dynamics and energetics of the SthK C-linker/CNBD fragment using both steady-state and time-resolved transition metal ion Förster resonance energy transfer (tmFRET) experiments. To do so, they engineered donor-acceptor pairs at specific sites of the CNBD (C-helix and β-roll) by incorporating a fluorescent noncanonical amino acid donor and metal ion acceptors. In particular, the authors employed two cysteine-reactive metal chelators (TETAC and phenM). This allowed them to coordinate three transition metals (Cu2+, Fe2+, and Ru2+) to measure both short (10-20 Å, Cu2+) and long distances (25-50 Å, Fe2+ and Ru2+). By measuring tmFRET with fluorescence lifetimes, the authors determined intramolecular distance distributions in the absence and presence of the full agonist cAMP or the partial agonist cGMP. The probability distributions between conformational states without and with ligands were used to calculate the changes in free energy (ΔG) and differences in free energy change (ΔΔG) in the context of a simple four-state model. 
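      The summary above refers to computing ΔG and ΔΔG from state probabilities in a simple four-state model. As a hedged sketch, assuming the conventional Boltzmann relation and a saturating ligand concentration (the manuscript's exact definitions and sign convention are not reproduced here), with P_A and P_R denoting the probabilities of the active and resting conformations of the C-helix:

```latex
K_{\mathrm{apo}} = \frac{P_A^{\mathrm{apo}}}{P_R^{\mathrm{apo}}}, \qquad
K_{\mathrm{L}} = \frac{P_A^{\mathrm{L}}}{P_R^{\mathrm{L}}}, \qquad
\Delta G_{\mathrm{apo}} = -RT \ln K_{\mathrm{apo}}, \qquad
\Delta G_{\mathrm{L}} = -RT \ln K_{\mathrm{L}}

\Delta\Delta G = \Delta G_{\mathrm{L}} - \Delta G_{\mathrm{apo}} = -RT \ln \frac{K_{\mathrm{L}}}{K_{\mathrm{apo}}}
```

      Under this convention, a negative ΔΔG means the ligand (for example cAMP or cGMP) shifts the equilibrium toward the active conformation, and a full agonist would be expected to give a more negative ΔΔG than a partial agonist.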

      Overall, the work is conducted in a rigorous manner, and it is well-written. I greatly enjoyed reading it. 

      Nonetheless, I do not see the novelty that the authors claim. 

      We will try to highlight the novelty in the revision. (See below for examples).

      In terms of methodology, this work provides further support to steady-state and time-resolved tmFRET approaches previously developed by the authors of the present work to probe conformational rearrangements by using a fluorescent noncanonical amino acid donor (Anap) and transition metal ion acceptor (Zagotta et al., eLife 2021; Gordon et al., Biophysical Journal 2024; Zagotta et al., Biophysical Journal 2024). 

      This work is the first use of the time-resolved tmFRET method to obtain intrinsic ΔG (of an apo conformation) and ΔΔG values for different ligands, and the first application of this approach to a protein other than MBP.

      Regarding cyclic nucleotide-binding domain (CNBD)-containing ion channels, I disagree with the authors when they state that "the precise allosteric mechanism governing channel activation upon ligand binding, particularly the energetic changes within domains, remains poorly understood". On the contrary, I would say that the literature on this subject is rather vast and based on a significantly large variety of methodologies. This is a non-exhaustive list of papers: Zagotta et al., Nature 2003; Craven et al., JGP, 2004; Craven et al., JBC, 2008; Taraska et al., Nature Methods, 2009; Puljung et al., JBC, 2013; Saponaro et al., PNAS 2014; Goldschen-Ohm et al., eLife, 2016; Bankston et al., JBC, 2017; Hummert et al., PLoS Comput Biol., 2018; Porro et al., eLife, 2019; Ng et al., JGP, 2019; Porro et al., JGP, 2020; Evans et al., PNAS, 2020; Pfleger et al., Biophys J. 2021; Saponaro et al., Mol Cell, 2021; Dai et al., Nat Commun. 2021; Kondapuram et al., Commun Biol. 2022. These studies were conducted either on the isolated C-linker/CNBD fragments or on the entire full-length proteins. As is evident from the above list, the authors of the present work have significantly contributed to the understanding of the allosteric mechanism governing the ligand-induced activation of CNBD-containing channels, including a detailed description of the energetic changes induced by ligand binding. Particularly relevant are their works based on DEER spectroscopy. In DeBerg et al., JBC 2016, the authors described, in atomic detail, the conformational changes induced by different cyclic nucleotides on the HCN CNBD fragment and derived energetics associated with ligand binding to the CNBD (ΔΔG). In Collauto et al., Phys Chem Chem Phys. 2017, they further detailed the ligand-CNBD conformational changes by combining DEER spectroscopy with microfluidic rapid freeze quench to resolve these processes and obtain both equilibrium constants and reaction rates, thus demonstrating that DEER can quantitatively resolve both the thermodynamics and the kinetics of ligand binding and the associated conformational changes. 

      Despite this vast literature, some of which is our own work, there is no consensus about the energetics and coupling of domains that underlies the allosteric mechanism in any CNBD channel. Our approach addresses energetics of the CNBD upon ligand binding, which we aim to later expand to a more complete assessment of the allosteric mechanism in the intact channel.

      Suggestions: 

      - In light of the above, I suggest the authors better clarify the contribution/novelty that the present work provides to the state-of-the-art methodology employed (steady-state and time-resolved tmFRET) and to the understanding of CNBD-containing ion channels. In particular, it would be nice to have a comparison with the conformational dynamics and energetics reported in the previous works of the authors based on DEER spectroscopy (DeBerg et al., JBC 2016, Collauto et al., Phys Chem Chem Phys. 2017 and Evans et al., PNAS, 2020) and with Goldschen-Ohm et al., eLife, 2016, where single-molecule events (FRET-based) of cAMP binding to HCN CNBD were measured and kinetic rate constants were modeled in the context of a simple four-state model, reminiscent of the model employed in the present work. 

      In the revision, we will put our results into the context of the previous work on CNBD channels where possible.

      - Even considering the bacterial SthK channel, cryo-EM has significantly advanced the atomistic understanding of its ligand-dependent regulation (Rheinberger et al., eLife, 2018). More recently, the authors of the present work have elegantly employed DEER on full-length SthK protein to reveal ligand-dependent conformational rearrangements in the Clinker region (Evans et al., PNAS, 2020). In light of the above, what is the contribution/novelty that the present work provides to the SthK biophysics? 

      Neither of the papers mentioned above (structure or DEER) reported energetics for SthK. This work describes an approach that will allow us to get a more complete picture of the energetics of SthK.

      - The authors decided to use the C-linker/CNBD fragment of SthK. On the basis of the above-cited work (Evans et al., PNAS, 2020) the authors should clarify why they have decided to work on the isolated C-linker/CNBD fragment and not on the full-length protein. I assume that the use of the C-linker/CNBD fragment was necessary to isolate tetramers with only one labelled subunit (fSEC and MP were used to confirm this) to avoid inter-subunit cross-talk. However, I am not clear if this is correct. 

      We chose to start on the C-terminal fragment to provide a technically more tractable system for validating our approach using time-resolved tmFRET before moving to the full-length membrane protein.

      - What is the advantage of using the C-linker/CNBD fragment of a bacterial protein and not one of HCN channels, as already successfully employed by the authors (see above citations)? 

      SthK is a useful model system that allows us to later express full-length channels in bacteria.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript aims to provide insights into conformational transitions in the cyclic nucleotide-binding domain of a cyclic nucleotide-gated (CNG) channel. The authors use transition metal FRET (tmFRET) which has been pioneered by this lab and previously led to detailed insights into ion channel conformational changes. Here, the authors not only use steady-state measurements but also time-resolved, fluorescence lifetime measurements to gain detailed insights into conformational transitions within a protein construct that contains the cytosolic C-linker and cyclic nucleotide-binding domain (CNBD) of a bacterial CNG channel. The use of time-resolved tmFRET is a clear advancement of this technique and a strength of this manuscript. 

      In summary, the present work introduced time-resolved tmFRET as a novel tool to study conformational distributions in proteins. This is a clear technological advance. At this stage, conclusions made about energetics in CNG channels are overstated. However, it will be interesting to see in the future how results compare to similar measurements on full-length channels, for example, reconstituted into nanodiscs. 

      Strengths: 

      The results capture known differences in promoting the open state between different ligands (cAMP and cGMP) and are consistent across three donor-acceptor FRET pairs. The calculated distance distributions further are in reasonable agreement with predicted values based on available structures. The finding that the C-helix is conformationally more mobile in the closed state as compared to the open state quantitatively increases our understanding of conformational changes in these channels. 

      Weaknesses: 

      While the use of a truncated construct of SthK is justified, it also comes with certain limitations. The construct is missing the transmembrane part including the pore for ions. However, the pore is the central part of every ion channel and is crucial to describe conformational transitions and energetics that lead to ion channel gating. Two observations in the present study disagree with the results for the full-length channel protein. Here, under apo conditions, the CNBD can adopt an 'open' conformation, and second, cooperativity of channel opening is lost. These differences need to be weighed carefully when judging the impact of the presented results for understanding allostery in CNG channels. Qualitatively, the results can describe movements of the C-helix in CNBDs, but detailed energetics as calculated in this study, need to be limited to the truncated protein construct used. The entire ion channel is an allosteric system and detailed, energetic conclusions cannot be made for the full-length channel when working with only the cytosolic domains. Similarly, the statement "These results demonstrate that time-resolved tmFRET can be utilized to obtain energetic information on the individual domains during the allosteric activation of SthK." is misleading. The data only describe movements of the C-helix. Upon ligand binding, the C-helix moves upwards to coordinate the ligand. Thus, the results are ligand-induced conformational changes (as the title states). Allosteric regulation usually involves remote locations in the protein, which is not the case here. 

      We agree that the full-length channel is more complicated than the C-terminal fragment studied here, but we disagree that there isn't relevant energetic information from the individual domains. For example, the ΔΔG values measured for the C-helix movement in the isolated fragment should be the same as those of the intact channel. In the future we aim to make direct comparisons of the energetics between the fragment and the intact channel.

    1. Author response:

      We thank the editors and the reviewers for their considered comments and helpful suggestions.

      In our revision, we plan to focus on tightening the relationship between the bias-variance tradeoff theory and the empirical analyses that follow.

      We will also work to better communicate what we argue—and what is beyond our scope—with respect to GxE in complex traits. For example, our language is currently insufficiently clear as it suggested to the editor and reviewers that we are developing a method to characterize polygenic GxE here. Developing a new method that does so (let alone evaluating performance in extensive scenarios) is beyond the scope of this manuscript.

      Similarly, we use amplification only as an example of a mode of GxE that is not adequately characterized by current approaches. We do not wish to argue it is an omnibus explanation for all GxE in complex traits. In many cases, a mixture of polygenic GxE relationships seems most fitting (as observed, for example, in Zhu et al., 2023, for GxSex in human physiology).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines fMRI and electrophysiology in sedated and awake rats to show that LFPs strongly explain spatial correlations in resting-state fMRI but only weakly explain temporal variability. They propose that other, electrophysiology-invisible mechanisms contribute to the fMRI signal. The evidence supporting the separation of spatial and temporal correlations is convincing; however, the support for electrophysiology-invisible mechanisms is incomplete, considering alternative potential factors that could account for the differences in spatial and temporal correlation that were observed. This work will be of interest to researchers who study the fundamental mechanisms behind resting-state fMRI.

      We appreciate the encouraging comments. We added a section to the Discussion that thoroughly addresses the potential alternative factors that could account for the differences in spatial and temporal correlation that we observed. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Tu et al investigated how LFPs recorded simultaneously with rsfMRI explain the spatiotemporal patterns of functional connectivity in sedated and awake rats. They find that connectivity maps generated from gamma band LFPs (from either area) explain very well the spatial correlations observed in rsfMRI signals, but that the temporal variance in rsfMRI data is more poorly explained by the same LFP signals. The authors excluded the effects of sedation in this effect by investigating rats in the awake state (a remarkable feat in the MRI scanner), where the findings generally replicate. The authors also performed a series of tests to assess multiple factors (including noise, outliers, and nonlinearity of the data) in their analysis.

      This apparent paradox is then explained by a hypothetical model in which LFPs and neurovascular coupling are generated in some sense "in parallel" by different neuron types, some of which drive LFPs and are measured by ePhys, while others (nNOS, etc.) have an important role in neurovascular coupling but are less visible in Ephys data. Hence the discrepancy is explained by the spatial similarity of neural activity but the more "selective" LFPs picked up by Ephys account for the different temporal aspects observed.

      This is a deep, outstanding study that harnesses multidisciplinary approaches (fMRI and ephys) for observing brain activity. The results are strongly supported by the comprehensive analyses done by the authors, which ruled out many potential sources for the observed findings. The study's impact is expected to be very large.

      Comment: There are very few weaknesses in the work, but I'd point out that the 1-second temporal resolution may have masked significant temporal correlations between LFPs and spontaneous activity, as shown, for instance, by Cabral et al., Nature Communications 2023, and even in earlier QPP work from the Keilholz Lab. The synchronization of the LFPs may correlate more with one of these modes than with the total signal. Perhaps a kind of "dynamic connectivity" analysis on the authors' data could test whether LFPs correlate better with the activity at specific intervals. However, this could purely be discussed and left for future work, in my opinion.

      We appreciate this great point. Indeed, it is likely that LFP and rsfMRI signals are more strongly related during some modes/instances than others, and hence correlation across the entire time series may have masked this effect. In addition, we agree that 1-second temporal resolution may obscure some temporal correlations between LFPs and rsfMRI signal. The choice of 1-second temporal resolution was made to be consistent with the TR in our fMRI experiment, considering the slow hemodynamic response. Ultrafast fMRI imaging combined with dynamic connectivity analysis in a future study might enable more detailed examination of BOLD-LFP temporal correlations at higher temporal resolutions. We have added the following paragraph to the revised manuscript:

      “Our proposed theoretical model represents just one potential explanation for the apparent discrepancy in temporal and spatial relationships between resting-state electrophysiology and BOLD signals. It is important to acknowledge that there may be other scenarios where a stronger temporal relationship between LFP and BOLD signals could manifest. For instance, recent research suggests that the relationship between LFP and rsfMRI signals may vary across different modes or instances (Cabral et al., 2023), which can be masked by correlations across the entire time series. Moreover, the 1-second temporal resolution employed in our study may obscure certain temporal correlations between LFPs and rsfMRI signals. Future investigations employing ultrafast fMRI imaging coupled with dynamic connectivity analysis could offer a more nuanced exploration of BOLD-LFP temporal correlations at higher temporal resolutions (Bolt et al., 2022; Cabral et al., 2023; Ma and Zhang, 2018; Thompson et al., 2014).”

      Reviewer #2 (Public Review):

      The authors address a question that is interesting and important to the sub-field of rsfMRI that examines electrophysiological correlates of rsfMRI. That is, while electrophysiology-produced correlation maps often appear similar to correlation maps produced from BOLD alone (as has been shown in many papers) is this actually coming from the same source of variance, or independent but spatially-correlated sources of variance? To address this, the authors recorded LFP signals in 2 areas (M1 and ACC) and compared the maps produced by correlating BOLD with them to maps produced by BOLD-BOLD correlations. They then attempt to remove various sources of variance and see the results.

      The basic concept of the research is sound, though primarily of interest to the subset of rsfMRI researchers who use simultaneous electrophysiology. However, there are major problems in the writing, and also a major methodological problem.

      Major problems with writing:

      Comment 1: There is substantial literature on rats on site-specific LFP recording compared to rsfMRI, and much of it already examined removing part of the LFP and examining rsfMRI, or vice versa. The authors do not cover it and consider their work on signal removal more novel than it is.

      We have added more literature studies to the revised manuscript. It is important to note that while there exists a substantial body of literature on site-specific LFP recording coupled with rsfMRI, our paper makes a significant contribution by unveiling the disparity in temporal and spatial relationships between resting-state electrophysiological and fMRI signals. This goes beyond mere reporting of spatial/temporal correlations. Furthermore, our exploration of the impact of removing LFP on rsfMRI spatial patterns constitutes one among several analyses employed to demonstrate that the temporal fluctuations of LFP minimally affect BOLD-derived RSN spatial patterns. We wish to clarify that our intention is not to claim this aspect of our work is more novel than similar analyses conducted in previous studies (we apologize if our original manuscript conveyed that impression). Rather, the novelty lies in the objective of this analysis, which is to elucidate the disparity in temporal and spatial relationships between resting-state electrophysiological and fMRI signals, a crucial issue that has not been thoroughly addressed previously. 

      Comment 2: The conclusion of the existence of an "electrophysiology-invisible signal" is far too broad considering the limited scope of this study. There are many factors that can be extracted from LFP that are not used in this study (envelope, phase, infraslow frequencies under 0.1Hz, estimated MUA, etc.) and there are many ways of comparing it to the rsfMRI data that are not done in this study (rank correlation, transformation prior to comparison, clustering prior to comparison, etc.). The one non-linear method used, mutual information, is low sensitivity and does not cover every possible nonlinear interaction. Mutual information is also dependent upon the number of bins selected in the data. Previous studies (see 1) have seen similar results where fMRI and LFP were not fully commensurate but did not need to draw such broad conclusions.
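      The point that histogram-based mutual information depends on the number of bins can be illustrated with a minimal plug-in estimator. This is a sketch only: the function name `binned_mutual_information`, the equal-width binning, and the toy data are illustrative assumptions, not the estimator used in the manuscript.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins):
    """Plug-in (histogram) estimate of mutual information, in nats.

    Hypothetical helper for illustration: each variable is split into
    `bins` equal-width bins spanning its own observed range.
    """
    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant signal
        # Clamp so the maximum value lands in the last bin, not bin `bins`.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))   # joint histogram counts
    px, py = Counter(bx), Counter(by)  # marginal histogram counts
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((px[i] / n) * (py[j] / n)))
    return mi


# Toy data: y is a deterministic, monotonic (nonlinear) function of x.
x = [i / 100 for i in range(100)]
y = [v ** 2 for v in x]
for b in (2, 5, 10, 20):
    print(b, round(binned_mutual_information(x, y, b), 3))
```

      The same pair of signals yields different mutual-information values as the bin count changes, and the estimate can never exceed log(bins), so comparisons of mutual information across analyses are only meaningful at a fixed, reported binning.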

      First, we would like to clarify that the existence of an "electrophysiology-invisible signal" is not, per se, a conclusion of the present study, as the reviewer suggests. As we stated in our manuscript, it is a proposed theoretical model. We fully acknowledge that this model represents just one potential explanation for the apparent discrepancy in temporal and spatial relationships between resting-state electrophysiology and BOLD signals. It is important to acknowledge that there may be other scenarios where a stronger temporal relationship between LFP and BOLD signals could manifest. This issue has been further clarified in the revised manuscript (see the section of Potential pitfalls). 

      We agree with the reviewer that not all factors that can be extracted from LFP are examined. In our current study we focused solely on band-limited LFP power as the primary feature in our analysis, given its prevalence in prior studies of LFP-rsfMRI correlates. More importantly, we demonstrate that band-specific LFP powers can yield spatial patterns nearly identical to those derived from rsfMRI signals, prompting a closer examination of the temporal relationship between these same features. Furthermore, since correlational analysis was used in studying the LFP-BOLD spatial relationship, we used the same analysis method when comparing their temporal relationship. 

      Extracting all possible features from the electrophysiology signal and examining their relationship with the rsfMRI signal, or exploring all other ways of comparing LFP and rsfMRI signals, goes beyond the scope of the current study. However, to address the reviewer's concern, we tried a couple of the analysis methods suggested by the reviewer, and the results remain consistent. Figure S14 shows the results from (A) rank correlation and (B) z-transformation prior to comparison. We added these new results to the revised manuscript.
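      Rank (Spearman) correlation is Pearson's r computed on ranks, which makes it insensitive to any monotonic nonlinearity between LFP power and BOLD. The pure-Python sketch below is illustrative only; the function names are hypothetical and this is not the authors' analysis code. It also assumes that "z-transformation" means z-scoring each time series; under that reading, Pearson's r is mathematically unchanged by the transformation. If Fisher's r-to-z transform of correlation coefficients was meant instead, the `zscore` check does not apply.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))


def zscore(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / sd for v in x]


# Toy example: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y), pearson(zscore(x), zscore(y)))
```

      In this toy case, Spearman's coefficient is exactly 1 while Pearson's is slightly below 1, and z-scoring leaves Pearson's r unchanged.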

      Comment 3: The writing refers to the spatial extent of correlation with the LFP signal as "spatial variance." However, LFP was recorded from a very limited point and the variance in the correlation map does not necessarily reflect underlying electrophysiological spatial distributions (e.g. Yu et al. Nat Commun. 2023 Mar 24;14(1):1651.)

      The reviewer accurately pointed out that in our paper, “spatial variance” refers to the spatial variance of BOLD correlates with the LFP signal. Our objective is to assess the extent to which this spatial variance, which is derived from the neural activity captured by LFP in the M1 or ACC, corresponds to the BOLD-derived spatial patterns from the same regions. We acknowledge that this spatial variance may differ from the spatial map obtained by multi-site electrophysiology recordings. Nevertheless, numerous studies have consistently reported a high spatial correspondence between BOLD- and electrophysiology-derived RSNs using various methodologies across different physiological states in both humans and animals. For instance, research employing electroencephalography (EEG) or electrocorticography (ECoG) in humans demonstrates that RSNs derived from the power of multiple-site electrophysiological signals exhibit similar spatial patterns to classic BOLD-derived RSNs such as the default-mode network (Hacker et al., 2017; Kucyi et al., 2018). These studies well agree with our findings. Notably, the reference paper cited by the reviewer studies brain-wide changes during transitions between awake and various sleep stages, which is quite different from the brain states examined in our study.

      Major method problem:

      Comment 4: Correlating LFP to fMRI is correlating two biological signals, with unknown but presumably not uniform distributions. However, correlating CC results from correlation maps is comparing uniform distributions. This is not a fair comparison, especially considering that the noise added is also uniform as it was created with the rand() function in MATLAB.

      This is a good point. We examined the distributions of both LFP powers and fMRI signals; both appear to follow a normal distribution. The distributions of the two signals from a randomly selected scan are shown below. In addition, z-transformation prior to comparison generated the same results (Fig. S14).

      Author response image 1.

      Exemplar distributions of A) the fMRI signal of M1, and B) HRF-convolved LFP power in M1.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: In the Discussion, a few more calcium imaging papers could be fruitfully discussed (e.g. Ma et al Resting-state hemodynamics are spatiotemporally coupled to synchronized and symmetric neural activity in excitatory neurons, PNAS 2016, or more recently Vafaii et al, Multimodal measures of spontaneous brain activity reveal both common and divergent patterns of cortical functional organization, Nat Comms 2024).

      We appreciate this suggestion. We have added the following discussions to the revised manuscript: 

      “These findings indicate the temporal information provided by gamma power can only explain a minor portion (approximately 35%) of the temporal variance in the BOLD time series, even after accounting for the noise effect, which is in line with the reported correlation value between the cerebral blood volume and fluctuations in GCaMP signal in head-fixed mice during periods of immobility (R = 0.63) (Ma et al., 2016).” 

      “It is plausible that employing different features or comparison methods could yield a stronger BOLD-electrophysiology temporal relationship (Ma et al., 2016).”

      “Furthermore, in a more recent study by Vafaii and colleagues, overlapping cortical networks were identified using both fMRI and calcium imaging modalities, suggesting that networks observable in fMRI studies exhibit corresponding neural activity spatial patterns (Vafaii et al., 2024).” 

      “Furthermore, Vafaii et al. revealed notable differences in functional connectivity strength measured by fMRI and calcium imaging, despite an overlapping spatial pattern of cortical networks identified by both modalities (Vafaii et al., 2024).”
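      For a like-for-like comparison between the approximately 35% of BOLD temporal variance explained by gamma power and the R = 0.63 reported by Ma et al. (2016), the correlation can be squared to express it as a fraction of shared variance. This assumes the 35% figure is itself a squared-correlation (R²) measure:

```latex
R^2 = 0.63^2 = 0.3969 \approx 0.40
```

      That is, roughly 40% shared variance, of the same order as the approximately 35% reported here for gamma power.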

      Comment 2: Similarly when discussing the "invisible" populations, perhaps Uhlirova et al eLife 2016 should be mentioned as some types of inhibitory processes may also be less clearly observed in LFPs but rather strongly contribute to NVC.

      We appreciate the suggestion. We added the following sentences to the revised manuscript. 

      “Additionally, Uhlirova et al. conducted a study where they utilized optogenetic stimulation and two-photon imaging to investigate how the activation of different neuron types affects blood vessels in mice. They discovered that only the activation of inhibitory neurons led to vessel constriction, albeit with a negligible impact on LFP (Uhlirova et al., 2016).”

      Reviewer #2 (Recommendations For The Authors):

      Major problems with writing:

      Comment 1: The authors need to review past work to better place their study in the context of the literature (some review articles: Lurie et al. Netw Neurosci. 2020 Feb 1;4(1):30-69. & Thompson et al. Neuroimage. 2018 Oct 15;180(Pt B):448-462.)

      Here are some LFP and BOLD "resting state" papers focused on dynamic changes.

      Many of these papers examine both spatial and temporal extents of correlations. Several of these papers use similar methods to the reviewed paper.

      Also, many of these papers dispute the claim that the correlations seen are an "electrophysiology invisible signal." Note that I am NOT saying that "electrophysiology invisible" correlations do not exist (it seems very likely some DO exist). However, the authors did not show that in the reviewed paper, and some of the correlations which they call an "electrophysiology invisible signal" probably would be visible if analyzed in a different manner.

      Quite a few literature studies that the reviewer suggested were already included in the original manuscript. We have also added more literature studies to the revised manuscript. Again, we would like to emphasize that the novelty of our study centers on the discovery of the disparity in temporal and spatial relationships between resting-state electrophysiological and fMRI signals. See below our responses to individual literature studies listed.

      In humans:

      https://pubmed.ncbi.nlm.nih.gov/38082179/ Predicts by using models the paper under review does not use here.

      The following discussion was added to the revised manuscript: 

      “Some other comparison methods, such as rank correlation and transformation prior to comparison, were also tested, and the results remain consistent (Fig. S14). These findings align with the notion that, compared to nonlinear models, linear models offer superior predictive value for the rsfMRI signal using LFP data, as comprehensively illustrated in (Nozari et al., 2024) (also see Fig. S7). Importantly, in that study, the predictive powers (represented by R2) of the various comparison methods tested all remain below 0.5 (Nozari et al., 2024), suggesting that while certain models may enhance the temporal relationship between LFP and BOLD signals, the improvement is likely modest.”

      In nonhuman primates: https://pubmed.ncbi.nlm.nih.gov/34923136/ Most of the variance that could be creating resting state networks is in the <1 Hz band which the paper under review did not study

      We also examined infraslow LFP activity (< 1Hz) in our data. Consistent with the finding in the reference paper (Li et al., 2022), infraslow LFP power and the BOLD signal can derive consistent RSN spatial patterns (for M1, spatial correlation = 0.70), while the temporal correlation remains very low (temporal correlation = 0.08). These results and the reference paper were added to the revised manuscript.

      https://pubmed.ncbi.nlm.nih.gov/28461461/ Compares actual spread of LFP vs. spread of BOLD instead of just correlation between LFP and BOLD.

      The following sentence has been added to the revised manuscript.

      “This high spatial correspondence between rsfMRI and LFP signals can even be found at the columnar level (Shi et al., 2017).”   

      https://pubmed.ncbi.nlm.nih.gov/24048850/ Comparison of small (from LFP) to large (from BOLD) spatial correlations in the context of temporal correlations.

      In this study, the researchers compared neurophysiological maps and fMRI maps of the inferior temporal cortex in macaques viewing visual images. They observed that the spatial correlation increased as greater spatial smoothing was applied to the neurophysiological maps. This suggests that fMRI can capture large-scale spatial information but may be limited in capturing fine details. Although interesting, this paper did not study the electrophysiology-fMRI relationship in the resting state and is hence not very relevant to our study.

      https://pubmed.ncbi.nlm.nih.gov/20439733/ Electrophysiology from a single site can correlate across nearly the entire cerebral cortex.

      We have included the discussion of this paper in the original manuscript.

      https://pubmed.ncbi.nlm.nih.gov/18465799/ The original dynamic BOLD and LFP work from 2008 by Shmuel and Leopold included spatiotemporal dynamics.

      We have included the discussion of this paper in the original manuscript.

      In rodents:

      https://pubmed.ncbi.nlm.nih.gov/34296178/ Better electrophysiological correspondence was found using alternate methods the paper under review does not use.

      This study investigated the electrophysiological correspondence in task-based fMRI, whereas our study focused on resting-state signals.

      https://pubmed.ncbi.nlm.nih.gov/31785420/ Electrophysiological basis of co-activation patterns, similar comparisons to the paper under review.

      We have included the discussion of this paper in the original manuscript.

      https://pubmed.ncbi.nlm.nih.gov/29161352/ Cross-frequency coupling of LFP modulating the BOLD, perhaps more so than raw amplitudes.

      This paper investigated the impact of AMPA microinjections in the VTA. It found reduced ventral striatal functional connectivity, correlation between the delta band and the BOLD signal, and phase–amplitude coupling of low-frequency and high-frequency LFP, suggesting that changes in low-frequency LFP might modulate the BOLD signal.

      Consistent with that study, we also found that low-frequency LFP is negatively coupled with the BOLD signal. However, we did not use pharmacological methods to investigate changes in neurovascular coupling under perturbed neural activity, and hence we did not discuss this paper in our study.

      https://pubmed.ncbi.nlm.nih.gov/24071524/ This paper did the same kind of tests comparing LFP-BOLD correlations to BOLD-BOLD correlations as the paper under review.

      This study examined the neural mechanism underpinning dynamic resting-state fMRI and revealed a spatiotemporal coupling of infraslow neural activity with a quasi-periodic pattern (QPP). While our current investigation centered on stationary resting-state functional connectivity, we acknowledge that dynamic analysis would add value to investigations of the relationship between LFP and rsfMRI signals, and this warrants further investigation in a future study. This point has been added to the revised manuscript.

      https://pubmed.ncbi.nlm.nih.gov/24904325/ This paper found that different frequencies of electrophysiology (including ones not studied in the reviewed paper) contribute independently to the BOLD signal

      This paper identified phase-amplitude coupling in rats anesthetized with isoflurane but not with dexmedetomidine. This indicates that the coupling arises from a specific neural activity pattern, burst suppression, which was probably induced by high-dose isoflurane. The authors conjectured that high- and low-frequency neural activities may independently or differentially influence the BOLD signal. Our study also examined the influence of various LFP frequency bands on the BOLD signal and found opposite LFP-BOLD relationships for low- and high-frequency LFP power. We have also added more results from the analysis of infraslow LFP signals. Regardless, since the reference study did not examine the spatial relationship between LFP and BOLD activities, we cannot comment on how it may provide insight into our results.

      https://pubmed.ncbi.nlm.nih.gov/26041826/ This paper found electrophysiological correlates within the BOLD signal when using BOLD analysis methods not used in the reviewed paper, and furthermore that some of these correlate with electrophysiological frequencies not studied in the reviewed paper (< 1 Hz).

      We have added more results from the analysis of infraslow LFP signals and acknowledged the value of dynamic rsfMRI analysis in studies of the BOLD-electrophysiology relationship.

      I am not saying the authors need to use all these methods or even cite these papers. As I stated in my review, they merely need to (1) cite some of the most relevant papers for proper context (the above list may help), (2) remove the claim of an "electrophysiology invisible signal," and (3) describe the extent of correlation with the electrode using terms more common in these papers, rather than "spatial variance."

      We thank the reviewer again for providing a detailed list of reference studies. We have added the related discussion to the revised manuscript as described above.

      Comment 2: The abstract entirely and much of the rest of the paper should be rewritten to be more reasonable. The authors would do well to review some of the past controversies in this area, e.g. Magri et al. J Neurosci. 2012 Jan 25;32(4):1395-407.

      We have made substantial revisions to improve the writing of the paper, and the reference paper has been added to the revised manuscript.

      Comment 3: This should be re-written and the terminology used here should be chosen more carefully.

      The writing of the manuscript has been improved with more careful choice of terminology.    

      Major method problem:

      Comment 4: At a minimum, the authors should be transforming the uniform distribution of CC results to Z or T values and using randn() instead of rand() in MATLAB.

      The figure below illustrates the simulation results after transforming the CC values to Z scores. The results remain consistent.
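
      The sketch below shows the transformation discussed in Comment 4 for Pearson correlation coefficients (CC) computed from n samples. The Fisher transform z = atanh(r) is approximately normal with standard deviation 1/√(n - 3). Alternatively, r can be converted to a t statistic with n - 2 degrees of freedom. The null draws use Gaussian noise (`random.gauss`, the analogue of MATLAB's `randn()`) rather than uniform noise (`random.random`, the analogue of `rand()`). The function names and parameters are hypothetical, and the series are assumed to be white noise. Autocorrelated LFP and BOLD signals have fewer effective degrees of freedom, so the 1/√(n - 3) scaling would overstate significance for them.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher r-to-z transform; under the white-noise null, z is roughly N(0, 1/(n - 3))."""
    return math.atanh(r)


def r_to_t(r, n):
    """t statistic (n - 2 degrees of freedom) for a Pearson r from n samples."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def null_z_scores(n_samples, n_iter, seed=0):
    """Standardized Fisher-z values of correlations between independent Gaussian series.

    random.gauss plays the role of MATLAB's randn(); random.random would be rand().
    """
    rng = random.Random(seed)
    scale = math.sqrt(n_samples - 3)
    scores = []
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        scores.append(fisher_z(pearson(x, y)) * scale)
    return scores


# Hypothetical simulation size; the standardized scores should be close to N(0, 1).
scores = null_z_scores(n_samples=50, n_iter=2000)
mean = sum(scores) / len(scores)
sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (len(scores) - 1))
```

      After this transform, correlations from different conditions can be compared on an approximately normal scale rather than on the bounded [-1, 1] scale of r.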

      Author response image 2.

      Minor problems:

      Comment 5: "MR-compatible electrodes (MRCM16LP, NeuroNexus Inc)"

      Details of this type of electrode are not readily available. However, for studies like this one, further information on the materials is critical, because the material determines the frequency coverage, which is not uniform across all LFP frequencies for all materials. Most commercially prepared electrodes cannot record <1 Hz activity accurately, and some of the analyses in this study include frequencies as low as 0.11 Hz.

      The electrode used in our current study is a silicon-based micromachined probe. Such probes are fabricated using photolithographic techniques to pattern thin layers of conductive material onto a silicon substrate. This probe can record LFP activity over a broad frequency range, starting from 0.1 Hz. We have added this information to the revised manuscript.

      Comment 6: Grounding to the cerebellum in theory would remove global conduction from the LFP but also global signal regression is done to the fMRI. Does the LFP-rsfMRI correlation change due to the regression or does only the rsfMRI-rsfMRI correlation change?

      The results obtained with global signal regression were consistent with those obtained without it (see Figs. S4-S5), and therefore, we do not believe our results are affected by this preprocessing step. 
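
      The sketch below shows one common form of the preprocessing step raised in Comment 6, global signal regression (GSR). The mean time series across all voxels is regressed out of each voxel's BOLD signal by ordinary least squares, and the LFP signal is left untouched. Comparing an LFP-BOLD correlation and a BOLD-BOLD correlation before and after this step addresses the reviewer's question of which one the regression changes. This is a plain-Python illustration with hypothetical names and toy data, not the preprocessing code used in this study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def global_signal(voxels):
    """Mean BOLD time series across all voxels."""
    n_t = len(voxels[0])
    return [sum(v[t] for v in voxels) / len(voxels) for t in range(n_t)]


def regress_out(y, g):
    """Residuals of y after an OLS fit on an intercept plus the regressor g."""
    n = len(y)
    mg, my = sum(g) / n, sum(y) / n
    beta = sum((a - mg) * (b - my) for a, b in zip(g, y)) / sum((a - mg) ** 2 for a in g)
    return [(b - my) - beta * (a - mg) for a, b in zip(g, y)]


def gsr(voxels):
    """Apply global signal regression to every voxel time series (BOLD only)."""
    g = global_signal(voxels)
    return [regress_out(v, g) for v in voxels]


# Hypothetical toy data: 3 voxels x 6 time points plus one LFP power trace.
voxels = [
    [1.0, 2.0, 3.0, 2.0, 1.0, 0.0],
    [0.5, 1.5, 3.5, 2.5, 0.5, 0.5],
    [2.0, 1.0, 2.0, 3.0, 2.0, 1.0],
]
lfp_power = [0.4, 1.2, 2.1, 1.1, 0.9, 0.3]
cleaned = gsr(voxels)

lfp_bold_before = pearson(lfp_power, voxels[0])
lfp_bold_after = pearson(lfp_power, cleaned[0])
bold_bold_before = pearson(voxels[0], voxels[1])
bold_bold_after = pearson(cleaned[0], cleaned[1])
```

      Because the global signal is the voxel average, the residuals sum to zero across voxels at every time point. This is one reason GSR can shift BOLD-BOLD correlations toward negative values, while the LFP-BOLD correlation changes only through the BOLD term.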

      Comment 7: Avoid colloquial language such as "on the other hand."

      We have used more appropriate language in the revised manuscript.

      References:

      Bolt, T., Nomi, J.S., Bzdok, D., Salas, J.A., Chang, C., Thomas Yeo, B.T., Uddin, L.Q., Keilholz, S.D., 2022. A parsimonious description of global functional brain organization in three spatiotemporal patterns. Nat Neurosci 25, 1093-1103.

      Cabral, J., Fernandes, F.F., Shemesh, N., 2023. Intrinsic macroscale oscillatory modes driving long range functional connectivity in female rat brains detected by ultrafast fMRI. Nat Commun 14, 375.

      Hacker, C.D., Snyder, A.Z., Pahwa, M., Corbetta, M., Leuthardt, E.C., 2017. Frequency-specific electrophysiologic correlates of resting state fMRI networks. Neuroimage 149, 446-457.

      Kucyi, A., Schrouff, J., Bickel, S., Foster, B.L., Shine, J.M., Parvizi, J., 2018. Intracranial Electrophysiology Reveals Reproducible Intrinsic Functional Connectivity within Human Brain Networks. J Neurosci 38, 4230-4242.

      Li, J.M., Acland, B.T., Brenner, A.S., Bentley, W.J., Snyder, L.H., 2022. Relationships between correlated spikes, oxygen and LFP in the resting-state primate. Neuroimage 247, 118728.

      Ma, Y., Shaik, M.A., Kozberg, M.G., Kim, S.H., Portes, J.P., Timerman, D., Hillman, E.M., 2016. Resting-state hemodynamics are spatiotemporally coupled to synchronized and symmetric neural activity in excitatory neurons. Proc Natl Acad Sci U S A 113, E8463-E8471.

      Ma, Z., Zhang, N., 2018. Temporal transitions of spontaneous brain activity. Elife 7.

      Shi, Z., Wu, R., Yang, P.F., Wang, F., Wu, T.L., Mishra, A., Chen, L.M., Gore, J.C., 2017. High spatial correspondence at a columnar level between activation and resting state fMRI signals and local field potentials. Proc Natl Acad Sci U S A 114, 5253-5258.

      Thompson, G.J., Pan, W.J., Magnuson, M.E., Jaeger, D., Keilholz, S.D., 2014. Quasi-periodic patterns (QPP): large-scale dynamics in resting state fMRI that correlate with local infraslow electrical activity. Neuroimage 84, 1018-1031.

      Uhlirova, H., Kilic, K., Tian, P., Thunemann, M., Desjardins, M., Saisan, P.A., Sakadzic, S., Ness, T.V., Mateo, C., Cheng, Q., Weldy, K.L., Razoux, F., Vandenberghe, M., Cremonesi, J.A., Ferri, C.G., Nizar, K., Sridhar, V.B., Steed, T.C., Abashin, M., Fainman, Y., Masliah, E., Djurovic, S., Andreassen, O.A., Silva, G.A., Boas, D.A., Kleinfeld, D., Buxton, R.B., Einevoll, G.T., Dale, A.M., Devor, A., 2016. Cell type specificity of neurovascular coupling in cerebral cortex. Elife 5.

      Vafaii, H., Mandino, F., Desrosiers-Gregoire, G., O'Connor, D., Markicevic, M., Shen, X., Ge, X., Herman, P., Hyder, F., Papademetris, X., Chakravarty, M., Crair, M.C., Constable, R.T., Lake, E.M.R., Pessoa, L., 2024. Multimodal measures of spontaneous brain activity reveal both common and divergent patterns of cortical functional organization. Nat Commun 15, 229.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary: The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. It is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in the immune recognition of pathogens and in orchestrating subsequent immune responses. However, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found that enrichment of skin mast cells provides X. laevis with substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd.

      Strengths: This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens. 

      We thank the reviewer for recognizing the significance and the novelty of our work.

      Weaknesses: The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells characteristically degranulate to release histamine, serotonin, proteases, cytokines, chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators, IgE and compound 48/80, for the IgE-dependent and IgE-independent pathways, respectively. This can be easily done in vitro. It is also important to assess whether these mast cells degranulate in vivo upon Bd infection, using avidin staining to visualize vesicle release from mast cells. Figure 3 only showed that rSCF injection caused an increase in mast cells in naïve skin. The authors need to show whether Bd infection induces a mast cell increase and whether rSCF injection under Bd infection increases mast cells in the skin. In addition, it is unclear how the enrichment of mast cells provides protection against Bd infection and against alterations to skin microbiomes after infection. It is important to determine whether skin mast cells release any of the contents mentioned above.

      We would like to thank the reviewer for taking the time to review our work and providing us with valuable feedback.

      Please note that, as indicated in our previous rebuttal to the reviewers, amphibians do not possess the IgE antibody isotype (1).

      To our knowledge, there are no published works applying the approaches used to study mammalian mast cell degranulation to amphibian mast cells. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these do not cross-react with their amphibian counterparts. This is especially true of cytokines and chemokines, which diverged rapidly during evolution. As a result, they do not share substantial protein sequence identity across species as divergent as frogs and mammals. We would also like to highlight that several studies suggest that amphibian mast cells lack histamine (2-5) and serotonin (2, 6). While following up on these findings would be possible, we would like to respectfully emphasize that adapting approaches from mammalian research to comparative immunology work is not always straightforward.

      As we highlight in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), a hallmark cytokine associated with mammalian mast cells (7). The additional findings presented in our revised manuscript indicate that mast cells respond to Bd by upregulating IL4 expression in vitro and in vivo. Together, this suggests that IL4 may be a central means by which frog mast cells confer protection against Bd. IL4 may act by counteracting Bd-elicited inflammation, including minimizing neutrophil infiltration, maintaining skin integrity, and promoting cutaneous mucus production. These additional results are presented in Figure 8 and are described in the results and discussion sections of our revised manuscript.

      Our attempts to elicit degranulation of frog mast cells using compound 48/80 have so far not been successful. This may reflect technical issues with assays optimized for mammalian mast cells. It may also reflect biological differences between frog and mammalian mast cells, such as species differences in the mas-related G-protein coupled receptors through which compound 48/80 acts (8). We will continue to explore means of studying frog mast cell degranulation both in vitro and in vivo. We also respectfully point out that while degranulation is a feature commonly associated with mammalian mast cells, it is not the only means by which mammalian mast cells exert their immunological effects. Indeed, our studies suggest that frog mast cell IL4 production may be a key means by which these cells offer anti-Bd protection.

      Please note that we successfully adopted an avidin staining approach to visualize mast cell heparin content in vitro and to evaluate cutaneous mast cell numbers in vivo in control and mast cell-enriched, mock- and Bd-infected animals. This additional work is depicted in Figure 4 and addressed in the results and discussion sections of our revised manuscript.

      Reviewer #2 (Public Review):

      Summary: In this study, Hauser et al investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd). They use novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent into the skin of X. laevis and find that this stimulates expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection. They find that mast cell expansion attenuates the skin burden of Bd zoospores and pathologic features including epithelial thickness, and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony-stimulating factor 3 (rCSF3). They find that neutrophil expansion in Bd infection leads to a greater burden of zoospores and worse skin pathology. Combining these two observations, they demonstrate that mast cell expansion using rSCF attenuates cutaneous neutrophilic infiltration. They further show that mast cell expansion correlates with cutaneous IL-4 expression, and that treatment with exogenous rIL-4 reduces neutrophilic infiltration and restores markers of epithelial health. This offers a mechanism by which mast cell expansion protects from Bd infection.

      Strengths: The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism. Building on prior work, they are able to contrast mast cell expansion with their neutrophil expansion model, allowing them to infer a mechanistic link between mast cell expansion and IL-4 production and subsequent suppression of neutrophil infiltration and cutaneous dysbiosis. 

      We thank the reviewer for recognizing the rigorousness and utility of the studies presented in our manuscript.

      Weaknesses: The main weaknesses derive from technical limitations inherent to the Xenopus model at this time. For example, in mice a mechanistic study would be expected to use IL-4 knockouts, preferably mast cell-specific, to prove the link between mast cell expansion and IL-4 production being necessary and sufficient to suppress neutrophils. However, the novel reagents in this manuscript present a compelling technical advance and a step forward in the tools available to study amphibian biology. 

      We agree with the reviewer that an IL4 knock-out animal model would be a great way to support our findings. Unfortunately, working with a non-mammalian model such as X. laevis poses limitations that include lack of knock-out lines for immunology research. Moreover, as mentioned in our manuscript, we do not believe that IL4 is the sole mast cell-produced component responsible for the conferred antifungal protection. We thank the reviewer for acknowledging the limitations of our model system and recognizing the novelty, technical advances, and merits of the work presented in our manuscript.

      In addition to their discussion, one open question from the revised manuscript is how a single treatment with rSCF leads to a peak in mast cell numbers and then decline to baseline in mock-infected frogs, while Bd infection either sustains rSCF-boosted mast cells or leads to steady mast cell increase over time in control-treated frogs. Whether this is mediated by endogenous SCF or some other factor remains unexplored.

      This is an interesting question that we hope to explore in future studies. We did not see significant differences in skin SCF gene expression at 21 days post Bd infection. This does not rule out the possibility that the observed Bd-mediated effects on frog skin mast cell composition are due to changes in skin SCF gene expression at earlier infection times, alone or in combination with other host- or pathogen-derived factors. We know that other factors are responsible for the homing and retention of antimicrobial and immunosuppressive granulocyte subsets within frog skin (9), and we postulate that some of these may be distinct mast cell types. Additionally, Bd is known to produce a myriad of immunomodulatory factors (10), which may also directly affect frog skin mast cell composition. Mammalian mast cells are heterogeneous and are homed to or recruited into tissues by an extensive array of host- as well as microbiome-derived components (11, 12). Undoubtedly, frog skin mast cell composition is likewise complex, dynamic, and contingent on a plethora of host-, cutaneous microbiota-, and in this case also Bd-produced factors.

      References

      (1) Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      (2) Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      (3) Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      (4) Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      (5) Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      (6) Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      (7) Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Arch Dermatol Res 308, 665-670 (2016).

      (8) Hermans, M.A.W. et al. Human Mast Cell Line HMC1 Expresses Functional Mas-Related G-Protein Coupled Receptor 2. Front Immunol 12, 625284 (2021).

      (9) Hauser, K. et al. Discovery of granulocyte-lineage cells in the skin of the amphibian Xenopus laevis. FACETS 5, 571 (2020).

      (10) Rollins-Smith, L.A. & Le Sage, E.H. Batrachochytrium fungi: stealth invaders in amphibian skin. Curr Opin Microbiol 61, 124-132 (2021).

      (11) Halova, I., Draberova, L. & Draber, P. Mast cell chemotaxis - chemoattractants and signaling pathways. Front Immunol 3, 119 (2012).

      (12) West, P.W. & Bulfone-Paus, S. Mast cell tissue heterogeneity and specificity of immune cell recruitment. Front Immunol 13, 932090 (2022).


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. It is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in the immune recognition of pathogens and in orchestrating subsequent immune responses. However, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found that enrichment of skin mast cells provides X. laevis with substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd.

      Strengths:

      This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens. 

      We thank the reviewer for acknowledging the novelty and importance of the work presented in our manuscript.

      Weaknesses:

      The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells characteristically degranulate to release histamine, serotonin, proteases, cytokines, chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators, IgE and compound 48/80, for the IgE-dependent and IgE-independent pathways, respectively. This can be easily done in vitro. It is also important to assess whether these mast cells degranulate in vivo upon Bd infection, using avidin staining to visualize vesicle release from mast cells. Figure 3 only showed that rSCF injection caused an increase in mast cells in naïve skin. The authors need to show whether Bd infection induces a mast cell increase and whether rSCF injection under Bd infection increases mast cells in the skin. In addition, it is unclear how the enrichment of mast cells provides protection against Bd infection and against alterations to skin microbiomes after infection. It is important to determine whether skin mast cells release any of the contents mentioned above.

      We would like to thank the reviewer for taking the time to review our work and providing us with valuable feedback. We feel that we have successfully incorporated the reviewer’s suggestions into our revised manuscript, thereby improving this work.

      Please note that amphibians do not possess the IgE antibody isotype (1).

      To our knowledge, there has been no published work adapting the approaches used to study mammalian mast cell degranulation to the examination of amphibian mast cells. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these reagents do not cross-react with their amphibian counterparts. This is especially true of cytokines and chemokines, which diverged rapidly during evolution. As a result, they do not share substantial protein sequence identity across species as divergent as frogs and mammals. Additionally, several studies suggest that amphibian mast cells lack histamine (2-5) and serotonin (2, 6). Respectfully, while following up on these findings is possible, we would not consider adapting approaches used in mammalian research to comparative immunology work to be easy.

      As noted in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), which is a hallmark cytokine associated with mammalian mast cells (7). The additional findings presented in our revised manuscript indicate that mast cells respond to Bd by upregulating IL4 expression in vitro and in vivo. In turn, our work indicates that IL4 may be a central means by which frog mast cells confer protection against Bd. IL4 may act by counteracting Bd-elicited inflammation, including minimizing neutrophil infiltration, maintaining skin integrity, and promoting mucus production by skin mucus glands. These additional findings are presented in Figure 8 of our revised manuscript and are described in the results and discussion sections of the paper.

      Our attempts to elicit degranulation of frog mast cells using compound 48/80 have so far not been successful. This may reflect technical issues with assays optimized for mammalian mast cells. It may also reflect biological differences between frog and mammalian mast cells, such as species differences in the mas-related G-protein coupled receptors through which compound 48/80 acts (8). We will continue to explore means of studying frog mast cell degranulation both in vitro and in vivo. We would also like to respectfully point out that while degranulation is the feature most associated with mammalian mast cells, it is not the only means by which mammalian mast cells exert their immunological effects. Indeed, our additional studies suggest that mast cell IL4 production may be a key means by which these cells offer anti-Bd protection.

      Please note that we have adopted an avidin-staining approach to visualize mast cell heparin content in vitro. We have also used it to evaluate mast cell numbers in vivo in the skins of control and mast cell-enriched, mock- and Bd-infected animals. This additional work is depicted in Figure 4 of our revised manuscript and addressed in the results and discussion sections of our revised paper.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Hauser et al investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd) using novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent into the skin of X. laevis and find that this stimulates the expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection and find that mast cell expansion attenuates the skin burden of Bd zoospores and pathologic features including epithelial thickness and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony-stimulating factor 3 (rCSF3) and find that neutrophil expansion in Bd infection leads to greater burden of zoospores and worse skin pathology. 

      Strengths:

      The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of the exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism. 

      We thank the reviewer for recognizing the breadth and extent of the undertaking that culminated in this manuscript. Indeed, this manuscript would not have been possible without considerable reagent development and adaptation of techniques that had previously not been used for amphibian immunity research. In line with the reviewer’s sentiment, to our knowledge this is the first report of using molecular approaches to augment amphibian mast cells, which we hope will pave the way for new areas of research within the fields of comparative immunology and amphibian disease biology.

      Weaknesses:

      The conclusions regarding the role of mast cell expansion in controlling Bd infection would be stronger with a more rigorous evaluation of the model, as there are some key gaps and remaining questions regarding the data. For example: 

      (1) Granulocyte expansion is carefully quantified in the initial time courses of rSCF and rCSF3 injections, but similar quantification is not provided in the disease models (Figures 3E, 4G, 5D-G). A key implication of the opposing effects of mast cell vs neutrophil expansion is that mast cells may suppress neutrophil recruitment or function. Alternatively, mast cells also express notable levels of csf3r (Figure 2), and previous work from this group (Hauser et al., Facets 2020) showed that rG-CSF-stimulated peritoneal granulocytes express mast cell markers including kit and tpsab1, raising the question of what effect rCSF3 might have on mast cell populations in the skin. Considering these points, it would be helpful if both mast cells and neutrophils were quantified histologically (based on Figure 1, they can be readily distinguished by SE or Giemsa stain) in the Bd infection models.

      We thank the reviewer for this insightful suggestion. We successfully adopted an in situ hybridization approach to evaluate neutrophil numbers in the skin of control and mast cell-enriched, mock- and Bd-infected animals based on expression of the neutrophil marker myeloperoxidase (MPO; ref. 9). These results are presented in Figures 6 and 8 of our revised manuscript and addressed in the appropriate sections of our revised paper.

      Our findings suggest that rSCF administration results in the accumulation of mast cells that are polarized such that they ablate the inflammatory response elicited by Bd infection, for example through mechanisms like IL4 production. Mammalian mast cells, including peritoneal mast cells, express csf3r (refs. 10, 11), so csf3r expression alone does not distinguish neutrophils from mast cells. For this reason, we used MPO expression to visualize neutrophil infiltration of the skin in Figures 6 and 8 of our revised work. While the X. laevis animal model does not permit nearly the degree of immune cell resolution afforded by mammalian animal models, we do know that the adult X. laevis peritonea contain a myriad of immune cell subsets. We anticipate that the high kit expression reported by Hauser et al., 2020 in the rCSF3-recruited peritoneal leukocytes reflects the presence of mast cells therein.

      Please find that we have used avidin-staining and MPO in situ hybridization to respectively visualize and enumerate mast cells and neutrophils in the skin of control and mast cell-enriched, mock- and Bd-infected animals. Indeed, our results show interesting, experimental condition-dependent changes in both the skin neutrophil and mast cell numbers. The results of these additional studies are presented in Figures 4, 6 and 8 of the revised manuscript and addressed in the results and discussions sections of our revised paper.

      (2) Epithelial thickness and inflammation in Bd infection are reported to be reduced by rSCF treatment (Figure 3E, 5A-B) or increased by rCSF3 treatment (Figure 4G) but quantification of these critical readouts is not shown.

      We thank the reviewer for this suggestion. We scored epithelial thickness under the distinct conditions described in our manuscript and presented the quantified data in Figures 5 and 8 of the revised paper.

      (3) Critical time points in the Bd model are incompletely characterized. Mast cell expansion decreases zoospore burden at 21 dpi, while there is no difference at 7 dpi (Figure 3E). Conversely, neutrophil expansion increases zoospore burden at 7 dpi, but no corresponding 21 dpi data is shown for comparison (Figure 4G). Microbiota analysis is performed at a third time point, 10 dpi (Figure 5D-G), making it difficult to compare with the data from the 7 dpi and 21 dpi time points. Reporting consistent readouts at these three time points is important to draw solid conclusions about the relationship of mast cell expansion to Bd infection and shifts in microbiota.

      We thank the reviewer for noting this discrepancy. We have repeated our mast cell-enrichment, Bd-challenge studies, examining days 10 and 21 post infection. Our new findings indicate that, compared to control animals, mast cell enrichment results in a significant reduction in Bd loads at both 10 and 21 dpi. The difference in Bd loads between r-ctrl- and rSCF-treated animals at 10 dpi corroborates the other parameters that are altered between the two treatment groups at this experimental time point.

      Our question regarding the roles of inflammatory granulocytes/neutrophils during Bd infections was ‘how’ rather than ‘when’ these cells affect Bd infections. Because the central focus of this work was mast cells rather than other granulocyte subsets, once we saw that rCSF3-recruited granulocytes adversely affect Bd infections at 7 days, we did not pursue the kinetics of these observations further. We plan to explore the roles of inflammatory mediators and immune cell subsets during the course of Bd infections but feel that these future studies are peripheral to the central thesis of the present manuscript regarding the roles of frog mast cells during Bd infections.

      (4) Although the effect of rSCF treatment on Bd zoospores is significant at 21 dpi (Figure 3E), bacterial microbiota changes at 21 dpi are not (Figure S3B-C). This discrepancy, how it relates to the bacterial microbiota changes at 10 dpi, and why 7, 10, and 21 dpi time points were chosen for these different readouts (Figure 5F-G), is not discussed.

      Our additional studies indicate that, compared to control animals, frog skin mast cell enrichment results in a significant reduction in Bd loads at 10 dpi. This corroborates our other findings, including the observation that at 10 dpi, control animals exhibit reduced microbial richness whereas mast cell-enriched frogs were protected from this disruption of their microbiome. The amphibian microbiome serves as a major barrier to these fungal infections (ref. 12), and we anticipate that Bd-mediated disruption of microbial richness facilitates host skin colonization by this pathogen. In turn, we anticipate that frog mast cells confer the observed anti-Bd protection in part by preventing microbial disassembly and thus interfering with optimal Bd colonization and growth on frog skin. We acknowledge and discuss these notions in our revised manuscript.

      (5) The time course of rSCF or rCSF3 treatments relative to Bd infection in the experiments is not clear. Were the treatments given 12 hours prior to the final analysis point to maximize the effect? For example, in Figure 3E, were rSCF injections given at 6.5 dpi and 20.5 dpi? Or were treatments administered on day 0 of the infection model? If the latter, how do the authors explain the effects at 7 dpi or 21 dpi given mast cell and neutrophil numbers return to baseline within 24 hours after rSCF or rCSF3 treatment, respectively?

      In our revised manuscript, we clarified the kinetics of our animal treatments and Bd infections. In brief, for mast cell enrichment, animals were injected with r-ctrl or rSCF, challenged 12 hours later with Bd and examined after 10 (per the reviewers’ suggestions) and 21 days of infection. For neutrophil enrichment, animals were injected with r-ctrl or rCSF3, challenged 12 hours later with Bd and examined after 7 days of infection.

      The title of the manuscript may be mildly overstated. Although Bd infection can indeed be deadly, mortality was not a readout in this study, and it is not clear from the data reported that expanding skin mast cells would ultimately prevent progression to death in Bd infections.

      We acknowledge this point. The revised manuscript will be titled: “Amphibian mast cells: barriers to chytrid fungus infections”.

      Reviewer #3 (Public Review):

      Summary:

      Hauser et al. provide an exceptional study describing the role of resident mast cells in the amphibian epidermis that produce anti-inflammatory cytokines, which prevent Batrachochytrium dendrobatidis (Bd) infection from causing harmful inflammation and also protect frogs from loss of mucin in glands and loss of mucus integrity, changes that otherwise alter their skin microbiomes. Neutrophils, in contrast, were not protective against Bd infection. Beyond the beautiful cytology and transcriptional profiling, the authors utilized elegant cell enrichment experiments to enrich mast cells with recombinant stem cell factor, or to enrich neutrophils with recombinant colony-stimulating factor-3, and examined the respective infection outcomes in Xenopus.

      Strengths:

      Through the use of recombinant IL4, the authors were able to test and eliminate the hypothesis that mast cell production of IL4 was the mechanism of host protection from Bd infection. Instead, impacts on the mucus glands and interaction with the skin microbiome are implicated as the protective mechanism. These results will press disease ecologists to examine the relative importance of this immune defense among species, the influence of mast cells on the skin microbiome and mucosal function, and open the potential for modulating mucosal defense.

      We thank the reviewer for recognizing the utility of the work presented in our manuscript.

      Weaknesses:

      A reduction of bacterial diversity upon infection, as described at the end of the results section, may not always be an "adverse effect," particularly given that anti-Bd function of the microbiome increased. Some authors (see Letourneau et al. 2022 ISME, or Woodhams et al. 2023 DCI) consider these short-term alterations as encoding ecological memory, such that continued exposure to a pathogen would encounter an enriched microbial defense. Regardless, mast cell-initiated protection of the mucus layer may negate the need for this microbial memory defense.

      We thank the reviewer for their insightful comment. We have revised our discussion to include this notion.

      While the description of the mast cell location in the epidermal skin layer in amphibians is novel, it is not known how representative these results are across species ranging in chytridiomycosis susceptibility. No management applications are provided such as methods to increase this defense without the use of recombinant stem cell factor, and more discussion is needed on how the mast cell component (abundance, distribution in the skin) of the epidermis develops or is regulated.

      We thank the reviewer for this suggestion. We have added a paragraph to our revised manuscript to address possible source(s) of skin mast cells and a statement acknowledging that a greater understanding of mast cell biology across distinct amphibian species may be used to develop future strategies for the management of amphibian diseases.

      We are very thankful to the reviewer for this excellent suggestion but would like to point out that the work presented in our manuscript was driven more by comparative immunology questions than by conservation biology. As such, and considering just how little is known about mast cells outside of mammals, we chose not to speculate too much about possible utilities of altering amphibian skin mast cell composition and instead focused our discussion on the immediate takeaways of the work presented in our paper.

      References

      (1) Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      (2) Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      (3) Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      (4) Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      (5) Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      (6) Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      (7) Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Archives of dermatological research 308, 665-670 (2016).

      (8) Hermans, M.A.W. et al. Human Mast Cell Line HMC1 Expresses Functional Mas-Related G-Protein Coupled Receptor 2. Front Immunol 12, 625284 (2021).

      (9) Buchan, K.D. et al. A transgenic zebrafish line for in vivo visualisation of neutrophil myeloperoxidase. PLoS One 14, e0215592 (2019).

      (10) Aponte-Lopez, A., Enciso, J., Munoz-Cruz, S. & Fuentes-Panana, E.M. An In Vitro Model of Mast Cell Recruitment and Activation by Breast Cancer Cells Supports Anti-Tumoral Responses. Int J Mol Sci 21 (2020).

      (11) Jamur, M.C. et al. Mast cell repopulation of the peritoneal cavity: contribution of mast cell progenitors versus bone marrow derived committed mast cell precursors. BMC Immunol 11, 32 (2010).

      (12) Walke, J.B. & Belden, L.K. Harnessing the Microbiome to Prevent Fungal Infections: Lessons from Amphibians. PLoS Pathog 12, e1005796 (2016).

      Reviewer #2 (Recommendations For The Authors):

      We thank the reviewer for their excellent suggestions, their time reviewing this work and their help with this manuscript.

      While we were not able to incorporate some of these changes, please find that we have significantly altered our manuscript in accordance with the reviewer’s suggestions from their public review. We feel that we have substantially altered our paper, including providing considerable additional data, supporting the key findings therein.

      (1) The heatmap in Figure 1I appears to be scaled data, similar to Figure 4A, in which case the indicated scale numbers are not correct (e.g. they should be -2 to 2, or -3 to 3) 

      Thank you for the suggestion. Please find that we have changed this figure accordingly.

      (2) For Figure 1, additional curated gene lists might better illustrate the difference in cell types, e.g. include the data for a panel of mast cell genes in a heatmap (mcpt1, tpsab1, etc.) and another panel of curated neutrophil genes (e.g. lyz) in a heatmap. If the authors still have leftover RNA, qPCR verification of some of the critical genes (e.g. kit) would add to the rigor of the analysis, as this study is the foundation of a new method for culturing amphibian mast cells. 

      We thank the reviewer for this suggestion. Unfortunately, we do not have leftover RNA/cDNA and we have not been able to locate mcpt1 or tpsab1 in our DEGs. We anticipate that this issue may stem from the suboptimal annotation of the Xenopus laevis genome. We agree that curating more mast cell/neutrophil genes would be ideal but feel that we have adequately highlighted those genes that are differentially expressed between the two populations in our analysis.

      (3) The presentation of counts in Figure 2 is a bit hard to interpret. Although it is mentioned that everything is statistically significant, explicitly showing statistics for each gene would be better. One possibility would be to use a volcano plot (p-value vs log2 fold change) and highlight the genes shown in Figure 2, potentially with an accompanying heat map to show replicate variability. 

      We thank the reviewer for this suggestion. We entertained presenting the data as volcano plots or heat maps, but in the end felt that the bar graphs better conveyed the information that we are hoping to get across. Please note that the error bars in the bar graph depict the replicate variability. Please also note that to highlight that all the depicted genes were differentially expressed, we italicized the statement in the corresponding figure legend: “All depicted genes were significantly differentially expressed between the two populations”.

      (4) Narratively, it might make more sense to put Figure 4A-C with Figure 3. 

      We thank the reviewer for this suggestion. Please find that we significantly revised most of our figures to better convey the content therein. We combined the content of Figure 4A-C with Figure 5A-C and added data on epidermal thickness under different conditions into this figure; Figure 5 of our revised manuscript.

      (5) If possible, complementing the skin RNA-seq from rSCF treatment in Bd infection with skin RNA-seq from rCSF3 treatment to compare effects on transcriptional programs of barrier function, etc would elevate this study and add additional insights into cutaneous inflammation in the setting of Bd infection. 

      We thank the reviewer for this suggestion. We anticipate that the skin inflammation caused by Bd infection is not due solely to neutrophil infiltration and artificially altering the frog skin neutrophil content would thus not recapitulate chytridiomycosis progression. We completely agree that it would be valuable to examine barrier functions in control and mast cell-enriched, Bd-infected frogs. This is something that we hope to pursue further in future studies but feel that together with our additional findings, we are presenting a significant amount of data to constitute a stand-alone story.

      (6) In Figure S1A, analyzing only 3 AMP genes by qPCR is perhaps too focused. As a control, it would be useful to also test some genes known to be functionally important in neutrophil anti-microbial responses, e.g. lyz. Expanding on this experiment by performing RNA-seq on Bd-treated, bone-marrow-derived mast cells and neutrophils would be a great addition to the manuscript and an important resource for future studies in the field. The fact that the use of rSCF (or rCSF3) enables the differentiation of these cells in large numbers of pure populations presents this unique opportunity. Although IL-4 did not end up affecting mucus production, clues to the mediator(s) of this mast cell-dependent effect may be found with unbiased RNA-seq after exposure to Bd. 

      We thank the reviewer for this suggestion but would like to point out that our manuscript is focused on mast cells rather than neutrophils. We also believe that in vitro exposure of leukocytes to Bd is not the most physiologically relevant model of what would happen to skin-resident and incoming immune cell subsets, since Bd primarily infects top-most keratinocytes. We anticipate that rather than coming into direct contact with the fungus, cells like mast cells and neutrophils are responding to Bd-produced and infected cell-produced products. For this reason, we did not perform RNA-seq analysis of in vitro derived mast cells or neutrophils stimulated with Bd. As we develop more X. laevis-specific reagents, we hope to revisit the question of infected skin mast cell and neutrophil gene expression profiles but are not in a position to ask these questions at this time.

      This work is also guided by a finite budget, and we feel that together with our significant additional findings described in our revised manuscript, we are presenting a substantial amount of work to constitute a stand-alone story and manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      The following are minor edits needed in the text and figure legends: 

      Standardize terms such as IL4 instead of il4 or ril4 vs rIL4 throughout. Also, r-SCF vs rSCF. 

      Thank you. Please find that we have standardized such terms throughout our revised manuscript. Please note that we are adhering to the convention that gene names are in lower case, protein names are in upper case and recombinant protein names are preceded by an ‘r’.

      Pg 9 Change "In contract" to "In contrast". 

      Thank you and changed accordingly.

      Fig 4 - Perhaps indicate if results in addition to 7dpi are also available. 

      We analyzed Bd loads in control and mast cell-enriched, infected frogs at 10 dpi. These data are presented in Figures 3 and 4 of our revised manuscript.

      Similarly in Fig. 5, are results other than 10dpi available in the supplement? 

      The results from the microbiome studies are presented in Supplemental Figure 3 (Fig. S3). Please note that the results presented in Fig. 5A-C of the original manuscript (Fig. 5B-E of the revised manuscript) depict data for 21 dpi, which is the longest examined infection time point. We present data from 1 and 10 dpi in Fig. 4 of our revised manuscript.

      Indicate why these days were chosen in the methods. 

      We have indicated why the experimental time points were chosen in the Methods section of our revised manuscript.

      Fig S1 legend has errors in describing which panels are for which asterisks. 

      Fig. S3 legend indicates panels F and G. 

      Thank you. Please find that we revised our supplemental figures and amended the corresponding figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study entitled "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Vijay et al. provides valuable insights into the association of rifampicin tolerance and growth fitness with isoniazid resistance among clinical isolates of M. tuberculosis. Antibiotic tolerance in M. tuberculosis is an important topic since it contributes to the lengthy and complicated treatment required to cure tuberculosis disease and may portend the emergence of antibiotic resistance. The authors found that rifampicin tolerance was correlated with bacterial growth, rifampicin minimum inhibitory concentrations, and isoniazid-resistance mutations.

      Strengths:

      The large number of clinical isolates evaluated and their longitudinal nature during treatment for TB (including exposure to rifampin) are strengths of the study.

      Weaknesses:

      Some of the methodologies are not well explained or justified and the association of antibiotic tolerance with growth rate is not a novel finding. In addition, the molecular mechanisms underlying rifampicin tolerance only in rapidly growing isoniazid-resistant isolates have not been elucidated and the potential implications of these findings for clinical management are not immediately apparent.

      We thank the reviewer for the comments. We have modified the Methods section and Figure 1 to clarify the method, as suggested by the reviewer.

      Although we agree that previous studies have shown an association between slow growth rate and antibiotic tolerance, ours is, to our knowledge, the most comprehensive assessment of rifampicin tolerance among clinical isolates. In particular, we show that the degree of tolerance in clinical isolates can vary over several orders of magnitude, which had not been previously documented or appreciated. Furthermore, the association of high tolerance with IR isolates is a new finding, and given the potential for tolerance to increase the risk of de novo drug resistance, our study suggests that IR isolates with high rifampicin tolerance may present a risk for the development of MDR-TB.

      In addition, we have analysed the longitudinal isolates and the genetic variants emerging in them that are associated with increases in rifampicin tolerance. This analysis reveals multiple possible pathways to increased rifampicin tolerance among clinical M. tuberculosis isolates. A possible clinical implication is that high rifampicin tolerance combined with isoniazid resistance could be treated as a risk factor for tuberculosis treatment failure. This study provides a basis for further clinical studies evaluating the role of rifampicin tolerance in IR isolates on treatment outcome. We have focused on these aspects in the discussion of the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Vijay and colleagues addresses a clinically important, and often overlooked, aspect of TB treatment. Variation in the level of antibiotic tolerance amongst otherwise antibiotic-susceptible isolates is difficult to screen for routinely, and consequently such screening is not performed. The authors present a convincing argument that there is indeed significant variation in the susceptibility of isoniazid-resistant strains to killing by rifampicin, in some cases at the same tolerance levels as bona fide resistant strains. On the whole, the study is easy to follow and the results are justified. This work should be of interest to the wider TB community at both a clinical and basic level.

      Weaknesses:

      The manuscript is long, repetitive in places, and the figures could use some amending to improve clarity (this could be a me-specific issue as they look ok on my screen, yet the colour is poor when printed).

      We thank the reviewer for the comments. We have modified the revised manuscript as per the reviewer's suggestions.

      It would have been great to have seen some correlation between increased rifampicin tolerance and treatment outcome, although I'm not sure if these data are available to the researchers. I agree with the researchers that the use of a single media condition is a limitation. However, this is true of a lot of studies.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment outcome is important. This needs to be performed in future studies with better design to correlate rifampicin tolerance with treatment progression or outcome data.  

      Reviewer #3 (Public Review):

      Summary:

      The authors have initiated studies to understand the molecular mechanisms underlying the development of multi-drug resistance in clinical Mtb strains. They demonstrate the association of isoniazid-resistant isolates with rifampicin tolerance, supporting the idea that selection of MDR is a microenvironment phenomenon and involves a group of isolates.

      Strengths:

      The methods used in this study are robust and the results support the authors' claims to a major extent.

      Weaknesses:

      The manuscript needs a thorough vetting of the language. At present, the language makes it very difficult to comprehend the methodology and results.

      We thank the reviewer for the comments. We have revised the manuscript as per the reviewer’s suggestions.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Methods: The authors attempt to differentiate between "fast"- and "slow"-growing bacteria in order to determine if the growth rate is associated with rifampicin tolerance. This is accomplished by assessing growth on solid agar at 15 and 60 days post-incubation, respectively. However, mycobacterial growth rate is not a binary phenomenon but rather a continuous variable. Moreover, it is not clear why 15 and 60 days were selected. Also, instead of a "slow growth" phenotype, the 60-day time point might simply reflect a longer lag phase. Were the plates examined at any interval time points? It would be interesting to know whether colony growth was delayed overall in the populations observed only at 60 days, or simply if the appearance of microcolonies visible to the naked eye was delayed (with normal growth afterwards).

      We thank the reviewer for the comments. We want to clarify that we did not use agar plates but rather the most-probable-number (MPN) method to determine the survival fraction after antibiotic treatment. We have clarified this in the revised manuscript and in revised Figure 1. The MPN method is a binary measure (growth/no growth) and therefore cannot differentiate between a long lag time and other mechanisms. In our original analysis, we included an intermediate time point of 30 days, but these data (now included as Supplementary Figure 1) cannot address the issue of lag phase directly. Because the 30-day time point did not add to the overall analysis and interpretation, we had not included it in the original submission.

      (2) Methods/Results/Discussion: Some important clinical information is missing-how were the patients treated who had IR isolates? Did they receive the standard regimen for DS TB or was another drug substituted for isoniazid? Exposure to different drugs could affect the rifampicin-tolerant populations during the intensive phase (Figure 5).

      Thank you for this comment. We have included the information regarding the treatment regimen in the revised manuscript.

      Were there differences in microbiological (sputum culture conversion rate at 8 weeks or time to culture negativity) or clinical outcomes based on isoniazid susceptibility? Perhaps more importantly, were there differences in microbiological/clinical outcomes based on the proportion of bacterial subpopulations with rifampicin tolerance for a particular isolate? There should be more discussion on the potential clinical implications of the study's findings.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment progression or outcome is important. This needs to be performed in future studies with better design to correlate rifampicin tolerance with treatment progression or outcome data.  

      (3) Results (Figure 3A): Although an interesting finding, the increased rifampicin tolerance observed only in the "rapidly" growing populations of isoniazid-resistant (IR) vs. isoniazid-susceptible (IS) isolates is not explained. In contrast, equally increased rifampicin tolerance is seen in the "slowly" growing populations of both IR and IS isolates. It would be interesting to know if these slowly growing populations show specific tolerance to rifampicin or if, as expected, slow growth confers tolerance to a range of different bactericidal antibiotics.

      We thank the reviewer for the suggestions. We agree that these questions would be interesting to investigate in a future study, but they are outside the scope of the current study.

      (4) Results (Figure 3B): The basis for the classification into tertiles is not clear and appears somewhat arbitrary-does this represent the survival of a particular isolate following rifampicin exposure relative to the other isolates based on isoniazid susceptibility (IS or IR) or the % growth relative to other populations for the same isolate? Figure 3B is missing a y-axis label. Is it a log10 MPN ratio?

      We thank the reviewer for pointing this out. To classify isolates into tertiles, we first pooled both groups of isolates, isoniazid-susceptible (IS) and isoniazid-resistant (IR), into a single population. We then categorized this unified population into three groups (low, medium and high) based on their survival fraction following rifampicin treatment. Consequently, the 'low', 'medium' and 'high' tertiles represent the survival of each isolate following rifampicin exposure relative to all isolates, combining both IS and IR isolates.

      For clarity, we provide a breakdown of the criteria for each tertile:

      +Low tertile: Consists of isolates with the lowest survival fraction (bottom 25%).

      +Medium tertile: Encompasses isolates with survival fractions that fall between the bottom 25% and the top 25%.

      +High tertile: Comprises isolates with the highest survival fractions (top 25%).

      We have modified this in the revised manuscript to clarify.

      We have also modified Figure 3B to correct the y-axis label.
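      The grouping described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a survival fraction after rifampicin exposure is available for each isolate, the isolate IDs and values are hypothetical, and Python's statistics.quantiles with method="inclusive" is just one of several percentile conventions that could supply the 25th and 75th percentile cut-offs listed above.

```python
from statistics import quantiles


def classify_survival(survival_by_isolate):
    """Assign each isolate to a low/medium/high rifampicin-survival group.

    survival_by_isolate maps a (hypothetical) isolate ID to its survival
    fraction after rifampicin exposure. IS and IR isolates are pooled into
    this single dict before the cut-offs are computed.
    """
    values = list(survival_by_isolate.values())
    # 25th, 50th and 75th percentiles of the pooled survival fractions
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    groups = {}
    for isolate, fraction in survival_by_isolate.items():
        if fraction <= q1:
            groups[isolate] = "low"      # bottom 25% of the pooled isolates
        elif fraction >= q3:
            groups[isolate] = "high"     # top 25% of the pooled isolates
        else:
            groups[isolate] = "medium"   # between the two cut-offs
    return groups


# Hypothetical pooled IS and IR isolates with made-up survival fractions
pooled = {
    "IS_1": 0.001, "IS_2": 0.01, "IS_3": 0.02, "IS_4": 0.05,
    "IR_1": 0.003, "IR_2": 0.1, "IR_3": 0.2, "IR_4": 0.5,
}
groups = classify_survival(pooled)
```

      With these cut-offs, roughly 25%, 50% and 25% of the pooled isolates fall into the low, medium and high groups, respectively.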

      (5) Results (lines 185-186): For correlating relative growth in the absence of antibiotics, 19 clinical isolates "outliers" were removed without explanation.

      We have added an explanation for the “outliers”, which were initially removed because they deviated from a normal distribution. We have also provided Supplementary Figure 3, which includes these outliers.

      (6) Results (lines 203-211): The authors attempted to investigate a potential association between the mechanism of M. tuberculosis isoniazid resistance and the degree of rifampicin tolerance. However, the vast majority of IR clinical isolates (n=71) had a katG_S315X mutation and only 8 isolates had alternative mutations (inhA_I21T and fabG1_C-15X). Given the wide range of rifampicin tolerance observed within these isoniazid-resistant isolates, they concluded that other genetic or epigenetic determinants must be playing a role. WGS of longitudinally collected isolates from the same patients during TB treatment yielded non-synonymous SNPs in a list of genes previously reported to be associated with persistence, tolerance, and mycobacterial survival. However, precise mechanisms (including, e.g., expression of efflux pumps) are not investigated.

      We thank the reviewer for summarising the findings. Yes, we agree that investigating the precise mechanism of rifampicin tolerance is beyond the scope of the current work.

      Minor comments:

      (1) Abstract (line 41): The nonstandard abbreviations "IR" and "IS" have not been introduced prior to this usage.

      We have modified this in the abstract.

      (2) Introduction (line 60): Insert "phenomena" or "mechanisms" after "two".

      We have modified this in the introduction.

      (3) Introduction (lines 66-69): This sentence is confusing, especially the second part ("supporting this studies...").

      We have modified the lines to clarify.

      (4) Introduction (line 84): In the current text, it appears as if "IR" is the abbreviation for "isoniazid". Therefore, I recommend changing "resistance to isoniazid" to "isoniazid resistance".

      We have modified this in the revised manuscript.

      (5) Results (line 141): Insert "the" before "rest".

      We have modified this in the revised manuscript.

      (6) Results (line 187): Replace "did not had" with "did not have".

      We have modified this in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      The abstract is long and repetitive. It needs reworking and shortening to improve clarity and highlight the main takeaway message.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      The introduction is interesting and contains relevant information. However, it is long and takes a while to get to the point of the study. It needs re-writing to emphasise key prior results and the purpose of this study.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      Results:

      As the study relies predominately on the use of MPN, I think a simple schematic of how the experiment is performed would be informative. Could this be added to Figure 1?

      We have revised Figure 1 in the manuscript to include a schematic representation.

      Some of the differences in MDK90, whilst they may be significant, are small, so this would at least provide context as to the relevance of these differences. It may also alleviate my confusion as to how the authors can measure the time required to achieve MDK90 as 1.23-1.31 days when the first time point taken is day 2 (the data in Figure 2). They have Figure S6, but this is small and hard to follow.

      We thank the reviewer for this suggestion; we have modified this in the revised manuscript and in Figure S6.
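      One way an MDK90 shorter than the first sampling day can arise is through interpolation: if log10 survival is assumed to decline linearly between t = 0 (survival fraction 1) and the first sampled time point, the 90% killing threshold can be crossed before day 2. The sketch below illustrates that idea only. It is an assumption, not necessarily how MDK90 was computed in this study, and `mdk90` is a hypothetical helper.

```python
import numpy as np

def mdk90(times_days, survival_fraction):
    """Estimate the minimum duration for killing 90% of cells (MDK90).

    Assumes survival fraction = 1.0 at t = 0 and interpolates log10(survival)
    linearly between consecutive sampled time points.
    Returns None if 90% killing is not reached within the sampled window.
    """
    t = np.concatenate([[0.0], np.asarray(times_days, float)])
    log_s = np.concatenate([[0.0], np.log10(np.asarray(survival_fraction, float))])
    target = -1.0  # log10(0.1): 10% surviving, i.e. 90% killed
    for i in range(1, len(t)):
        if log_s[i] <= target:
            # Fraction of the interval [t[i-1], t[i]] needed to reach the target
            frac = (log_s[i - 1] - target) / (log_s[i - 1] - log_s[i])
            return t[i - 1] + frac * (t[i] - t[i - 1])
    return None

# A 2-log drop by day 2 crosses the 1-log threshold halfway through the interval
estimate = mdk90([2, 5], [0.01, 0.001])
```

      Under this assumption, any isolate losing more than one log10 unit by day 2 yields an MDK90 below 2 days, even though no measurement was taken before day 2.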

      Figure 2:

      Would be helpful to have -1 on the Y axis.

      The grey dots don't print very well (Might be my printer)

      We have modified this in Figure 2 of the revised manuscript.

      Line 142: The authors note a difference in RIF tolerance at day 15 that disappeared by day 60. I assume they are referring to the day 5 timepoint although this isn't clear as written.

      Yes, it is referring to the day 5 time point and we have clarified this in the revised manuscript.

      The section starting at line 148 (fig 3) is interesting, but it is difficult to read and follow what the difference is between this data and the prior data in Figure 2. It also wasn't until about line 165 that the purpose became clear. Overall the conclusions are sound and interesting.

      We have modified this in the revised manuscript.

      Line 154: What are the early and late time recovery time points?

      Is Figure 3A the same data as Figure 2?

      We have clarified this in the revised manuscript; Figure 3A shows the same data as Figure 2.

      I found Figure 6 hard to follow. I'm not sure how better to present this data, but it should be improved. Some further clarification in the text would be helpful.

      We thank the reviewer for the suggestions. We have added more explanation in the text to clarify figure 6.

      Conclusions:

      The conclusions are sound, based on the data presented. The clinical relevance is highlighted, yet appropriately phrased to not be too far-reaching.

      Again, I think the conclusions could be condensed considerably. They are repetitive in places, which dilutes the main outcomes of this otherwise interesting and important study. The authors appropriately highlight some of the limitations of their study.

      We thank the reviewer for these comments and have modified this in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Srinivasan et al. details the identification/development of isoniazid-resistant strains in clinical isolates following treatment with rifampicin. This is an important aspect of understanding MDR development in TB strains. The results are promising and gel well with the hypothesis. However, the manuscript requires thorough language editing. While the overall idea is clear, the methodology does not come across clearly.

      Specific comments:

      (1) It is not clear whether rifampicin treatments were given for 2 and 5 days before kill curves or for 15 and 60 days. The methodology needs to be phrased clearly. Why were the 15-day and 60-day intervals chosen? Is there a rationale for this?

      We thank the reviewer for the suggestions, we have modified the method and figure 1 to clarify this in the revised manuscript.

      (2) A concentration of 2 µg/mL was used for in vitro culture in this study. While the authors themselves indicate that this is well above the MIC, this might represent a non-natural dose and hence may force the evolution of strains. What would the scenario be in the natural course of antibiotic treatment (dose at or below the MIC)?

      We observed no significant emergence of resistance up to day 5; resistance emerged only after day 5. We therefore avoided determining the survival fraction after resistance emerged, so the kill curve mostly represents the tolerant subpopulation. In addition, pharmacokinetic studies of rifampicin dosing suggest that peak concentrations of >2-32 µg/mL are typical for standard doses of the drug; we therefore believe the chosen concentration of 2 µg/mL to be physiologically relevant.

      (3) As described in line 155, survival spanned a broad distribution, with up to a millionfold difference. It is rather surprising that 5 days of rifampicin treatment would lead to such a spread in resistance patterns. Did the authors study the different populations to understand this phenomenon? This is important given the scale of resistance developed in this short time.

      We want to clarify that the broad range of survival fractions reflects differences in rifampicin-tolerant subpopulations, not rifampicin-resistant subpopulations, because survival was determined after rifampicin treatment by growth in rifampicin-free media. This has been clarified in the revised Figure 1.

      Overall, the manuscript is a detailed study with new insights into the development of multi-drug resistance by Mtb. A thorough vetting for language is essential for a greater impact of the study.

      We thank the reviewer and have attempted to improve the clarity of the language to increase the potential impact of our findings.

    1. Author response:

      The following is the authors' response to the current reviews.

      Reviewer #1 (Public Review):

      I'll begin by summarizing what I understand from the results presented, and where relevant how my understanding seems to differ from the authors' claims. I'll then make specific comments with respect to points raised in my previous review (below), using the same numbering. Because this is a revision I'll try to restrict comments here to the changes made, which provide some clarification, but leave many issues incompletely addressed.

      As I understand it the main new result here is that certain recurrent network architectures promote emergence of coordinated grid firing patterns in a model previously introduced by Kropff and Treves (Hippocampus, 2008). The previous work very nicely showed that single neurons that receive stable spatial input could 'learn' to generate grid representations by combining a plasticity rule with firing rate adaptation. The previous study also showed that when multiple neurons were synaptically connected their grid representations could develop a shared orientation, although with the recurrent connectivity previously used this substantially reduced the grid scores of many of the neurons. The advance here is to show that if the initial recurrent connectivity is consistent with that of a line attractor then the network does a much better job of establishing grid firing patterns with shared orientation.

      Beyond this point, things become potentially confusing. As I understand it now, the important influence of the recurrent dynamics is in establishing the shared orientation and not in its online generation. This is clear from Figure S3, but not from an initial read of the abstract or main text. This result is consistent with Kropff and Treves' initial suggestion that 'a strong collateral connection... from neuron A to neuron B... favors the two neurons to have close-by fields... Summing all possible contributions would result in a field for neuron B that is a ring around the field of neuron A.' This should be the case for the recurrent connections now considered, but the evidence provided doesn't convincingly show that attractor dynamics of the circuit are a necessary condition for this to arise. My general suggestion for the authors is to remove these kinds of claims and to keep their interpretations more closely aligned with what the results show.

      We would like to clarify that the simple (flexible) attractor is a weaker condition than the ones previously used to align grid cells. However, by no means do we claim that it is a necessary condition for grid maps to align. Other architectures, certainly more complex ones but perhaps even simpler ones, can align grid maps in our model.

      Major (numbered according to previous review)

      (1) Does the network maintain attractor dynamics after training? Results now show that 'in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing'. This clearly implies that the recurrent collaterals are not required for online generation of the grid patterns. This point needs to be abundantly clear in the abstract and main text so the reader can appreciate that the recurrent dynamics are important specifically during learning.

      We respectfully disagree with the interpretation of this result. In this model cells self-organize to produce aligned grid maps. In such systems it makes sense to characterize the equilibrium states of the system. We turned learning off in Figure S3 to show that the recurrent connections have a contractive effect on grid spacing. But artificially turning off learning means that one can no longer make claims about the equilibrium states of the system, since it can no longer evolve freely. In a functional network, if the recurrent attractor is removed, the system will evolve towards poor gridness and no alignment no matter what the starting point is, as also shown in Figure S3. Several experimental results invite us to think of grid cells as the equilibrium solution of a series of constraints that is ready to change at any time: Barry et al, 2012; Yoon et al, 2013; Carpenter et al, 2015; Krupic et al, 2015; Krupic et al, 2018; Jayakumar et al, 2019.

      One point in which we perhaps agree with the reviewer is that information about the hexagonal maps is kept in the feedforward weights, while behavior and the recurrent collaterals act as constraints of which these feedforward weights are the equilibrium solution.

      (2) Additional controls for Figure 2 to test that it is connectivity rather than attractor dynamics (e.g. drawing weights from Gaussian or exponential distributions). The authors provide one additional control based on shuffling weights. However, this is far from exhaustive and it seems difficult on this basis to conclude that it is specifically the attractor dynamics that drive the emergence of coordinated grid firing.

      Again, we do not claim that this is the only way in which grid maps can be aligned, but it is the simplest one proposed so far. We were asked if it was the specific combination of input weights to a cell rather than the organization provided by the attractor which resulted in aligned maps. By shuffling the inputs to a cell we keep the combination of inputs invariant but lose the attractor architecture. Since grid maps in this new situation are not aligned, we can safely conclude that it is not the combination of inputs per se, but the specific organization of these inputs that allows grid alignment. It is not fully clear to us what ‘exhaustive’ means in this context.
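      The logic of the shuffling control can be illustrated with a small sketch. It assumes a Gaussian-shaped 1D ring connectivity with rows indexing postsynaptic cells; the kernel shape, `width`, `strength`, and the function names are illustrative assumptions, not the model's actual parameters. Permuting each row independently keeps every cell's set of incoming weights intact while destroying the ring neighbourhood structure.

```python
import numpy as np

def ring_weights(n, width=5.0, strength=1.0):
    """Hypothetical 1D ring connectivity: weight decays with circular distance.

    w[i, j] is the weight from presynaptic cell j onto postsynaptic cell i.
    """
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)  # circular distance on the ring
    w = strength * np.exp(-(d ** 2) / (2 * width ** 2))
    np.fill_diagonal(w, 0.0)  # no self-connections
    return w

def shuffle_inputs(w, rng):
    """Independently permute the off-diagonal inputs to each cell.

    Each cell keeps exactly the same multiset of input weights, but which
    presynaptic cell provides which weight is randomized, so the attractor
    architecture is lost.
    """
    n = w.shape[0]
    w_shuf = w.copy()
    for i in range(n):
        off = np.arange(n) != i  # keep the diagonal at zero
        w_shuf[i, off] = rng.permutation(w[i, off])
    return w_shuf

rng = np.random.default_rng(0)
w = ring_weights(50)
w_shuffled = shuffle_inputs(w, rng)
```

      Sorting each row of `w` and `w_shuffled` gives identical arrays, which is the sense in which "the combination of inputs" is held invariant while the organization of those inputs is removed.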

      (3) What happens if recurrent connections are turned off? The new data clearly show that the recurrent connections are not required for online grid firing, but this is not clear from the abstract and is hard to appreciate from the main text.

      This point is related to (1). Absent this constraint, Figure S3 shows that the system evolves toward larger spacing, with poorer gridness and no alignment.

      (4) This is addressed, although the legend to Fig. S2D could provide an explanation / definition for the y-axis values.

      We have now added: Mean input fields are the sum of all inputs of a given kind entering a neuron at a given moment in time, averaged across cells and time.
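      The definition above can be written compactly. The sketch assumes, hypothetically, that the input of a given kind to neuron i at time t is a weighted sum of presynaptic rates, h_i(t) = Σ_j W_ij r_j(t); the exact form used in the model may differ, and `mean_input_field` is an illustrative name.

```python
import numpy as np

def mean_input_field(rates_pre, weights):
    """Mean input field for one kind of input (e.g. feedforward or recurrent).

    rates_pre: array of shape (T, N_pre), presynaptic rates at each time step.
    weights:   array of shape (N_post, N_pre), weights of this input kind.
    Sums all inputs of this kind entering each neuron at each time step,
    then averages across neurons and time.
    """
    h = rates_pre @ weights.T  # shape (T, N_post): total input per neuron per step
    return float(h.mean())

# Two neurons receiving constant unit-rate input from two presynaptic cells
value = mean_input_field(np.ones((3, 2)), np.array([[1.0, 2.0], [3.0, 4.0]]))
```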

      (5) Given the 2D structure of the network input, it perhaps isn't surprising that the network generates 2D representations, and this may have little to do with its 1D connectivity. The finding that the networks maintain coordinated grids when recurrent connections are switched off supports my initial concern, and the authors' explanation, to me at least, remains confusing. I think it would be helpful to consider that the connectivity is specifically important for establishing the coordinated grid firing, but that the online network does not require attractor dynamics to generate coordinated grid firing.

      This point is related to (1) and (3). We agree with the reviewer that the input lies within a 2D manifold, but this is not something that the network has to find out because it receives one datapoint of information at a time. This alone is not enough to form aligned grid cells, since each grid cell can find a roughly equivalent equilibrium in a different direction. It is only the constraint imposed by the recurrent collaterals that aligns grid maps, and, as we show, this constraint does not need to be constructed ad hoc to work on 2D, as previously thought. When recurrent connections are switched off, the system evolves toward unaligned grid maps, with larger spacing and lower gridness. Regarding the results obtained after modifying the network and turning off learning, we think they have a very limited scope (in this case showing the contractive effect of recurrent collaterals on grid spacing), given that the system is artificially being kept out of its natural equilibrium.

      (6) Clarity of the introduction. This is somewhat clearer, but I wonder if it would be hard for someone not familiar with the literature to accurately appreciate the key points.

      We have made our best effort to improve the clarity of the introduction.

      (7) Remapping. I'm not sure why this is ill posed. It seems the proposed model can not account for remapping results (e.g. Fyhn et al. 2007). Perhaps the authors could just clearly state this as a limitation of the model (or show that it can do this).

      We view our model as perfectly consistent with Fyhn et al, 2007. Remapping is not triggered by the network itself, though, but rather by a re-arrangement of the inputs requiring the network to learn new associations. Different simulations of the same model with identical parameters can be interpreted as remapping experiments.

      Reviewer #3 (Public Review):

      Summary:

      The paper proposes an alternative to the attractor hypothesis, as an explanation for the fact that grid cell population activity patterns (within a module) span a toroidal manifold. The proposal is based on a class of models that were extensively studied in the past, in which grid cells are driven by synaptic inputs from place cells in the hippocampus. The synapses are updated according to a Hebbian plasticity rule. Combined with an adaptation mechanism, this leads to patterning of the inputs from place cells to grid cells such that the spatial activity patterns are organized as an array of localized firing fields with hexagonal order. I refer to these models below as feedforward models.

      It has already been shown by Si, Kropff, and Treves in 2012 that recurrent connections between grid cells can lead to alignment of their spatial response patterns. This idea was revisited by Urdapilleta, Si, and Treves in 2017. Thus, it should already be clear that in such models, the population activity pattern spans a manifold with toroidal topology. The main new contributions in the present paper are (i) considering a form of recurrent connectivity that was not directly addressed before; (ii) applying topological analysis to simulations of the model; and (iii) interpreting the results as a potential explanation for the observations of Gardner et al.

      We wanted to note that we do not see this paper as proposing an alternative to the attractor hypothesis, given that we use attractor networks, but rather as an exploration of possibilities not yet visited by this hypothesis.

      Strengths:

      The exploration of learning in a feedforward model, when recurrent connectivity in the grid cell layer is structured in a ring topology, is interesting. The insight that this not only aligns the grid cells in a common direction but also creates a correspondence between their intrinsic coordinate (in terms of the ring-like recurrent connectivity) and their tuning on the torus is interesting as well, and the paper as a whole may influence future theoretical thinking on the mechanisms giving rise to the properties of grid cells.

      Weaknesses:

      (1) In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning, in addition to the location on a 2d plane, and therefore involved a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane. The novelty here is that the initial connectivity is structured uniquely according to latent coordinates residing on a ring.

      The recurrent architectures in the cited works are complex and require arranging cells in a 2D manifold to calculate connectivity based on their relative 2D position. In other words, the 2D structure is imprinted in the architecture, as in our 2D condition. In this work the network is much simpler and only requires neighboring relations in 1D. Such relationships have been shown to spontaneously emerge in the hippocampal formation (Pastalkova et al, 2008; Gonzalo Cogno et al, 2024).

      (2) The paper refers to the initial connectivity within the grid cell layer as one that produces an attractor. However, it is not shown that this connectivity, on its own, indeed sustains persistent attractor states. Furthermore, it is not clear whether this is even necessary to obtain the results of the model. It seems possible that (possibly weaker) connections with ring topology, that do not produce attractor dynamics but induce correlations between neurons with similar locations on the ring would be sufficient to align the spatial response patterns during the learning of feedforward weights.

      Regarding the first part of the comment, the recurrent collaterals create one or at times multiple bumps of activity in the network so that neighboring (interconnected) cells activate together. An initial random state of activity rapidly falls into this dynamic, constrained by the attractor. To us this is not surprising given that this connectivity is the classical means of creating a continuous attractor. Perhaps there is some deeper meaning in this comment that we are not fully grasping.

      Regarding the second part of the comment, we fully agree with the reviewer. We are presenting what so far is the simplest connectivity that can align grid maps, but by no means do we claim that it is the simplest possible one. Regarding weaker connections with ring topology, we show in Figure S2 that a ring attractor with too weak or too strong connections is incapable of aligning grids, since a balance between feedforward and feedback inputs is required.

      (3) Given that all the grid cells are driven by an input from place cells that span a 2d manifold, and that the activity in the grid cell network settles on a steady state which is uniquely determined by the inputs, it is expected that the manifold of activity states in the grid cell layer, corresponding to inputs that locally span a 2d surface, would also locally span a 2d plane. The result is not surprising. My understanding is that this result is derived as a prerequisite for the topological analysis, and it is therefore quite technical.

      We understand that the reviewer is referring to the motivation behind studying local dimensionality. We agree that the topological analysis approach is quite technical, but it provides unique insights. The theorem of closed surfaces, which allows us to deduce a toroidal topology from Betti numbers (1,2,1), only applies to closed surfaces. One thus needs to show that the point cloud is a surface (local dimensionality of 2) and is closed (no borders or singularities). If borders or singularities were present, a toroidal topology could not be claimed from these Betti numbers. Thus, it is a crucial step of the analysis.
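      The step from Betti numbers to a torus can be made explicit with the Euler characteristic. This worked derivation assumes the Betti numbers are computed over a coefficient field of characteristic other than 2 (e.g. rational coefficients); with Z/2 coefficients the Klein bottle also has Betti numbers (1,2,1). Under that assumption, b_2 = 1 implies the closed surface is orientable, and the classification of closed orientable surfaces then gives genus g = 1:

```latex
\begin{aligned}
\chi &= b_0 - b_1 + b_2 = 1 - 2 + 1 = 0, \\
\chi &= 2 - 2g \quad \text{(closed orientable surface of genus } g\text{)} \\
\Rightarrow\; g &= 1 \quad \text{(torus)}.
\end{aligned}
```

      Both hypotheses used here, a surface (local dimensionality 2) and closedness (no borders or singularities), are exactly the conditions the local-dimensionality analysis is meant to establish.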

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. Under the scenario in which grid cell responses are aligned (i.e. all neurons develop spatial patterns with the same spacing and orientation) it is already quite clear, even without any topological analysis that the emerging topology of the population activity is a torus.

      However, the toroidal topology of grid cells in reality has been observed by Gardner et al also in the wagon wheel environment, in sleep, and close to boundaries (whereas here the analysis is restricted to a sub-region of the environment, far away from the walls). There is substantial evidence based on pairwise correlations that it persists also in various other situations, in which the spatial response pattern is not a hexagonal firing pattern. It is not clear that the mechanism proposed in the present paper would generate toroidal topology of the population activity in more complex environments. In fact, it seems likely that it will not do so, and this is not explored in the manuscript.

      We agree that our work was constrained to exploration in 2D and that the situations posed by the reviewer are challenging, but we do not see them as insurmountable. The wagon wheel shows a preservation of toroidal topology locally, where the behavior of the animal is rather 2-dimensional. Globally, hexagonal maps are lost, which is compatible with some flexibility in the way grid maps are formed. If sleep meant that all inputs are turned off, our model would predict a dynamic dictated by the architecture (1D for the ring attractor, for example), but we do not really know that this is the case. In the future, we intend to explore predictive activity along the linear attractor, which could both result in path integration and in some level of preservation of the activity when inputs are completely turned off.

      Regarding boundaries, as we have argued before, the cited work chooses to filter away what looks like more than half of the overall explained variance through PCA, and this is only before applying a non-linear dimensionality reduction algorithm. It is specifically shown that the analyzed components are the ones with global periodicity throughout the environment. Thus, it is conceivable that through this approach, local irregularities found only at the borders are disregarded in favor of a clearer global picture. While using a different methodology, our approach follows a similar spirit, albeit with far less noisy data.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, this preservation across environments is not expected. Moreover, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with this observation. A symmetry in our implementation results in the fact that only ~50% of times the system falls in the preferred solution, and the rest of the times it falls into other local minima. Whether this result is at odds with current observations can be debated on the basis of probabilities. However, we believe that the symmetry we found is purely circumstantial, and that it can be broken by elements such as head direction modulation or other ingredients used to achieve path integration. In other words, we acknowledge that symmetry is an issue of the implementation we show here (which has been kept as simple as possible to serve as a proof-of-principle) but we do not think that it is a defining feature of flexible attractors in general. We expect that future implementations that incorporate path integration capabilities will not present this kind of symmetry in the space of solutions.

      Regarding the rigid phase translation across modalities, while this effect is very clear in Gardner et al, it is less so in other datasets. The analyses shown in Hermansen et al (2024) can rather be interpreted as lying somewhere between perfect rigid translation and fully randomized phases across navigation modalities.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Thus, the highly clustered phases obtained in the model (Fig. S1) seem incompatible with the experimental reality. I suspect that this may be related to the difficulty in identifying the topology of a torus in persistent homology analysis based on the transpose of the matrix M.

      We partly agree with this observation and note that a pattern of ordered phases is an issue not only for the 1D attractor but also for the 2D one, which appears much more uniform than in experimental data. The low number of neurons we used for computational economy and the full connectivity could be key ingredients to generate these phase patterns. To show that this is not a defining feature of flexible attractors, apart from the fact that these patterns appear also with non-flexible 2D architectures, we included in Figure S1 simulations with ‘fragmented 1D’ architectures. In this case the architecture is a superposition of 20 random 1D stripe-like attractors. While the alignment of maps achieved with this architecture is almost at the same level as the one obtained with 1D and 2D attractors, the phases are much more similar to what has been observed experimentally, and less uniform than what is obtained with 2D attractors.

      (7) The motivations stated in the introduction came across to me as weak. As now acknowledged in the manuscript, attractor models can be fully compatible with distortions of the hexagonal spatial response patterns; they become incompatible with these spatial distortions only if one adopts a highly naive and implausible hypothesis that the attractor state is updated only by path integration. While attractor models are compatible with distortions of the spatial response pattern, it is very difficult to explain why the population activity patterns are tightly preserved across multiple conditions without a rigid two-dimensional attractor structure. This strong prediction of attractor models withstood many experimental tests; in fact, I am not aware of any data set where substantial distortions of the toroidal activity manifold were observed, despite many attempts to challenge the model. This is the main motivation for attractor models. The present model does not explain these features, yet it also does not directly offer an explanation for distortions in the spatial response pattern.

      Some interesting examples are experiments in 3D, where grid cells presumably communicate with each other through the same recurrent collaterals, but global periodicity is lost and only some local order is preserved even away from boundaries (Ginosar et al, 2021; Grieves et al, 2021). While these datasets have not been explored using topological analysis, they serve as strong motivators to understanding 2D grid cells as one equilibrium solution that arises under some set of constraints, but belongs to a wider space of possible solutions that may arise as well under more flexible constraints. Even (and especially) if one adheres to the hypothesis that grid cells are pre-wired into a 2D torus, a concept like flexible attractors might become useful to understand how their activity is rendered in 3D. Another strong motivation is our lack of understanding of how a perfectly balanced 2D structure is formed and maintained. Simpler architectures could be thought of as alternatives, but also as an intermediate step towards it.

      Regarding the rigid phase translation across modalities, while this effect is very clear in Gardner et al, it is less so in other datasets. The analyses shown in Hermansen et al (2024) can rather be interpreted as lying somewhere between perfect rigid translation and fully randomized phases.

      In a separate point, although it might not be strictly related to the comment, we do not fully share the idea that persistent activity patterns during sleep are necessary or sufficient conditions for attractor dynamics, although we do agree that attractors could be the mechanism behind them and any alternative is at least as complex as attractors. On the necessity side, attractors in the hippocampus are not constantly engaged (Wills et al, 2005). For sufficiency, one should prove that no other network is capable of reproducing the phenomenon, and to our best knowledge we are still far from that point.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses (a leak time constant and/or synaptic time constants). I generally favor simple models without lots of complexity, yet within this style of modelling, the formulation adopted in this manuscript is unconventional, introducing a difficulty in interpreting synaptic weights as being weak or strong, and a difficulty in interpreting the model in the context of other studies.

      We chose to keep the model as simple as possible and in line with previous publications developing it. However, we see the usefulness of placing it within what has since become a canonical framework. Fortunately, this has been done by D’Albis and Kempter (2017). In our simplified version of the model there is no leak term, and adaptation on its own brings down activity in the absence of input, but we agree that such a term could be added, albeit not without modifying all other network parameters.
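      To relate a leak-free discrete-time update to a conventional rate formulation, the generic relation below may help. It assumes a standard leaky rate unit with time constant $\tau$, input field $h_i$ and transfer function $f$, all introduced here for illustration only; it is not the model's update rule, which relies on adaptation rather than a leak.

```latex
\tau \frac{dr_i}{dt} = -r_i + f(h_i)
\quad\Longrightarrow\quad
r_i^{t+1} = \Big(1 - \frac{\Delta t}{\tau}\Big)\, r_i^{t} + \frac{\Delta t}{\tau}\, f\big(h_i^{t}\big)
\quad\xrightarrow{\ \Delta t = \tau\ }\quad
r_i^{t+1} = f\big(h_i^{t}\big)
```

      Read this way, a memoryless update of the form $r_i^{t+1} = f(h_i^{t})$ corresponds to a forward-Euler step of the same length as the time constant, so one iteration would stand for roughly one neuronal or synaptic time constant. Choosing $\Delta t < \tau$ reintroduces an explicit leak, which is one reason the remaining parameters would then need retuning.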

      In my view, the weaknesses discussed above limit the ability of the model, as it stands, to offer a compelling explanation for the toroidal topology of grid cell population activity patterns, and especially the rigidity of the manifold across environments and behavioral states. Still, the work offers an interesting way of thinking on how the toroidal topology might emerge.

      Reviewer 1:

      Reviewer #1 (Recommendations For The Authors):

      See comments above. In addition:

      (1) Abstract: '...interconnected by a two-dimensional attractor guided by path integration'. This is unclear. I think the intended meaning might be along the lines of '...their being computed by a 2D continuous attractor that performs path integration'?

      'path integration allowing for no deviations from the hexagonal pattern' This is incorrect. Local modulation of the gain of the speed input to a standard CAN would distort the grid pattern.

      'Using topological data analysis, we show that the resulting population activity is a sample of a torus' Activity in the model?

      'More generally, our results represent a proof of principle against the intuition that the architecture and the representation manifold of an attractor are topological objects of the same dimensionality, with implications to the study of attractor networks across the brain' I guess one might hold this intuition, but it strikes me as obvious that if you impose a sufficiently strong n-dimensional input on a network then its activity could have the same dimensionality. I don't really see this as being a point worth highlighting. Perhaps the more interesting point is that during learning the recurrent connectivity aligns the grid fields of neurons in the network, and this may be a specific function of the 1D attractor dynamics, although I don't think the authors have made this point convincingly.

      'The flexibility of this low dimensional attractor allows it to negotiate the geometry of the representation manifold with the feedforward inputs'. See above for comments on the use of 'negotiate'.

      'while the ensemble of maps preserves features of the network architecture'. I don't understand this. What is the 'ensemble of maps', and what are the features referred to?

      We have reviewed the abstract considering these points. Regarding the ‘strong n-dimensional input’, we want to point out that it is not the input itself that generates a torus (the no attractor condition does not lead to a torus) but rather the interplay between the input and the attractor.

      Regarding ‘Perhaps the more interesting point …’, we do not fully understand how this sentence deviates from our own conclusions. We show here that a strong n-dimensional input is not enough to align grid cells (i.e. to produce an n-torus); it is the interplay between inputs and attractor dynamics that does so, even if the attractor is not n-dimensional in terms of architecture.

      The ensemble of maps refers to the transpose of the population activity matrix, where each point in the cloud is a map, and the features refer to the persistent homology.
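      A minimal NumPy sketch of the two point clouds may help readers follow this distinction. It assumes the rate maps are stored in a hypothetical array `rate_maps` of shape `(n_cells, n_y, n_x)`; the function name `point_clouds` is likewise illustrative. Rows of the population activity matrix are spatial bins (points in cell space), and its transpose has one row per cell, i.e. one point per map.

```python
import numpy as np

def point_clouds(rate_maps: np.ndarray):
    """Split (n_cells, n_y, n_x) rate maps into the two point clouds discussed above."""
    n_cells = rate_maps.shape[0]
    # Population activity matrix: one row (point) per spatial bin, one column per cell.
    population = rate_maps.reshape(n_cells, -1).T
    # Ensemble of maps: the transpose, one row (point) per cell's rate map.
    ensemble_of_maps = population.T
    return population, ensemble_of_maps

# Usage with random data: 100 cells on a 30 x 30 grid of spatial bins.
rng = np.random.default_rng(0)
maps = rng.random((100, 30, 30))
pop, ens = point_clouds(maps)
print(pop.shape, ens.shape)  # (900, 100) (100, 900)
```

      Persistent homology can be computed on either cloud; they are different objects, since one samples the space of population vectors and the other the space of individual maps.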

      (2) The manuscript still fails to clarify the difference between a model that path integrates in two dimensions and a model that simply represents information with a given dimensionality. The argument that it's surprising that a network with 1D architecture represents a higher dimensional input strikes me as incorrect and an unnecessary attempt to argue for conceptual importance. At least to me this isn't surprising. It would be surprising if the 1D network could path integrate but this doesn't seem to be the case.

      In response to the reviewer’s concerns, we have made clear in the introduction and discussion that this model has no path integration capabilities, although we aim to develop a model capable of path integration using the kind of simple architecture presented here. We want to highlight here that equating attractor dynamics with path integration would be a conceptual mistake.

      (3) Other wording also seems to make unnecessary conceptual claims. E.g. The repeated use of 'negotiate' implies some degree of intelligence, or at least an exchange of information, that isn't shown to exist. I wonder if more precise language could be used? As I understand it the dimensionality is bounded by the inputs on the one hand, and the network connectivity on the other, with the actual dimensionality being a function of the recurrent and feedforward synaptic weights. There's clearly some role for the relative weights and the properties of plasticity rules, but I don't see any evidence for a negotiation.

      An interesting observation in Figure S2 is that grid maps are aligned only if feedforward and recurrent inputs have similar relative strength. If either one dominates the other, grid maps do not align. This equilibrium can metaphorically be thought of as a negotiation, where the negotiation is an emergent property of the system rather than something happening at an individual synapse.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      Major

      (1) What is the evidence that, after training, the 1D network maintains its attractor dynamics when feedforward inputs are active? If the claim is that it does, then it's important to provide evidence, e.g. responses to perturbations, or other tests. The alternative is that after training the recurrent inputs are drowned out by the feedforward spatial inputs.

      We agree with the reviewer on the importance of this point. In our model, networks are always learning, and the population activity represented by aligned grid maps in a trained network is a dynamic equilibrium that emerges from the interplay between feedforward and collateral constraints. If Hebbian learning is turned off, one gets a snapshot of the network at that moment. We now show in Fig. S3 that in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing. The expansion is due to the fact that, as we argue in the Results section, the attractor has a contractive effect on grid maps, which could relate to observations in novel environments (Barry et al, 2007). If Hebbian learning is turned on in the same situation, the maps, no longer constrained by the attractor, drift toward the equilibrium solution of the ‘No attractor’ condition, with significantly larger spacing, no alignment and lower individual gridness. Thus, the attractor is the force preventing them from doing so when feedforward Hebbian learning is on.

      These observations point to the key role played by the attractor not only in forming but also in sustaining grid activity. The dynamic equilibrium framework fits well-known properties of the system, such as its capacity to recalibrate very fast (Jayakumar et al, 2019), although this particular feature cannot be modeled with the current version of our model, which lacks path integration capabilities.

      (2) It would be useful to include additional control conditions for Figure 2 to test the hypothesis that it is simply connectivity, rather than attractor dynamics, that drives alignment.

      This could be achieved by randomly assigning strengths to the recurrent connections, e.g. drawing from exponential or Gaussian distributions.

      We agree and have included Fig. S2b-d, showing that the same distribution of collateral input weights entering each neuron, but lacking the 1D structure provided by the attractor, does not align grid maps. This is achieved by shuffling rows in the connectivity matrix, while avoiding self-connections to make the comparison fair (self-connections substantially alter the dynamics of the network, making it much more rigid). We observed that individual grid maps have very low gridness levels, even lower than in the no-attractor condition. In contrast, they have levels of population gridness slightly higher than in the no-attractor condition, but closer to 0 than to levels achieved with attractors. Our interpretation of these results is that irregular connectivity achieves some alignment in a few arbitrary directions and/or locations, which improves the coordination between maps at the expense of impairing rather than improving the hexagonal responses of individual cells. Such observations stand in clear contrast to what is observed with continuous attractors with an orderly architecture.
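      The text does not specify the shuffling procedure beyond shuffling rows while avoiding self-connections, so the sketch below adopts one simple variant as an assumption: each neuron's incoming weights are randomly reassigned among its other presynaptic partners. It builds a hypothetical ring with Gaussian excitation (`sigma` in units of neurons) purely for illustration. This keeps every neuron's incoming weight distribution, and hence its total input weight, while keeping self-connections at zero and removing the 1D neighborhood structure; the procedure actually used for Fig. S2b-d may differ.

```python
import numpy as np

def ring_connectivity(n: int, sigma: float) -> np.ndarray:
    """Hypothetical 1D ring: Gaussian excitation vs. ring distance, no self-connections.
    Row i holds the weights of the synapses entering neuron i."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)                 # distance along the ring
    W = np.exp(-d**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)                 # no self-connections
    return W

def shuffle_incoming(W: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly reassign each neuron's incoming off-diagonal weights to other
    presynaptic neurons, keeping the diagonal (self-connections) at zero."""
    n = W.shape[0]
    W_shuf = np.zeros_like(W)
    for i in range(n):
        others = np.flatnonzero(np.arange(n) != i)   # all presynaptic partners except i
        W_shuf[i, others] = rng.permutation(W[i, others])
    return W_shuf

rng = np.random.default_rng(1)
W = ring_connectivity(100, sigma=3.0)
W_s = shuffle_incoming(W, rng)
# Same incoming total per neuron and no self-connections, but the ring order is gone.
print(np.allclose(W.sum(axis=1), W_s.sum(axis=1)), np.all(np.diag(W_s) == 0))  # True True
```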

      These results suggest that it is the structure of the attractor that allows grid cells to be aligned rather than the mere presence of recurrent collateral connections.

      (3) It seems conceivable that once trained the recurrent connections would no longer be required for alignment. Can this be evaluated by considering what happens if the recurrent connections are turned off after training (or slowly turned off during training)? Does the network continue to generate aligned grid fields?

      This point has elements in common with point 1. As we argued in that response, the attractor has two main effects on grid maps: it aligns them and it contracts them. If the attractor is turned off, feedforward Hebbian learning progressively drives maps toward the solution obtained for the ‘no attractor’ condition, characterized by maps with larger spacing, poorer gridness and lack of alignment.

      (4) After training what is the relative strength of the recurrent and feedforward inputs to each neuron?

      Both recurrent and feedforward synaptic-strength matrices are normalized throughout training, so that the overall incoming synaptic strength to each neuron is invariant. Because of this, although individual feedforward and recurrent input fields vary dynamically, their average is constant, with the exception of the very first iterations of the simulation, before a stable regime is reached in grid-cell activity levels. We have included Fig. S2d, showing the dynamics of feedforward and recurrent mean fields throughout learning as well as their ratio. In addition, Fig. S2a shows that the strength of recurrent relative to feedforward inputs is an important parameter, since alignment is only obtained in an intermediate range of ratios.
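      The normalization described above can be sketched as a Hebbian update followed by rescaling each row of the weight matrix (the synapses entering one neuron) to a fixed sum. The plain outer-product rule, the learning rate `eta`, the target `total` and the random inputs are all assumptions for illustration; the model's actual learning rule and choice of norm may differ, and the same step would apply to the recurrent matrix.

```python
import numpy as np

def hebbian_step(W, pre, post, eta=0.01, total=1.0):
    """One Hebbian update of W (rows = postsynaptic, columns = presynaptic neurons),
    followed by row normalization so each neuron's summed input weight equals `total`."""
    W = W + eta * np.outer(post, pre)          # strengthen co-active pre/post pairs
    W = np.clip(W, 0.0, None)                  # keep weights non-negative (excitatory)
    return total * W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_post, n_pre = 50, 200
W_ff = rng.random((n_post, n_pre))
W_ff /= W_ff.sum(axis=1, keepdims=True)
for _ in range(100):
    pre = rng.random(n_pre)                    # hypothetical input activity
    post = W_ff @ pre                          # feedforward drive only, for illustration
    W_ff = hebbian_step(W_ff, pre, post)
print(np.allclose(W_ff.sum(axis=1), 1.0))      # True: incoming total stays invariant
```

      Because only the row sums are fixed, individual weights (and hence individual input fields) keep changing while each neuron's overall incoming strength does not.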

      (5) It would be helpful to also evaluate the low dimensional structure of the input to the network. Assuming it has a 2D structure, as it represents 2D space, can an explanation be provided for why it is surprising that the trained network also encodes activity with a 2D manifold? It strikes me that the more interesting finding might relate to alignment of the grids rather than claims about a 1D attractor encoding a 2D representation. Either way, stronger evidence and clearer discussion would be helpful.

      The reviewer is correct in assuming that the input has a 2D structure, that can be represented by a sheet embedded in a high dimensional space and thus has the Betti numbers [1,0,0]. The surprising element in our results is that we are showing for the first time that the population activity of an attractor network is constrained to a manifold that results from the negotiation between the architecture of the attractor and the inputs, and does not merely reflect the former as previously assumed. In this sense, the alignment of grid cells by a 1D attractor is an instance of the more general case that 1D attractors can encode 2D representations.

      It is certainly the case that the 2D input is a strong constraint pushing population activity toward a 2D manifold. However, the final form of the 2D manifold is strongly constrained by the attractor, as shown by the contrast with the no-attractor condition (a 2D sheet, as in the input, vs a torus when the attractor is present). The 1D attractor is able to flexibly adapt to the constraint posed by the inputs while doing its job (as demonstrated in previous points), which results in 2D grid maps aligned by a 1D attractor. Generally speaking, this work provides a proof of principle demonstrating that the topology of the attractor architecture and the manifold of the population activity space need not be identical, as previously widely assumed by the attractor community, and need not even have the same dimensionality. Instead, a single architecture can potentially be applied to many purposes. Hence, our work provides a valuable new perspective that applies to the study of attractors throughout the brain.

      (6) The introduction should be clearer about the different types of grid model and the computations they implement. E.g. The authors' previous model generates grid fields from spatial inputs, but if my understanding is correct it isn't able to path integrate. By contrast, while the many 2D models with continuous attractor dynamics also generate grid representations, they do so by path integration mechanisms that are computationally distinct from the spatial transformation implemented by feedforward models (see also general comments above).

      We agree with the reviewer and have made this point explicit in the introduction.

      (7) A prediction from continuous attractor models is that when place cells remap the low dimensional manifold of the grid activity is unaffected, except that the location of the activity bump is moved. It strikes me as important to test whether this is the case for the model presented here (my intuition is that it won't be, but it would be important to establish either way).

      We want to emphasize that our model is a continuous attractor model, so the question regarding the difference between what our model and continuous attractor network models predict is an ill-posed one. One of our main conclusions is precisely that attractors can work in a wider spectrum of ways than previously thought.

      In the absence of a better definition, our multiple simulations could be thought of as training in different arenas. It is true that in our model maps take time to form, but this is also the case in novel environments (Barry et al, 2007), and continuous attractor models exclusively or strongly guided by self-motion cues struggle to replicate this phenomenon. We show that the current version of our model accepts multiple solutions (in practice four, but conceptually countably infinite), all of them resulting in a torus for the population activity (i.e. the same topology or low-dimensional manifold). It is not clear to us how easy it would be to differentiate between most of these solutions in experimental data, with only incomplete information. That said, incorporating a symmetry-breaking ingredient into the model, for example related to head direction modulation, could perhaps lead to the prevalence of a single type of solution. We intend to explore this possibility in the future in order to add path-integration capabilities to the system, as described in the discussion.

      (8) The Discussion implies that 1D networks could perform path integration in a manner similar to 2D networks. This is a strong claim but isn't supported by evidence in the study. I suggest either providing evidence that this is the case for models of this kind or replacing it with a more careful discussion of the issue.

      The current version of our model has no path integration capabilities, as is now made explicit in the Introduction and Discussion. In addition, we have now made clear that the idea that path integration could perhaps be implemented using 1D networks is, although reasonable, purely speculative.

      Minor

      (1) Introduction. 'direct excitatory communication between them'. Suggest rewording to 'local synaptic interactions', as communication can also be purely inhibitory (e.g. Burak and Fiete, 2009) or indirect by excitation of local interneurons (e.g. Pastoll et al., Neuron, 2013).

      We agree and have adopted this phrasing.

      (2) The decision to focus the topology analysis on the 60 cm wide central square appears somewhat arbitrary. Are the irregularities referred to a property of the trained networks, or would they also emerge with analysis of simulated ideal data? Can the justification be expanded, and can supplementary analyses be shown using the whole arena?

      In practical terms, a subsampling of the data to around half was needed because the persistent homology packages struggle to handle large amounts of data, especially in the calculation of H2. We decided to cut a portion of contiguous pixels in the open field larger than the hexagonal tile representing the whole grid population period (as represented in Figure 6). Leaving the borders aside was a logical choice since it is known that the solution at the borders is particularly influenced by the speed anisotropy of the virtual rat (see Si, Kropff & Treves, 2012), in a way that mimics how borders locally influence grid maps in actual rats (Krupic et al, 2015). The specific way in which our virtual rat handles borders is arbitrary and might not generalize. A second issue around borders is that maps are differently affected by incomplete smoothing, although this issue does not apply to our data because we did not smooth across neighboring pixels. In sum, considering the central 60 cm wide square was sufficient to contain the whole torus and a reasonable compromise that allowed us to perform all analyses in the part of the environment least influenced by boundaries.
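      A sketch of the cropping step is shown below. It assumes square rate maps that tile a square arena with bins of `bin_cm`; the arena side and bin size in the usage lines are hypothetical values, since they are not given here, and `central_square` is an illustrative name.

```python
import numpy as np

def central_square(rate_maps, arena_cm, bin_cm, side_cm=60.0):
    """Keep only the central side_cm x side_cm square of (n_cells, n_y, n_x) rate maps."""
    n_y, n_x = rate_maps.shape[-2:]
    if n_y != n_x or not np.isclose(n_x * bin_cm, arena_cm):
        raise ValueError("expected square maps covering the whole arena")
    keep = int(round(side_cm / bin_cm))       # number of bins spanned by the central square
    start = (n_x - keep) // 2                 # offset that centres the crop
    return rate_maps[..., start:start + keep, start:start + keep]

# Hypothetical example: 100 cells, a 100 cm arena and 2.5 cm bins (40 x 40 maps).
maps = np.zeros((100, 40, 40))
center = central_square(maps, arena_cm=100.0, bin_cm=2.5)
# Population vectors restricted to the central bins, one point per row.
cloud = center.reshape(center.shape[0], -1).T
print(center.shape, cloud.shape)  # (100, 24, 24) (576, 100)
```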

      (3) It could help the general reader to briefly explain what a persistence diagram is.

      This is developed in the Appendix, but we have now added a reference to it and a brief description in the main text.
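      To give the general reader a concrete picture of a persistence diagram, the sketch below computes only its simplest part, dimension 0 (connected components), for a Vietoris-Rips filtration: every point is born at scale 0, and a component dies at the distance at which it merges with another. This is a toy illustration in plain NumPy with an illustrative function name; the H1 and H2 features needed to detect a torus require dedicated software (e.g. Ripser) and are not reproduced here.

```python
import numpy as np

def h0_persistence(points):
    """0-dimensional (birth, death) pairs of the Vietoris-Rips filtration.
    Components merge in order of increasing edge length (Kruskal's algorithm)."""
    X = np.asarray(points, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # two components merge: one of them dies here
            parent[ri] = rj
            pairs.append((0.0, float(length)))
    pairs.append((0.0, float("inf")))       # the last surviving component never dies
    return pairs

# Two well-separated clusters on a line: short-lived bars plus one long and one infinite bar.
pts = [[0.0], [0.1], [0.2], [5.0], [5.1]]
dgm = h0_persistence(pts)
print([round(d, 2) for _, d in dgm])  # [0.1, 0.1, 0.1, 4.8, inf]
```

      Plotting each (birth, death) pair as a point gives the diagram: points far from the diagonal are robust features (here, the two clusters), while points near it are attributed to sampling noise. The same logic applies to holes (H1) and cavities (H2).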

      (4) For the analyses in Figure 3-4, and separately for Figure 5, it might help the reader to provide visualizations of the low dimensional point cloud.

      All these calculations take place in the original high-dimensional point cloud. Doing them in a reduced space would be incorrect because there is no dimensionality reduction technique that guarantees the preservation of topology. In Figure 7 we reduce the dimensionality of data but emphasize that it is only done for visualization purposes, not to characterize topology. We also point out in this Figure that the same non-linear dimensionality reduction technique applied to objects with identical topology yields a wide variety of visualizations, some of them clear and some less clear. This observation further exemplifies why one cannot assume that a dimensionality-reduction technique preserves topology, even for a low-dimensional object embedded in a high-dimensional space.

      (5) The detailed comparison of the dynamics of each model is limited by the number of data points. Why not address this by new simulations with more neurons?

      We are not sure we understand this comment. In Figure 2, the dynamics for each model are markedly different. These are averages over 100 simulations. We are not sure what benefit would be obtained from adding more neurons. Before starting this work we searched for the minimal number of neurons that would result in convergence to an aligned solution in 2D networks, which we found to be around 100. Optimizing this parameter in advance was important to reduce computational costs throughout our work.

      (6) Could the variability in Figure 7 also be addressed by increasing the number of data points?

      As we argued in a previous point, there is no reason to expect preservation of topology after applying Isomap. We believe this lack of topology preservation to be the main driver of variability.

      (7) Page/line numbers would be useful.

      We agree. However, the text is curated by bioRxiv which, to the best of our knowledge, does not include them.

      Reviewer 2:

      Reviewer #2 (Recommendations For The Authors):

      (1) I strongly suggest that the authors rewrite some parts of the Results. There are many details that should be moved to the Methods, for example, the implementation details of the network, the analysis details of the toroidal topology, etc. It would be better to focus on the results first in each section, and then introduce some of the key details of how these results were achieved, to improve the readability of the work.

      This suggestion contrasts with that of Reviewer #1. As a compromise, we decided to include in the Results section only methodological details that are key to understanding the conclusions, and describe everything else in the Methods section.

      (2) 'Progressive increase in gridness and decrease in spacing across days have been observed in animals familiarizing with a novel environment...' From Fig. 2c I didn't see much decrease. The authors may need to carry out a statistical test to prove this. Moreover, even if the changes are significant, they might not be a consequence of the excitatory collateral constraint. To prove this, the authors may need to offer some direct evidence.

      We agree that the decrease is not evident in this figure due to the scale, so we have added the correlation to the figure caption as evidence. In addition, several arguments, some related to new analyses, demonstrate that the attractor contracts grid maps. First, the ‘no attractor’ condition has a markedly larger spacing compared to all other conditions (Fig. 2a). We also now show that spacing monotonically decreases with the strength of recurrent relative to feedforward weights, in a way that is rather independent of gridness (Fig. S2a). Second, as we now show in Fig. S2b-d, simulations with a shuffled 1D attractor, such that the sum of input synapses to each neuron is the same as in the 1D condition but no structure is present, lead to a spacing that is mid-way between the ‘no attractor’ condition and the conditions with attractors. Third, as we now show in Fig. S3a, turning off both recurrent connections and feedforward learning in a trained network results in a small increase in spacing. Fourth, as we now show in Fig. S3b, turning off recurrent connections while feedforward learning is kept on increases grid spacing to levels comparable to those of the ‘no attractor’ condition. All these elements support a role of the attractor in contracting grid spacing.

      (3) Some of the items need to be introduced first before going into details in the paper, for instance, the stripe-like attractor network, the Betti number, etc.

      We have added in the Results section a brief description and references to full developments in the Appendix.

      Reviewer 3 (Public Review):

      (1) It is not clear to me that the proposal here is fundamentally new. In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning and thus had a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane.

      In the work of Si et al connectivity is constructed ad hoc for conjunctive cells to represent a torus; it depends on head-directionality but also on the distance in a 2D plane. The topology of this architecture has not been assessed, but it is close to the typical 2D ‘rigid’ constraint. In the work of Urdapilleta et al, the network is a simple 2D one. The difference with our work is that we focus on the topology of the recurrent network and do not use head-direction modulation. In this context, we prove that a 1D network is enough to align grid cells and, more generally, we provide a proof of principle that the topology of the architecture and the representation space of an attractor network do not need to be identical, as previously assumed by the attractor community. These two important points were neither argued nor speculated upon in the cited works, nor are they self-evident from them.

      (2) The paper refers to the connectivity within the grid cell layer as an attractor. However, would this connectivity, on its own, indeed sustain persistent attractor states? This is not examined in the paper. Furthermore, is this even necessary to obtain the results in the model? Perhaps weak connections that do not produce an attractor would be sufficient to align the spatial response patterns during the learning of feedforward weights, and reproduce the results? In general, there is no exploration of how the strength of collateral interactions affects the outcome.

      The reviewer makes several important points. Local excitation combined with global inhibition is the archetypical architecture for continuous attractors (see for example Knierim and Zhang, Annual review of neuroscience, 2012). Thus, in the absence of feedforward input, we observe a bump of activity. As in all continuous attractors, this bump is not necessarily ‘persistent’ and instead is free to move along the attractor.
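      As an illustration of this textbook mechanism (not the authors' network, which also includes feedforward inputs, adaptation and learning), the sketch below uses a hypothetical ring of threshold-linear rate units with Gaussian local excitation and uniform global inhibition; all parameter values and the max-normalization used as crude gain control are assumptions. Started from a broad cue at an arbitrary position and then left without input, the activity settles into a localized bump that stays where it was placed, so every position on the ring is an equally valid steady state.

```python
import numpy as np

def ring_bump(n=100, sigma=5.0, inhibition=1.5, start=30, steps=200):
    """Iterate a ring of threshold-linear units with local excitation and global
    inhibition, starting from a broad cue centred on `start`. Returns final rates."""
    idx = np.arange(n)
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)                          # distance along the ring
    W_exc = np.exp(-d**2 / (2 * sigma**2))            # local excitation
    J0 = inhibition * W_exc.sum(axis=1).mean() / n    # uniform inhibition per unit of total activity
    cue_d = np.minimum(np.abs(idx - start), n - np.abs(idx - start))
    r = np.exp(-cue_d**2 / (2 * 10.0**2))             # broad initial cue, then no external input
    for _ in range(steps):
        h = W_exc @ r - J0 * r.sum()                  # recurrent field only
        r = np.maximum(h, 0.0)                        # threshold-linear transfer
        r /= r.max()                                  # crude gain control keeps rates bounded
    return r

for start in (10, 30, 75):
    r = ring_bump(start=start)
    # peak position, and number of active units (bump width)
    print(start, int(np.argmax(r)), int((r > 0).sum()))
```

      With `inhibition > 1`, uniform activity cannot sustain itself, which is what confines activity to a bump; the bump's location is set only by where it started, which is the sense in which it is free to move along the attractor.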

      We cannot prove that there is not a simpler architecture that has the same effect as our 1D or 1DL conditions, and we think that there are some interesting candidates to investigate in the future. What we now prove in new Fig. S2b-d is that it is not the strength of recurrent connections themselves, but instead the continuous attractor structure that aligns grid cells in our model. To demonstrate this, we shuffle incoming recurrent connections to each neuron in the 1D condition (while avoiding self-connections for fairness), and show that training does not lead to grid alignment. We also show in Fig. S1 that an architecture represented by 20 overlapping 1DL attractors, each formed by concatenating 10 random cells, aligns grid cells to levels slightly lower but similar to the 1D or 1DL attractors. This architecture can perhaps be considered as simpler to build in biological terms than all the others, but it is still constituted by continuous attractors.

      The strength of recurrent collaterals, or more precisely the recurrent to feedforward ratio, is crucial in our model to achieve a negotiated outcome from constraints imposed by the attractor and the inputs. We now show explicit measures of this ratio in Fig. S2, as well as examples showing that an imbalance in this ratio impairs grid alignment. When the ratio is too high or too low, both individual and population gridness are low. Interestingly, grid spacing behaves differently, decreasing monotonically with the relative strength of recurrent connections.

      (3) I did not understand what is learned from the local topology analysis. Given that all the grid cells are driven by an input from place cells that spans a 2d manifold, and that the activity in the grid cell network settles on a steady state that depends only on the inputs, isn't it quite obvious that the manifold of activity in the grid cell layer would have, locally, a 2d structure?

      The dimensionality of the input is important, although not the only determinant of the topology of the activity. The recurrent collaterals are the other determinant, and their architecture is a crucial feature. For example, as we now show in Figure S2b-d, shuffled recurrent synaptic weights fail to align grid cells. In the 1D condition, if feedforward inputs were absent, the dynamics of the activity would be confined to a ring. The opposite condition is our ‘no attractor’ condition, in which activity in the grid cell layer mimics the topology of inputs, a 2D sheet (and not a torus). It is in the intermediate range, when both feedforward and recurrent inputs are important, that a negotiated solution (a torus) is achieved.

      The analyses of local dimensionality and local homology of Figure 3 are crucial steps to demonstrate toroidal topology. According to the theorem of classification of closed surfaces, global homology is not enough to univocally define the topology of a point cloud, and thus this step cannot be skipped. This step is aimed at proving that the point cloud is indeed a closed surface.
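      The point about global homology can be made concrete with Betti numbers computed over the two-element field $\mathbb{F}_2$, the default coefficient field of several persistent homology packages (an assumption about the software, not a statement about the analyses reported here).

```latex
\begin{array}{l|ccc}
X & \beta_0 & \beta_1 & \beta_2 \\ \hline
S^2 \ (\text{sphere}) & 1 & 0 & 1 \\
T^2 \ (\text{torus}) & 1 & 2 & 1 \\
K \ (\text{Klein bottle}) & 1 & 2 & 1 \\
\mathbb{RP}^2 \ (\text{projective plane}) & 1 & 1 & 1 \\
S^2 \vee S^1 \vee S^1 \ (\text{not a surface}) & 1 & 2 & 1
\end{array}
```

      Over $\mathbb{F}_2$ the torus shares its Betti numbers $(1, 2, 1)$ with the Klein bottle and with a sphere wedged with two circles. The last case is ruled out only once the point cloud is shown to be locally two-dimensional and without boundary, i.e. a closed surface. Among closed surfaces, the Klein bottle can in turn be separated from the torus with coefficients of odd characteristic (e.g. $\mathbb{F}_3$), over which it has $\beta_1 = 1$ and $\beta_2 = 0$.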

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. This, combined with the fact that all neurons develop spatial patterns with the same spacing and orientation, implies even without any topological analysis that the emerging topology of the population activity is a torus.

      We cannot agree with this intuition. In the ‘no attractor’ condition, individual maps have hexagonal symmetry with standardized spacing, but given the lack of alignment the population activity is not a closed surface and thus not a torus. It can rather be described as a 2D sheet embedded in a high dimensional space, a description that also applies to the input space.

      While it is rather evident that an ad hoc toroidal architecture folds this 2D population activity into a torus, it is less evident and rather surprising that 1D architectures have the same capability. This is the main novelty in our work.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with the reviewer on the main point, although the recently found ring activity in the absence of sensory feedback (Gonzalo Cogno et al, 2023) suggests that what is happening in the EC is more nuanced than a pre-wired torus. Solutions in Figure 6 are different ways of folding a 1D strip into a torus, with or without the condition of periodicity in the 1D strip. Whether or not these different solutions would be discernible from one another in a practical setup is not clear to us. For example, global homology, as addressed in the Gardner paper, is the same for all these solutions. Furthermore, while our solutions of up to order 3 are highly discernible, higher order solutions, potentially achievable with other network parameters, would be impossible to discern by eye in representations similar to the ones in Figure 6. In addition, while we chose to keep our model in the simplest possible form as a clear proof of principle, new elements introduced to the model such as head directionality could break the symmetry and lead to the prevalence of one preferred solution for all simulation replicates. We plan to investigate this possibility in the future when attempting to incorporate path-integration capabilities into the model.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Here the distribution of phases is not shown, but Figure 7 suggests that phases are non-uniformly represented, with significant clustering around a few discrete phases. This, I believe, is also the origin of the difficulty in identifying the toroidal topology based on the transpose of the matrix M: vectors representing the spatial response patterns of individual neurons are localized near the clusters, and only a few of them represent other phases. Therefore, there is no dense coverage of the toroidal manifold that would exist if all phases were represented equally. This is not just a technical issue, however: there appears to be a mismatch between the results of the model and the experimental reality, in terms of the phase coverage.

      As mentioned in the results section, Figure 7 is meant for visualization purposes only, and serves more as a cautionary tale regarding the unpredictable risks of non-linear dimensionality reduction than as a proof of the organization of activity in the network. Isomap is a non-linear transformation that deforms each of our solutions in a unique way so that, while all have the topology of a torus embedded in a high-dimensional space, only a few of them exhibited one of two possible toroidal visualizations in a 3D Isomap reduction. Isomap, like all other popular dimensionality reduction techniques, provides no guarantee of topology invariance. A better argument to judge the homogeneous distribution of phases is persistent homology, which identifies relatively large holes (compared to the sampling spacing) in the original manifold embedded in a high-dimensional space. In our case, persistent homology identified only two holes significantly larger than noise (the two cycles of a torus) and one cavity in all conditions that included attractors. Regarding the specific distribution of phases in different conditions, however, see our reply below.

      (7) The manuscript makes several strong claims that incorrectly represent the relation between experimental data and attractor models, on one hand, and the present model on the other hand. For the latter, see the comments above. For the former, I provide a detailed list in the recommendations to the authors, but in short: the paper claims that attractor models induce rigidness in the neural activity which is incompatible with distortions seen in the spatial response patterns of grid cells. However, this claim seems to confuse distortions in the spatial response pattern, which are fully compatible with the attractor model, with distortions in the population activity patterns, which would be incompatible with the attractor model. The attractor model has withstood numerous tests showing that the population activity manifold is rigidly preserved across conditions - a strong prediction (which is not made, as far as I can see, by feedforward models). I am not aware of any data set where distortions of the population activity manifold have been identified, and the preservation has been demonstrated in many examples where the spatial response pattern is disrupted. This is the main point of two papers cited in the present manuscript: by Yoon et al, and Gardner et al.

      First of all, we would like to note that our model is a continuous attractor model. Different attractor models have different outcomes, and one of the main conclusions of our manuscript is that attractors can do a wider range of operations than previously thought.

      We agree with the reviewer that distortions in spatial activity (which speak against a purely path-integration guided attractor) should not be confused with distortions in the topology of the population activity (which would instead speak against the attractor dynamics itself). We have rephrased these observations in the manuscript. In fact, we believe that the capacity of grid cells to present distorted maps without a distortion of the population activity topology, as shown for example by Gardner and colleagues, could result from a tension between feedforward and recurrent inputs, the potential equilibria of which our manuscript aims to characterize.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses, and this introduces a difficulty in interpreting synaptic weights as being weak or strong. As mentioned above, the nature of the recurrent dynamics within the grid cell network (whether it exhibits continuous attractor behavior) is not sufficiently clear.

      We agree with the reviewer that our model is rather simple, and we value the extent to which this simplicity allows for a deep characterization. All models are simplifications, and the best model in any given setup is the one with the minimum amount of complexity necessary to describe the phenomenon under study. We believe that, to understand whether or not a 1D continuous attractor architecture can result in a toroidal population activity, a biophysically detailed model, with prohibitive computational costs, would have been unnecessarily complex. This argument is not intended to diminish biophysically detailed models, which are capable of addressing a wider range of questions regarding, for example, the spiking dynamics of grid cells, which cannot be addressed by our simple model.

      Reviewer #3 (Recommendations For The Authors):

      The work points to an interesting scenario for the emergence of toroidal topology, but the interpretation of this idea should be more nuanced. I recommend reconsidering the claims about limitations of the attractor theory, and acknowledging the limitations of the present theory.

      I don't see the limitations mentioned above as a reason to reject the ideas proposed in this manuscript, for two main reasons: first, additional research might reveal a regime of parameters where some issues can be resolved (e.g. the clustering of phases). In addition, the mechanism described here might act at an early stage in development to set up initial dynamics along a toroidal manifold, while other mechanisms might be responsible for the rigidity of the toroidal manifold in an adult animal. But all this implies that the novelty in the present manuscript is weaker than implied, the ability to explain experimental observations is more limited than implied, and these limitations should be acknowledged and discussed.

      I recommend reporting on the distribution of grid cell phases and, if indeed clustered, this should be discussed. It will be helpful to explore whether this is the reason for the difficulty in identifying the toroidal topology based on the collection of spatial response patterns (using the transpose of the matrix M).

      Ideally, a more complete work would also explore in a more systematic and parametric way the influence of the recurrent connectivity's strength on the learning, and whether a toroidal manifold also emerges in non-planar environments, such as the wagon-wheel environment studied in Gardner et al.

      Part of these recommendations have been addressed in the previous points (public review). Regarding the reason why the transpose of M does not fully recapitulate the architecture under our conservative classification criteria, we believe that there is no reason why it should in the first place. We view the fact that the transpose of M recapitulates some features of the architecture as a purely phenomenological observation, and we think it is important as a proof that M is not exactly the same for the different conditions. We reasoned that if the M matrices were exactly the same, this could be due to poor spatial sampling by our bins. Knowing that they are intrinsically different is important even if the reason why they have these specific features is not fully clear to us.

      Although we do not think that the distribution of phases is related to the absence of a cavity in the transpose of M or to the four clusters found in Isomap projections, it remains an interesting question that we did not explore initially. We now show examples of the distribution of phases in Figure S1. We observed that in both 2D and 1D conditions phases are distributed following rather regular patterns. Whether or not these patterns are compatible with experimental observations of phase distribution is, in our view, debatable, given that so far state-of-the-art techniques have only allowed the simultaneous recording of a small fraction of the neurons belonging to a given module. That said, we think it is important to note that ordered phase patterns are an anecdotal outcome of our simulations rather than a necessary outcome of flexible attractors or attractors in general. To prove this point, we simulated a condition with a new architecture represented by the overlap of 20 short 1DL attractors, each recruiting 10 random neurons from the pool of 100 available ones.

      The rest of the parameters of the simulations were identical to those in the other conditions.

      By definition, the topology of this architecture has Betti numbers [20,0,0]. We show in Figure S1 that this architecture aligns grid cells, with individual and population gridness reaching slightly lower levels compared to the 1D condition. However, the distribution of phases of these grid cells has no discernible pattern. This result is an arbitrary example that serves as a proof-of-principle to show that flexible attractors can align grid cells without exhibiting ordered phases, not a full characterization of the outcome of this type of architecture, which we leave for future work. For the rest of our work, we stick to the simplest versions of 1D architectures, which allow for a more in-depth characterization.
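      The overlapping-chain control architecture described above can be sketched as follows. This is a minimal, hypothetical illustration, not the connectivity used in our simulations: it assumes each short 1DL attractor is an open (non-periodic) chain of 10 distinct neurons drawn at random from the pool of 100, and that only consecutive neurons in a chain are coupled, with a symmetric weight that is summed where chains share a pair. The function name overlapping_chains and the uniform nearest-neighbor weight are assumptions; an actual 1D attractor would typically use a distance-dependent weight profile along the chain.

```python
import random

def overlapping_chains(n_neurons=100, n_chains=20, chain_len=10, weight=1.0, seed=0):
    """Hypothetical sketch of overlapping open 1D chains.

    Each chain recruits `chain_len` distinct neurons at random; consecutive
    neurons in a chain are coupled symmetrically with `weight` (an assumed
    nearest-neighbor profile). Weights add up where chains share a pair.
    Returns the n x n weight matrix (list of lists) and the list of chains.
    """
    rng = random.Random(seed)
    W = [[0.0] * n_neurons for _ in range(n_neurons)]
    chains = []
    for _ in range(n_chains):
        # sample without replacement, so a chain never revisits a neuron
        chain = rng.sample(range(n_neurons), chain_len)
        chains.append(chain)
        # open chain: no link between the last and the first neuron
        for a, b in zip(chain, chain[1:]):
            W[a][b] += weight
            W[b][a] += weight
    return W, chains
```

      Because the neuron order in each chain is random, the chains do not share a common spatial ordering. This randomness is what allows such an architecture to align grid cells without imposing an ordered arrangement of phases.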

      The wagon-wheel is an interesting case in which maps lose hexagonal symmetry although the population activity lies on a torus, perhaps evidencing the tension between feedforward and recurrent inputs and suggesting that grid cell responses do not obey the single master of path integration. If we modeled it with a 1D attractor, we believe the outcome would strongly depend on the virtual rat's trajectory. If the trajectory was strictly linear, the population activity would be locally one-dimensional and potentially represented by a ring. Instead, if the trajectory allowed for turns, i.e. a 2D trajectory within a corridor-like maze, the population activity would be toroidal as in our open field simulations, while maps would not have perfect hexagonal symmetry, mimicking experimental results.

      More minor comments:

      Recurrent dynamics are modeled as if there is no intrinsic synaptic or membrane time constant. This may be acceptable for addressing the goals of this paper, but it is a bit unusual and it will be helpful to explain and justify this choice.

      As mentioned above, we believe that the best model in a given setup is the one with the least complexity that can still address the phenomenon under study. One does not use general relativity to build a bridge, although it provides a ‘more accurate’ description of the physics involved. All models are simplifications, and the more complex a model, the more it has to be taken as a black box.

      The Introduction mentions that in most models interaction between co-modular neurons occurs through direct excitatory communication, but in quite a few models the interaction is inhibitory. The crucial feature is that the interaction is strongly inhibitory between neurons that differ in their tuning, and either less inhibitory or excitatory between neurons with similar phases.

      We agree that directed inhibition has been shown to be as efficient as directed excitation, and we have modified the introduction to reflect this.

      The Discussion claims that the present work is the first one in which the topology of the recurrent architecture differs from the topology of the emergent state space. However, early works on attractor models of grid cells showed how neural connectivity which is arranged on a 2d plane, without any periodic boundary conditions, leads to a state space that exhibits the toroidal topology. Therefore, this claim should be revised.

      We agree, although the 2D sheet in this case acts as a piece of the torus, and locally the input space and architecture are identical objects. It could be argued that architectures that represent a 2D local slice of the torus, the whole torus, or several cycles around the torus form a continuous family parametrized by the extension of recurrent connections, and as a consequence it is not surprising that these works have not made claims about the incongruence between architecture and representation topologies. The 2D sheet connectivity is still constructed ad hoc to organize activity in a 2D bump, and there is no negotiation between disparate constraints because locally the constraints imposed by input and architecture are the same. We believe this situation is conceptually different from our flexible 1D attractors. We have adapted our claim to include this technical nuance.

      Why are neural responses in the perimeter of the environment excluded from the topological analysis? The whole point of the toroidal manifold analysis on real experimental data is that the toroidal manifold is preserved regardless of the animal's location and behavioral condition.

      We agree, although experimental data needs to go through extensive pre-processing such as dimensionality reduction before showing a toroidal topology. Such manipulations might smooth away the specific effects of boundaries on maps, together with other sources of noise. In our case, the original reason to downsample the dataset is related to the explosion in computational time that we experience with the ripser package when using more than ~1000 data points. For a proof-of-principle characterization we were much more interested in what happened in the center of the arena, where a 1D attractor could fold itself to confine population activity into a torus. The area we chose was sufficiently large to contain the whole torus. Borders do affect the way the attractor folds (they also affect grid maps in real rats). We feel that these imperfections could be interesting to study in relation to the parameters controlling how our virtual rat behaves at the borders, but not at this proof-of-principle stage.

      The periodic activity observed in Ref. 29 could in principle provide the basis for the ring arrangement of neurons. However, it is not yet clear whether grid cells participate in this periodic activity.

      We agree. So far it seems that entorhinal cells in general participate in the ring, which would imply that all kinds of cells are involved. However, it could well be that only some functional types participate in the ring and grid cells specifically do not, as future experiments will tell.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work explores death coding data to understand the impact of COVID-19 on cancer mortality. The work provides solid evidence that deaths with cancer as a contributing cause were not above what would be expected during pandemic waves, suggesting that cancer did not strongly increase the risk of dying of COVID-19. These results are an interesting exploration into the coding of causes of death that can be used to make sense of how deaths are coded during a pandemic in the presence of other underlying diseases, such as cancer.

      We thank the editor and reviewers for the time they took to review our manuscript and for the thoughtful suggestions they provided. We have completed several revisions based on their feedback and we feel our paper is stronger as a result. However, none of these revisions change the overall conclusions of our study.

      Reviewer #1 (Public Review):

      Summary:

      In the paper "Disentangling the relationship between cancer mortality and COVID-19", the authors study whether the number of deaths in cancer patients in the USA went up or down during the first year (2020) of the COVID-19 pandemic. They found that the number of deaths with cancer mentioned on the death certificate went up, but only moderately. In fact, the excess with-cancer mortality was smaller than expected if cancer had no influence on the COVID mortality rate and all cancer patients got COVID with the same frequency as in the general population. The authors conclude that the data show no evidence of cancer being a risk factor for COVID and that the cancer patients were likely actively shielding themselves from COVID infections.

      Strengths:

      The paper studies an important topic and uses sound statistical and modeling methodology. It analyzes both deaths with cancer listed as the primary cause of death and deaths with cancer listed as one of the contributing causes. The authors argue, correctly, that the latter is a more important and reliable indicator to study relationships between cancer and COVID. The authors supplement their US-wide analysis by analysing three states separately.

      Weaknesses:

      The main findings of the paper can be summarized as six numbers. Nationally, in 2020, multiple-cause cancer deaths went up by 2%, Alzheimer's deaths by 31%, and diabetes deaths by 39%. At the same time, assuming no relationship between these diseases and either Covid infection risk or Covid mortality risk, the deaths should have gone up by 7%, 46%, and 28%. The authors focus on cancer deaths and, as 2% < 7%, conclude that cancer is not a risk factor for COVID and that cancer patients must have "shielded" themselves against Covid infections.

      However, I did not find any discussion of the other two diseases. For diabetes, the observed excess was 39% instead of the 28% "predicted by the null model". I assume this should be interpreted as diabetes being a risk factor for Covid deaths. I think this should be spelled out, and also compared to existing estimates of increased Covid IFR associated with diabetes.

      And what about Alzheimer's? Why was the observed excess 31% vs the predicted 46%? Is this also a shielding effect? Does the spring wave in NY provide some evidence here? Why/how would Alzheimer's patients be shielded? In any case, this needs to be discussed and currently, it is not.

      We thank the reviewer for their positive feedback on the paper and for these suggestions. It is true that we have emphasized the impact on cancer deaths, as this was the primary aim of the paper. In the revised version, we have expanded the results and discussion sections to more fully describe the other chronic conditions we used as comparators (lines 267-284; 346-386).

      Note that we are somewhat reluctant to designate any of these conditions as risk factors based solely on comparing the time series model with the demographic model of our expectations. As we mention in the discussion, there is considerable uncertainty around estimates from the demographic model in terms of the size of the population-at-risk, the mean age of the population-at-risk, and the COVID-19 infection rates and infection fatality ratios. Our demographic model is primarily used to demonstrate the effects of competing risks across types of cancers and chronic conditions, since these findings are robust to model assumptions. In contrast, the demographic model should be used with caution if the goal is to titrate the level of these risk factors (as the level of imputed risk is dependent on model assumptions). In the updated version of the manuscript, we have included uncertainty intervals in Table 3, using the upper and lower bounds of the estimated infection rates and IFRs, to better represent this uncertainty. We have also discussed this uncertainty more explicitly in the text and run sensitivity analyses with different infection rate assumptions in the discussion (lines 354-362; 367-370).

      We would like to note that rather than interpreting the absolute results, we used this demographic model as a tool to understand the relative differences between these conditions. From the demographic model we determined that we would expect to see much higher mortality in diabetes and Alzheimer’s deaths compared to cancer deaths due to three factors (1. Size of population-at-risk, 2. Mean age of the population-at-risk, 3. Baseline risk of mortality from the condition), that are separate from the COVID-19 associated IFR. And in general, this is what we observed.

      In comparing the results from the demographic model to the observed excess, diabetes does stand out as an outlier from cancer and Alzheimer’s disease in that the observed excess is consistently above the null hypothesis, which does lend support to the conclusion that diabetes is in fact a risk factor for COVID-19, a conclusion also supported by many other studies. Our findings for hematological cancers are similar, in that we find consistent support for this condition being a risk factor. We have commented on this in the discussion and added a few references (lines 346-354; 395-403).

      Our hypothesis regarding non-hematological cancer deaths (lower than anticipated mortality due to shielding) could also apply to Alzheimer’s deaths. Furthermore, we used the COVID-19 attack rate for individuals >65 years (based on the data that is available), but we estimate that the mean age of Alzheimer’s patients is actually 80-81 years, so this attack rate may in fact be a bit too high, which would increase our expected excess. We have commented on this in the discussion (lines 363-377).

      Reviewer #2 (Public Review):

      The article is very well written, and the approach is quite novel. I have two major methodological comments, that if addressed will add to the robustness of the results.

      (1) Model for estimating expected mortality. There is a large literature using different models to predict expected mortality during the pandemic. Different models come with different caveats; see the example of the WHO estimates in Germany and the performance of splines (Msemburi et al Nature 2023 and Ferenci BMC Medical Research Methodology 2023). In addition, it is common practice to include covariates to help the predictions (e.g., temperature and national holidays, see Kontis et al Nature Medicine 2020). Last, fitting the model independently for each region neglects potential correlation patterns in neighbouring regions; see Blangiardo et al 2020 PlosONE.

      Thank you for these comments and suggestions. We agree there are a range of methods that can be used for this type of analysis, and they all come with their strengths, weaknesses, and caveats. Broadly, the approach we chose was to fit the data before the pandemic (2014-2019), and project forward into 2020. To our knowledge it is not a best practice to use an interpolating spline function to extrapolate to future years. This is demonstrated by the WHO estimates in Germany in the paper you mention. This was our motivation for using polynomial and harmonic terms.
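      To make the contrast with interpolating splines concrete, the following is a minimal sketch of the kind of covariates a polynomial-plus-harmonic model uses. It is an illustration under stated assumptions, not our fitting code (the fits themselves were done with negative binomial regression in R): the seasonal period of 365.25/7 weeks, a single harmonic pair by default, and the function names design_row and predicted_mean are all assumptions introduced here.

```python
import math

# Assumption: seasonal period taken as the mean number of weeks per year.
WEEKS_PER_YEAR = 365.25 / 7

def design_row(t, degree=2, n_harmonics=1, period=WEEKS_PER_YEAR):
    """Covariates for week index t: an intercept, polynomial trend terms
    t, t**2, ..., t**degree, and one sine/cosine pair per seasonal harmonic."""
    row = [1.0]
    row += [float(t) ** d for d in range(1, degree + 1)]
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * t / period
        row += [math.sin(angle), math.cos(angle)]
    return row

def predicted_mean(t, beta, **kwargs):
    """Expected weekly count under a log link: exp(x_t . beta)."""
    x = design_row(t, **kwargs)
    if len(x) != len(beta):
        raise ValueError("beta must have one coefficient per covariate")
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
```

      Every covariate is a closed-form function of the week index, so the same row can be evaluated for 2020 weeks beyond the 2014-2019 fitting window once coefficients have been estimated. The trade-off is that polynomial terms can diverge when extrapolated, which is why the choice of polynomial degree needs to be checked.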

      Based on the above:

      a. I believe that the authors need to run a cross-validation to justify model performance. I would suggest training the data leaving out the last year for which they have mortality and assessing how the model predicts forward. Important metrics for the prediction performance include mean square error and coverage probability, see Konstantinoudis et al Nature Communications 2023. The authors need to provide metrics for all regions and health outcomes.

      Thank you for this suggestion. We agree that our paper could be strengthened by including cross validation metrics to justify model performance. Based on this suggestion, and your observations regarding Alzheimer’s disease, we have done two things. First, for the full pre-pandemic period (2014-2019) for each chronic condition and location we tested three different models with different degree polynomials (1. linear only, 2. linear + second degree polynomial, 3. linear + second degree polynomial + third degree polynomial) and used AIC to select the best model for each condition and location. Next, also in response to your suggestion, we estimated coverage statistics. Using the best fit model from the previous step, we then fit the model to data from 2014-2018 only and used the model to predict the 2019 data. We calculated the coverage probability as the proportion of weekly observed data points that fell within the 95% prediction interval. For all causes of death and locations the coverage probability was 100% (with the exception of multiple cause kidney disease in California, which is only shown in the appendix). The methods and results have been updated to reflect this change and we have added a figure to the appendix showing the selected model and coverage probability for each cause of death and location (lines 504-519; 847-859; Appendix 1- Figure 11).
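      The two validation steps described above, AIC-based selection among polynomial degrees and the coverage probability of held-out 2019 weeks, reduce to short computations. The sketch below is illustrative only. It assumes each candidate fit has already been summarized by its maximized log-likelihood and number of estimated parameters (for a negative binomial model this count includes the dispersion parameter), and that the 95% prediction interval bounds for each held-out week are available. The function names and the numbers in the tests are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def select_by_aic(candidates):
    """candidates maps a model name to (log_likelihood, n_params);
    return the name of the model with the lowest AIC."""
    return min(candidates, key=lambda name: aic(*candidates[name]))

def coverage_probability(observed, lower, upper):
    """Fraction of held-out weekly observations that fall inside their
    prediction interval [lower, upper] (bounds inclusive)."""
    if not (len(observed) == len(lower) == len(upper)) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)
```

      A coverage of 1.0 means every held-out week fell inside its interval. With nominal 95% intervals, values well below 0.95 would indicate that the projected intervals are too narrow.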

      b. In the context of validating the estimates, I think the authors need to carefully address the Alzheimer case, see Figure 2. It seems that the long-term trends pick up an inverse U-shaped relationship, which could be an overfit. In general, polynomials tend to overfit (in this case the authors use a polynomial of second degree). It would be interesting to see how the results change if they also include a cubic term in a sensitivity analysis.

      Thank you for this observation. Based on the changes described above, the model for Alzheimer’s disease now includes a cubic term in the national data and in Texas and California. The model with the second-degree polynomial remained the best fit for New York (Appendix 1 – Figure 11).

      c. The authors could improve the predictions by using temperature and national holidays, but if they show in the cross-validation that the model performs adequately, this would be fine.

      At the scale of the US, adding temperature or environmental covariates is difficult and few US-wide models do so (see Goldstein 2012 and Quandelacy 2014 for examples from influenza). Furthermore, because we are looking at chronic disease outcomes, it is unclear that viral covariates or national holidays would drive these outcomes in the same way as they would if we were looking at mortality outcomes more directly related to transmissible diseases (such as respiratory mortality). Our cross validation also indicates that our models fit well without these additional covariates.

      d. It would be nice to see a model across the US, accounting for geography and spatial correlation. If the authors don't want to fit conditional autoregressive models in the Bayesian framework, they could just use a random intercept per region.

      We think the reviewer is mistaken here about the scale of our national analysis. Our national analysis did not fit independent models for each state or region. Rather, we fit a single model to the weekly-level national mortality data where counts for the whole of the US have been aggregated. We have clarified in the text (lines 156, 464). As such, we do not feel a model accounting for spatial correlation would be appropriate nor would we be able to include a random intercept for each region. We did fit three states independently (NY, TX, CA), but these states are very geographically distant from each other and unlikely to be correlated. These states were chosen in part because of their large population sizes, yet even in these states, confidence intervals were very wide for certain causes of death. Fitting models to each of the 50 US states, most of which are smaller than those chosen here, would exacerbate this issue.

      (2) I think the demographic model needs further elaboration. It would be nice to show more details, including the mathematical formula of this model in the supplement, and to explain the assumptions.

      Thank you for this comment. We have added additional details on the demographic model to the methods. We have also extended this analysis to each state to further strengthen our conclusions (lines 548-590).

      Reviewing Editor Recommendations:

      I think that perhaps something that is missing is that the authors never make their underlying assumption explicit: they are assuming that if cancer increases the risk of dying of COVID-19, this would be reflected in the data on multiple causes of death where cancer would be listed as one of the multiple causes rather than as the underlying cause, and that their conclusions are predicated on this assumption. I would suggest explicitly stating this assumption, as opposed to other reasons why cancer mortality would increase (ex. if cancer care worsened during pandemic waves leading to poorer cancer survival).

      Response: Thank you for this suggestion. We have added a few sentences to the introduction to make this assumption clear (lines 106-112).

      Reviewer #1 (Recommendations For The Authors):

      - It could make sense to add "in the United States" into the title, as the paper only analyses US data.

      - It may make sense to reformulate the title from "disentangling the relationship..." into something that conveys the actual findings, e.g. "Lack of excess cancer mortality during Covid-19 pandemic" or something similar. Currently, the title tells nothing about the findings.

      Thank you for these suggestions. We have added “in the US” to the title. However, we feel that our findings are a bit more subtle than the suggested reformulation would imply, and we prefer to leave it in its current form.

      - Abstract, lines 42--45: This is the main finding of the paper, but I feel it is simplified too strongly in the abstract. Your simulations do *not* "largely explain" excess mortality with cancer; they give higher numbers! Which you interpret as "shielding" etc., but this is completely absent from the abstract. This sentence makes the impression that you got a good fit between simulated excess and real excess, which I would say is not the case.

      Thank you for this comment. We have rephrased the sentence in the abstract to better reflect our intentions for using the demographic model (lines 46-49). As stated above, the purpose of the demographic model was not to give a good fit with the observed excess mortality. Rather, we used the demographic model as a tool to understand the relative differences between these conditions in terms of expected excess mortality given the size, age-distribution, and underlying risk of death from the condition itself, assuming similar IFR and attack rates. And based on this, we conclude that it is not necessarily surprising that we see higher excess mortality for diabetes and Alzheimer’s compared to cancer.

      - Results line 237: you write that it's "more consistent with the null hypothesis", however clearly it is *not* consistent with the null hypothesis either (because 2% < 7%). You discuss in the Discussion that it may be due to shielding, but it would be good to have at least one sentence about it already here in the Results, and refer to the Discussion.

      We have mentioned this in the results and refer to the discussion (lines 277-278).

      - Results line 239: why was it closer to the assumption of relative risk 2? If I understand correctly, your model prediction for risk=1 was 7% and for risk=2 it was 13%. In NY you observed 8% (line 187). How is this closer to risk=2?

      Thank you for this observation. We have updated the demographic model with new data, extended the model to state-level data, and included confidence intervals on these estimates. We have also added additional discussion around the differences between our observations and expectations (lines 249-284).

      - Discussion line 275: "we did not expect to see large increases" -- why exactly? Please spell it out here. Was it due to the age distribution of the cancer patients? Was it due to the high cancer death risk?

      We demonstrate that it is the higher baseline risk of death for cancer that seems to be driving our low expectations for cancer excess mortality (lines 304-320). We have added this to the sentence to clarify our conclusions on this point and have added a figure to better illustrate this concept of competing risks (Figure 6).
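      The competing-risks argument can be made concrete with a back-of-the-envelope calculation. The sketch below is ours, not the authors' demographic model. It ignores age structure and uses made-up parameter values (population size, baseline death rate, attack rate, IFR, relative risk) purely for illustration. It expresses expected COVID-19 deaths as a percentage of expected baseline deaths. A condition with a high baseline death rate therefore shows a smaller percent excess, even when the number of COVID-19 deaths is identical.

```python
# Hypothetical illustration of the competing-risks argument. All parameter
# values are made up and are NOT the study's data; age structure is ignored.


def percent_excess(population, baseline_death_rate, attack_rate, ifr, relative_risk):
    """Expected COVID-19 deaths as a percentage of expected baseline deaths.

    population          -- number of people with the condition
    baseline_death_rate -- probability of dying from any cause over the period
    attack_rate         -- fraction of the group infected over the period
    ifr                 -- infection fatality ratio for the group
    relative_risk       -- multiplier on the IFR attributable to the condition
    """
    baseline_deaths = population * baseline_death_rate
    covid_deaths = population * attack_rate * ifr * relative_risk
    return 100.0 * covid_deaths / baseline_deaths


# Same size, attack rate and IFR; only the baseline death rate differs.
low_baseline = percent_excess(1000, 0.05, 0.1, 0.05, 1.0)   # approximately 10.0
high_baseline = percent_excess(1000, 0.20, 0.1, 0.05, 1.0)  # approximately 2.5
```

      With these hypothetical inputs, quadrupling the baseline death rate from 5% to 20% cuts the percent excess from 10% to 2.5%. The absolute number of COVID-19 deaths (5) is unchanged.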

      - Methods, line 405: perhaps it makes sense to cite some other notable papers on Covid excess mortality such as Msemburi et al Nature 2023, Karlinsky & Kobak eLife 2021, Islam et al BMJ 2021, etc.

      Thank you for mentioning this oversight. We certainly should have cited these papers and have included them in the updated version.

      - Methods line 410: why did you use a 5-week moving average? Why not fit raw weekly death counts? NB regression should be able to deal with it.

      Smoothing time-series data with a moving average before fitting regression models is common practice. We also ran a sensitivity analysis on the raw weekly counts. This produced excess estimates with slightly wider confidence intervals but did not change the overall conclusions of the paper.
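      For readers unfamiliar with the smoothing step, a 5-week moving average can be sketched as follows. The response does not say whether the window was centered or trailing, so the centered version here is an assumption. Edge weeks without a full window are left as None, and the function name is hypothetical.

```python
def centered_moving_average(counts, window=5):
    """Centered moving average of weekly counts.

    Weeks near either end, where a full window is unavailable, are None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        if i < half or i + half >= len(counts):
            smoothed.append(None)  # incomplete window at the series edge
        else:
            segment = counts[i - half:i + half + 1]
            smoothed.append(sum(segment) / window)
    return smoothed
```

      For example, weekly counts [10, 12, 14, 16, 18, 20, 22] become [None, None, 14.0, 16.0, 18.0, None, None]. The raw-data sensitivity analysis simply skips this step and fits the regression to the unsmoothed counts.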

      - Methods line 416: please indicate the software/library/package you used for fitting NB regression.

      We fit the NB regression using the MASS package in R version 4.3. We have added this to the methods (line 519).

      - Line 489: ORCHID -> ORCID

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate the usage of the toolbox. For this reason, I missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper answers the first question and asserts that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We decided to focus the paper on the first question. The practical use of the toolbox is easier to illustrate with code or interactive (Jupyter) notebooks than in a publication format. We find that the “how to proceed” aspect of the toolbox is more easily and comprehensively covered by online, interactive tutorials. This also allows us to update the tutorials as the toolbox evolves across versions, whereas a scientific article is difficult to update. Consequently, we deliberately avoided code snippets in the article itself. However, we appreciate that the paper would gain in clarity if this were stated explicitly early on. We have therefore added a pointer to the online tutorials in the last paragraph of the introduction:

      The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improving them:

      a. The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors in figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 differ in several important ways. The training procedure in figure 1 never includes any perturbations, while the one in figure 5 includes a wide range of perturbations of different magnitudes, timings and directions. The evaluation procedure in figure 1 consists of center-out reaches with permanent viscous (velocity-dependent) external dynamics. In contrast, the evaluation in figure 5 uses fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation, while the network in figure 5 does not.
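      The two perturbation types differ in what the applied force depends on, as the following sketch illustrates. The velocity-dependent field is assumed here to be a curl field. The gain, onset, duration and magnitude values and the rotation sign are illustrative assumptions, not the parameters used in the paper. The function names are hypothetical and are not MotorNet API calls.

```python
import math


def curl_field_force(velocity, gain=10.0):
    """Velocity-dependent (viscous) curl field: F = gain * [[0, 1], [-1, 0]] @ v.

    The force is always perpendicular to the hand velocity and scales with it.
    """
    vx, vy = velocity
    return (gain * vy, -gain * vx)


def square_pulse_force(t, reach_angle, onset=0.1, duration=0.05, magnitude=2.0):
    """Transient force of fixed magnitude, orthogonal to the reach direction.

    It is applied only for onset <= t < onset + duration (seconds), regardless
    of how the effector is moving.
    """
    if onset <= t < onset + duration:
        orthogonal = reach_angle + math.pi / 2
        return (magnitude * math.cos(orthogonal), magnitude * math.sin(orthogonal))
    return (0.0, 0.0)
```

      The curl-field force grows with movement speed and vanishes at rest. The pulse has the same magnitude and timing on every trial.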

      While we agree that more variety in effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader by adding the following:

      End of 1st paragraph, section 2.4.

      Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.

      b. I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. This relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp as one reads. Section 2.3 explains the biomechanical properties that the toolbox implements to improve realism of the effector. They are not new results in the sense that other toolboxes implement these features (though not in differentiable formats) and these properties of biological muscles are empirically well-established. However, they are important to understand what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. An example of this is the results in figure 6, where a simple effector versus a more biomechanically complex effector will yield different neural representations.

      Regarding the manuscript itself, we agree that stating the goal of each section more clearly may improve the reader’s experience. Consequently, we now specify these goals at the start of each section. In particular, we clarify the purpose of section 2.3 by adding several sentences at the end of its first paragraph. We also now explicitly relate section 2.3 to the results of figure 6, and reference figure 4 in the section presenting those results.
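      A moment arm can be illustrated with a textbook simplification. In the sketch below, a straight-line muscle spans a single hinge joint, with origin and insertion at hypothetical distances a and b from the joint centre. The moment arm is the negative derivative of musculotendon length with respect to joint angle, and for this geometry it changes with posture. This is not necessarily how MotorNet computes moment arms.

```python
import math


def musculotendon_length(theta, a, b):
    """Straight-line muscle across a hinge joint (law of cosines).

    Origin and insertion lie at distances a and b from the joint centre;
    theta is the joint angle in radians: L^2 = a^2 + b^2 - 2ab cos(theta).
    """
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(theta))


def moment_arm(theta, a, b, h=1e-6):
    """Moment arm as -dL/dtheta, estimated by central finite differences."""
    lengthened = musculotendon_length(theta + h, a, b)
    shortened = musculotendon_length(theta - h, a, b)
    return -(lengthened - shortened) / (2 * h)


def moment_arm_analytic(theta, a, b):
    """Closed form for the same geometry: -ab sin(theta) / L(theta)."""
    return -a * b * math.sin(theta) / musculotendon_length(theta, a, b)
```

      For a = 0.05 and b = 0.3 (arbitrary units), the magnitude of the moment arm is about 0.029 at 30° but about 0.049 at 90°.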

      c. The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. In particular, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham, Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are also displayed in figure 2 of Lillicrap, Scott (2013, Neuron). Electrophysiological recordings of M1 neurons in non-human primates performing a similar task can also be seen in figure 2B of “Preserved neural population dynamics across animals performing similar behaviour” (https://doi.org/10.1101/2022.09.26.509498). They also appear in figure 2B of “Nonlinear manifolds underlie neural population activity during behaviour” (https://doi.org/10.1101/2023.07.18.549575). Note that these two preprints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicates similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).
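      Polar histograms of this kind are typically built from each unit's preferred direction, often estimated by fitting a cosine tuning curve. The sketch below assumes reach directions evenly spaced over the full circle, in which case the least-squares coefficients have a closed form. It is not necessarily the procedure used in the manuscript, and the function name is hypothetical.

```python
import math


def preferred_direction(directions, rates):
    """Fit r(theta) = b0 + b1*cos(theta) + b2*sin(theta) to firing rates.

    Assumes three or more directions evenly spaced over [0, 2*pi), for which
    b1 and b2 reduce to the sums below. Returns the preferred direction
    atan2(b2, b1) in radians and the modulation depth sqrt(b1^2 + b2^2).
    """
    n = len(directions)
    b1 = 2.0 / n * sum(r * math.cos(t) for t, r in zip(directions, rates))
    b2 = 2.0 / n * sum(r * math.sin(t) for t, r in zip(directions, rates))
    return math.atan2(b2, b1), math.hypot(b1, b2)
```

      Binning the preferred directions of all units into angular sectors then gives the polar histogram.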

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a general Python mechanism rather than a MotorNet-specific one, as Python is an object-oriented language. Therefore, the many general-purpose Python tutorials on subclassing are useful for that purpose. We have amended the main text to make this clearer to the reader.

      More specifically, subclassing a MotorNet object requires overriding some methods of the base MotorNet classes (e.g., the Effector or Environment classes, which correspond to the original Plant and Task objects, respectively). Because we decided (as mentioned above) not to include code in the main text, we added tutorials to the online documentation, including dedicated tutorials on subclassing MotorNet classes. For instance, this tutorial shows how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
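      In general Python terms, a subclass inherits from a base class and overrides the methods that the base class leaves undefined. The Environment and CenterOutReach classes below are hypothetical stand-ins written only to illustrate this. They do not reproduce MotorNet's actual classes or method names, which are documented in the tutorial linked above.

```python
import math


class Environment:
    """Hypothetical base class standing in for a toolbox's task class.

    It is NOT MotorNet's Environment; method names are illustrative only.
    """

    def __init__(self, n_targets):
        self.n_targets = n_targets

    def generate_target(self, index):
        # The base class leaves this undefined; subclasses must override it.
        raise NotImplementedError("subclasses must define how targets are drawn")

    def all_targets(self):
        # Shared logic inherited unchanged by every subclass.
        return [self.generate_target(i) for i in range(self.n_targets)]


class CenterOutReach(Environment):
    """Custom task: targets evenly spaced on a circle of the given radius."""

    def __init__(self, n_targets, radius):
        super().__init__(n_targets)  # reuse the base-class initialisation
        self.radius = radius

    def generate_target(self, index):
        angle = 2 * math.pi * index / self.n_targets
        return (self.radius * math.cos(angle), self.radius * math.sin(angle))
```

      The subclass only supplies what is specific to the new task, and everything else (here, all_targets) comes from the base class.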

      (5) One potential limitation of the toolbox is that it is based on TensorFlow, when the field of Computational Neuroscience seems to be, or at least that's my impression, transitioning to PyTorch. How easy would it be to translate MotorNet to PyTorch? Maybe the authors could comment on this in the discussion.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and extending the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes this choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected in changes throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in Systems Neuroscience, especially because it is faster than reinforcement learning (RL). Thus providing the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which is the one preferred by the subject. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target? (e.g. Kaufman et al. 2015 eLife). In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
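      Under the Gymnasium convention, reset() returns an (observation, info) pair and step(action) returns (observation, reward, terminated, truncated, info). The toy environment below follows that convention in plain Python so that it runs without dependencies. It is a hypothetical 1-D point-mass reaching task, not a MotorNet environment. A real Gymnasium environment would also subclass gymnasium.Env and declare observation_space and action_space.

```python
import random


class PointMassReach:
    """Hypothetical 1-D point-mass reaching task (not a MotorNet class).

    It follows the Gymnasium call convention:
      reset(seed=None, options=None) -> (observation, info)
      step(action) -> (observation, reward, terminated, truncated, info)
    """

    def __init__(self, dt=0.01, mass=1.0, max_steps=100):
        self.dt = dt
        self.mass = mass
        self.max_steps = max_steps

    def reset(self, seed=None, options=None):
        rng = random.Random(seed)
        self.pos, self.vel, self.steps = 0.0, 0.0, 0
        self.target = rng.uniform(-0.1, 0.1)  # target position, arbitrary units
        return (self.pos, self.vel, self.target), {}

    def step(self, action):
        # Semi-implicit Euler integration of force = mass * acceleration.
        self.vel += self.dt * action / self.mass
        self.pos += self.dt * self.vel
        self.steps += 1
        error = abs(self.pos - self.target)
        reward = -error                           # closer to the target is better
        terminated = error < 1e-3                 # task solved
        truncated = self.steps >= self.max_steps  # time limit reached
        return (self.pos, self.vel, self.target), reward, terminated, truncated, {}


# Usage: a random policy interacting through the standard loop.
env = PointMassReach()
obs, info = env.reset(seed=0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(random.uniform(-1.0, 1.0))
    done = terminated or truncated
```

      An RL agent drives any such environment through the same loop. It calls reset() once, then calls step() until terminated or truncated is True.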

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      As the main result is a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a two-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning. Reinforcement learning is currently a required compromise when using other simulation engines, as these engines' code cannot be differentiated through.

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions from RL or other non-gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond improving training time / computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that stating the goal of each section more clearly may improve the reader’s experience, and we have ensured this throughout the manuscript. In particular, we added the following to the first paragraph of section 2.3, where an explicit goal was most lacking:

      In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well characterised biologically and are often implemented in realistic biomechanical simulation software.

      Regarding potential differences between the solutions obtained with reinforcement or supervised learning, addressing this conclusively would require a non-trivial amount of work and may be beyond the scope of the current article. We do appreciate, however, that in some situations RL may be a more fitting approach for a given task design. In relation to this point, we now specify in the discussion that the new API can interface with reinforcement learning toolboxes. This serves users who may want to pursue this type of policy training approach when appropriate (new section 3.2.4.).
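      A differentiable plant lets the gradient of the task error propagate through the simulated physics to the controller's parameters. The sketch below shows this principle on a deliberately tiny problem. A 1-D point mass is driven by a single constant force, which is optimised by gradient descent. For self-containment it uses forward-mode automatic differentiation with dual numbers. Deep-learning frameworks such as TensorFlow and PyTorch instead use reverse-mode differentiation (backpropagation) and would optimise the weights of a network rather than one scalar. All parameter values are arbitrary.

```python
class Dual:
    """Forward-mode autodiff number: value + derivative * eps, with eps**2 = 0."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule: (u * v)' = u * v' + u' * v
        return Dual(self.val * other.val, self.val * other.der + self.der * other.val)

    __rmul__ = __mul__

    def __truediv__(self, number):
        # Only division by a plain number is needed in this example.
        return Dual(self.val / number, self.der / number)


def rollout(force, dt=0.01, mass=1.0, steps=100):
    """Explicit-Euler simulation of a 1-D point mass under a constant force.

    Returns the final position (a Dual if `force` is a Dual).
    """
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        pos = pos + dt * vel
        vel = vel + dt * force / mass
    return pos


def optimise_force(target, lr=1.0, iters=50):
    """Gradient descent on the loss (final_position - target)**2."""
    force = 0.0
    for _ in range(iters):
        x = rollout(Dual(force, 1.0))  # seed derivative d(force)/d(force) = 1
        grad = 2.0 * (x.val - target) * x.der
        force -= lr * grad
    return force
```

      With these settings the final position is 0.495 times the force, so the loss is quadratic and a learning rate of 1.0 converges quickly. A simulator that cannot be differentiated through would have to estimate this gradient from sampled outcomes instead. That is one reason such engines are usually paired with RL.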

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      We originally considered adding code snippets to the article, which aligns with comment (2) from Reviewer #1 (see above). We hope our response there explains why we chose not to add code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Having such a low effort to run computational experiments will be definitely beneficial for the field of neural control of movement.

      We thank the reviewer for these comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors sought to test whether anterior insular cortex neurons that increase or decrease firing during fear behavior and freezing bi-directionally control fear via separate, anatomically defined outputs. Using a fairly simple behavior where mice were exposed to tone-shock pairings, they found roughly equal populations that do indeed either increase or decrease firing during freezing. Next, they sought to test whether these distinct populations may also have distinct outputs. Using retrograde tracers they found that the anterior insular cortex contains non-overlapping neurons which project to the mediodorsal thalamus or amygdala. Mediodorsal thalamus-projecting neurons tended to cluster in deep cortical layers while amygdala-projecting neurons were primarily in more superficial layers. Stimulation of insula-thalamus projections decreased freezing behavior, and stimulation of insula-amygdala projections increased fear behavior. The neurons that increased firing were located in deep layers, thalamus projections occurred in deep layers, and stimulation of insula-thalamus neurons decreased freezing. From this, the authors concluded that the increased firing neurons may be thalamus projections. Similarly, decreased-firing neurons tended to occur in more superficial layers, insula-amygdala projections were primarily superficial, and insula-amygdala stimulation increased freezing behavior. From this, the authors concluded that the decreased firing cells may be amygdala projections. The study has several strengths though also some caveats.

      Strengths:

      The potential link between physiological activity, anatomy, and behavior is well laid out and is an interesting question. The activity contrast between the units that increase/decrease firing during freezing is clear.

      It is nice to see the recording of extracellular spiking activity, which provides a clear measure of neural output, whereas similar studies often use bulk calcium imaging, a signal that rarely matches real neural activity even when anatomy suggests it might (see London et al 2018 J Neuro - there are increased/decreased spiking striatal populations, but both D1 and D2 striatal neurons increase bulk calcium).

      Weaknesses:

      The link between spiking, anatomy, and behavior requires assumptions/inferences. The anatomically/genetically defined neurons, which had distinct outputs and opposite behavioral effects, can only be assumed to be the increased/decreased spiking neurons. This assumption rests on the rough area of the cortical layer in which they were recorded.

      Yes, we are aware that we could not provide a direct link between spiking, anatomy and behavior. We have specifically noted this in the discussion section and added a possible experiment that could be carried out to provide a more direct link in a future study.

      [Lines 371-375] In future studies, we would like to provide more direct evidence linking the neuronal response types to projection patterns. This could be done by electrophysiologically identifying freezing-excited and freezing-inhibited aIC neurons and testing whether those neurons respond to optogenetic activation of amygdala- or medial thalamus-projecting aIC neurons.

      The behavior would require more control to fully support claims about the associative nature of the fear response (see Trott et al 2022 eLife) - freezing, in this case, could just as well be nonassociative. In a similar vein, fixed intertrial intervals, though common practice in the fear literature, pose a problem for neurophysiological studies. The first is that animals learn the timing of events, and the second is that neural activity is dynamic and changes over time. Thus it is very difficult to determine whether changes in neural activity are due to learning about the tone-shock contingency, timing of the task, simply occur because of time and independently of external events, or some combination of the above.

      Trott et al. (2022) stated that "...freezing was the purest reflection of associative learning." The nonassociative processes mentioned in the study were related to running and darting behaviors, which the authors argue are suppressed by associative learning. Moreover, considerable evidence from immediate postshock freezing and immediate postshock context shift studies all indicate that the freezing response is an associative (and not nonassociative) response (Fanselow, 1980 and 1986; and Landeira-Fernandez et al., 2006). Thus, our animals' freezing response to the tone CS presentation in a novel context, following three tone CS-footshock US pairings, most likely reflects associative learning. 

      Fixed inter-trial intervals (ITIs) are standard in fear conditioning studies, particularly those with few CS-US pairings, and we acknowledge the challenge they pose for interpreting the neural correlates of behavior. However, the ITIs in our extinction study were variable, and we still found neural activity that was significantly correlated with freezing. These results suggest that the aIC neural activity changes measured in this study are likely due to freezing behavior associated with fear learning, rather than to learning the timing of fixed ITIs.

      Reviewer #2 (Public Review):

      In this study, the authors aim to understand how neurons in the anterior insular cortex (insula) modulate fear behaviors. They report that the activity of a subpopulation of insula neurons is positively correlated with freezing behaviors, while the activity of another subpopulation of neurons is negatively correlated to the same freezing episodes. They then used optogenetics and showed that activation of anterior insula excitatory neurons during tones predicting a footshock increases the amount of freezing outside the tone presentation, while optogenetic inhibition had no effect. Finally, they found that two neuronal projections of the anterior insula, one to the amygdala and another to the medial thalamus, are increasing and decreasing freezing behaviors respectively. While the study contains interesting and timely findings for our understanding of the mechanisms underlying fear, some points remain to be addressed.

      We are thankful for the detailed and constructive comments by the reviewer and have addressed each point. Specifically, we made the following changes:
      - discussed possible limitations of using only male mice in the study;
      - cited two additional studies on the insula;
      - specified the L-ratio and isolation distance used in our study;
      - added the ratio of putative-excitatory and putative-inhibitory neurons obtained in our study;
      - changed the terms used to describe neuronal activity changes (freezing-excited and freezing-inhibited cells);
      - added a new analysis (Figure 2H) and rearranged Figure 2 for clarity;
      - added new histology images and atlas maps with viral expression (three figure supplements).

      Reviewer #1 (Recommendations For The Authors):

      - I would suggest keeping the same y-axis for all figures that display the same data type - Figure 5D, for example.

      Thank you for the detailed suggestion. We now use the same y-axis for all figures that display the same data type.

      - In the methods, it says 30s bins were used for neural analysis (line 435). I cannot imagine doing this, and looking at the other figures, it does not look like this is the case so could you please clarify what bins, averages, etc were used for neural and behavioral analysis?

      The bin size for neural analysis varied: 30-s, 5-s and 1-s bins were used depending on the analysis. We have corrected this and now specify in the methods which bin size was used for each figure.

      The bin size used for analyses combining neural activity and freezing behavior was 30 s, and we have also added this to the methods.
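      To relate neural activity to freezing on a common time base, spike times can be counted in fixed bins (30 s, matching the bin size quoted above) and correlated with the freezing score in the same bins. Pearson's correlation is used here as an illustrative assumption, and the statistic used in the manuscript may differ. The function names are hypothetical.

```python
import math


def bin_counts(event_times, session_length, bin_size):
    """Count events (e.g., spike times in seconds) in consecutive bins."""
    n_bins = int(session_length // bin_size)
    counts = [0] * n_bins
    for t in event_times:
        idx = int(t // bin_size)
        if 0 <= idx < n_bins:  # ignore events outside the session
            counts[idx] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

      For example, spikes at 1, 2, 31, 61, 62 and 63 s in a 90-s session give counts of [2, 1, 3] per 30-s bin. These counts would then be correlated with the percentage of time spent freezing in each bin.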

      - I would not make any claims about the fear response here being associative/conditional. This would require a control group that received an equal number of tone and shock exposures, whether explicitly unpaired or random.

      The unpaired paradigm suggested by the reviewer (explicitly unpaired tone and shock presentations) is well characterized and does not induce CS-evoked fear behavior (Moita et al., 2003 and Kochli et al., 2015). In addition, considerable evidence from immediate post-shock freezing and immediate post-shock context shift studies indicates that the freezing response is an associative (and not nonassociative) response (Fanselow, 1980 and 1986; and Landeira-Fernandez et al., 2006). Thus, our animals' freezing response to the tone CS presentation in a novel context, following three tone CS-footshock US pairings, most likely reflects associative learning.

      - I appreciate the discussion about requiring some inference to conclude that anatomically defined neurons are the physiologically defined ones. This is a caveat that is fully disclosed, however, I might suggest adding to the discussion that future experiments could address this by tagging insula-thalamus or insula-amygdala neurons with antidromic (opto or even plain old electric!) stimulation. These experiments are tricky to perform, of course, but this would be required to fully close all the links between behavior, physiology, and anatomy.

      As suggested, we now state in the discussion that future studies could establish a more direct link between physiology, anatomy and behavior. This could be done by optogenetically tagging insula-thalamus and insula-amygdala neurons and determining whether each is a positive or a negative cell (now named freezing-excited and freezing-inhibited cells, respectively).

      [Lines 371-375] In future studies, we would like to provide more direct evidence linking the neuronal response types to projection patterns. This could be done by electrophysiologically identifying freezing-excited and freezing-inhibited aIC neurons and testing whether those neurons respond to optogenetic activation of amygdala- or medial thalamus-projecting aIC neurons.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) As all experiments have been performed only in male mice, the authors need to clearly state this limitation in the introduction, abstract, and title of the manuscript.

      As an increasing number of readers are interested in the biological sex used in preclinical studies, we agree that it should be mentioned at the beginning of the manuscript. As suggested, we now explicitly state that we only used male mice in the title, abstract, and introduction. In addition, we discuss possible limitations of only using male mice in the discussion section as follows:

      [Lines 381-386] Another factor to consider is that we have only used male mice in this study. Cued fear conditioning is the main experimental paradigm used in this study, and many studies report no biological sex difference in it (42). However, this does not mean that the underlying brain circuit mechanisms are also similar. The bidirectional fear modulation by the aIC→medial thalamus or the aIC→amygdala projections may be different in female mice, as some studies report reduced cued fear extinction in females (42).

      (2) The authors are missing important publications reporting findings on the insular cortex in fear and anxiety. For example, the authors should cite studies showing that anterior insula VIP+ interneurons inhibition reduces fear memory retrieval (Ramos-Prats et al., 2022) and that posterior insula neurons are a state-dependent regulator of fear (Klein et al., 2021). Also, regarding the anterior insula to basolateral amygdala projection (aIC-BLA), the author should include recent work showing that this population encodes both negative valence and anxiogenic spaces (Nicolas et al., 2023). 

      We appreciate the detailed suggestions and have added the appropriate publications to the discussion section. The anterior insula VIP+ interneuron study (Ramos-Prats et al., 2022) is interesting. However, based on the evidence provided in that paper, we felt that the role of aIC VIP+ interneurons in fear conditioning is likely limited. VIP+ interneurons in the aIC seem to be important for coding sensory stimuli, but their relevance to conditioned stimuli seems to be low. Overall, VIP+ intracellular calcium activity in response to the CS was low and did not differ between acquisition and retrieval. Also, VIP+ inhibition did not influence fear acquisition. VIP+ inhibition during fear acquisition did reduce fear retrieval (CS only, no light stimulation). This does not necessarily mean that VIP+ activity is involved in fear memory storage or retrieval, especially because intracellular calcium activity of VIP+ neurons was low during fear conditioning and retrieval.

      Studies by Klein et al. (2021) and Nicolas et al. (2023) are integrated in the discussion section as follows.

      [Lines 297-301] Interestingly, group activity of pIC neurons measured with fiber photometry exhibited fear-state-dependent changes, decreasing during high fear behavior and increasing during low fear behavior (29), suggesting that pIC group activity may be involved in maintaining an appropriate level of fear behavior.

      [Lines 316-319] Another distinction between the aIC and pIC may be related to anxiety, as a recent study showed that group activity of aIC neurons, but not pIC neurons, increased when mice explored anxiogenic spaces (the open arms of an elevated plus maze or the center of an open field box) (32).

      (3) The authors should specify how many neurons they excluded after controlling the L-ratio and isolation distance. It is also important to specify the percentage of putative excitatory and inhibitory interneurons recorded among the 11 mice based on their classification (the number of putative inhibitory interneurons in Figure 1D seems too low to be accurate).

      We used manual cluster cutting and only cut clusters that were visually well isolated. Consequently, almost no neurons were excluded after applying the L-ratio and isolation distance criteria. The criteria we used were L-ratio < 0.3 and isolation distance > 15, and we have specified this in the Methods as follows.

      [Lines 454-458] We only used well-isolated units (L-ratio < 0.3, isolation distance > 15) that were confirmed to be recorded in the aIC (conditioned group: n = 116 neurons, 11 mice; control group: n = 14 neurons, 3 mice) for the analysis (46). The mean values for the units used in our analysis were L-ratio = 0.09 ± 0.012 and isolation distance = 44.97 ± 5.26 (mean ± standard deviation).
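
      The inclusion rule above reduces to two per-unit thresholds. A minimal Python sketch, assuming each sorted unit is held in a hypothetical `Unit` record with its cluster-quality metrics already computed (the authors cut clusters manually, so this illustrates only the stated criteria, not their spike-sorting pipeline):

```python
from dataclasses import dataclass

# Thresholds stated in the Methods (lines 454-458); both comparisons are strict.
MAX_L_RATIO = 0.3
MIN_ISOLATION_DISTANCE = 15.0


@dataclass
class Unit:
    """Hypothetical record for one sorted single unit."""
    unit_id: str
    l_ratio: float
    isolation_distance: float


def is_well_isolated(unit: Unit) -> bool:
    """True if the unit passes both cluster-quality criteria."""
    return (unit.l_ratio < MAX_L_RATIO
            and unit.isolation_distance > MIN_ISOLATION_DISTANCE)


def filter_units(units: list[Unit]) -> list[Unit]:
    """Keep only the well-isolated units, preserving input order."""
    return [u for u in units if is_well_isolated(u)]


if __name__ == "__main__":
    units = [
        Unit("t1c1", l_ratio=0.05, isolation_distance=40.0),  # passes both
        Unit("t1c2", l_ratio=0.35, isolation_distance=50.0),  # fails L-ratio
        Unit("t2c1", l_ratio=0.10, isolation_distance=12.0),  # fails isolation distance
    ]
    print([u.unit_id for u in filter_units(units)])  # ['t1c1']
```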

      As suggested, we have also specified the percentages of putative excitatory neurons and putative inhibitory interneurons recorded in our study in the Results and Methods sections. The relative percentages were similar for the conditioned and control groups (conditioned putative-excitatory: 93.1%, putative-inhibitory: 6.9%; control putative-excitatory: 92.9%, putative-inhibitory: 7.1%). Although the number of putative interneurons isolated in our recordings is low, this is what we obtained. Putative inhibitory neurons, probably because of their relatively small size, tend to be underrepresented relative to putative excitatory cells.

      [Lines 83-87] Of the recorded neurons, we analyzed the activity of 108 putative pyramidal neurons (93% of total isolated neurons) from 11 mice, which were distinguished from putative interneurons (n = 8 cells, 7% of total isolated neurons) based on the characteristics of their recorded action potentials (Figure 1D; see methods for details).

      [Lines 464-467] The percentage of putative excitatory neurons and putative inhibitory interneurons obtained from both groups were similar (conditioned putative-excitatory: 93.1%, putative-inhibitory: 6.9%; control putative-excitatory: 92.9%, putative-inhibitory: 7.1%).
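
      As a quick consistency check on the reported percentages, the conditioned-group counts (108 putative excitatory and 8 putative inhibitory, 116 in total) reproduce 93.1% and 6.9%. The control-group counts below (13 and 1 of 14) are back-calculated from the stated percentages and are an assumption, not numbers given in the text:

```python
# Conditioned-group counts reported in the Results (lines 83-87).
conditioned = {"putative_excitatory": 108, "putative_inhibitory": 8}
# Control-group counts inferred from n = 14 and 92.9% / 7.1% (assumption).
control = {"putative_excitatory": 13, "putative_inhibitory": 1}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Each cell class as a percentage of all isolated units, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(percentages(conditioned))  # {'putative_excitatory': 93.1, 'putative_inhibitory': 6.9}
print(percentages(control))      # {'putative_excitatory': 92.9, 'putative_inhibitory': 7.1}
```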

      (4) While the use of correlation of single-unit firing frequency with freezing is interesting, classically, studies analyze the firing in comparison to the auditory cues. If the authors want to keep the correlation analysis with freezing, rather than correlations to the cues, they should rename the cells as "freezing excited" and "freezing inhibited" cells instead of positive and negative cells.

      As suggested, we used the terms “freezing-excited” and “freezing-inhibited” cells instead of positive and negative cells.
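
      The renamed categories follow from the sign of each unit's firing-freezing correlation. A minimal sketch, assuming firing rate and percent freezing are binned on a shared time axis, and using a permutation test with alpha = 0.05 as a stand-in for whatever significance test was actually applied (the binning, the test, and the "unclassified" label are all hypothetical choices):

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Pearson's r (assumed test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between firing and freezing
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_perm + 1)


def classify_unit(firing_rate, freezing, alpha=0.05):
    """Label a unit by the sign of a significant firing-freezing correlation."""
    r = pearson_r(firing_rate, freezing)
    if permutation_p(firing_rate, freezing) < alpha:
        return "freezing-excited" if r > 0 else "freezing-inhibited"
    return "unclassified"
```

      A parametric test such as `scipy.stats.pearsonr` would be the more common choice; the permutation version is used here only to keep the sketch dependency-free.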

      (5) To improve clarity, Figure 2 should be reorganized to start with the representative examples before including the average of population data. Thus Panel D should be the first one. The authors should also consider including the trace of the firing rate of these representative units over time, on top of the freezing trace, as well as Pearson's r and p values for both of them. Then, the next panels should be ordered as follows: F, G, H, C, A, B, I, and finally E.

      We have rearranged Figure 2 based on the suggestions.

      (6) It is unclear why the freezing response in Figure 2 is different in current panels F, G, and H. Please clarify this point.

      This was because the freezing behavior of slightly different populations of animals was averaged. Some animals lacked positive cells, negative cells, or both, and only the behavior of animals with the specified cell type was used to calculate the mean freezing response. After the rearrangement, Figure 2 no longer contains plots juxtaposing the mean responses of each neuronal type with behavior.

      (7) Even though the peak of tone-induced firing rate change between negative and positive cells is 10s later for positive cells, the conclusion that this 'difference suggests differential circuits may regulate the activities of different neuron types in response to fear' is overstating the observation. This statement should be rephrased. Indeed, it could be the same circuits that are regulated by different inputs (glutamatergic, GABA, or neuromodulatory inputs).

      We agree and have deleted the statement from the manuscript.

      (8) The authors mention they did not find tone-onset or tone-offset-induced responses of anterior insula neurons. It would be helpful to represent this finding in a figure and, especially, to state the criteria used to classify a cell as tone-onset or tone-offset responsive.

      We have described how tone-onset and tone-offset responses were analyzed in the Methods section and added a plot of this analysis in Figure 2H.

      (9) Based on the spread of the viral expression shown in Figure 3B, it appears that the authors are activating/inhibiting insula neurons in the GI layer, whereas single-unit recordings report the electrodes were located in DI, AID, and AIV layers. The authors should provide histology maps of the viral spread for ChR2, NpHR3, and eYFP expression.

      Thank you for the excellent suggestion. Now the histological sample in Figure 3B is a sample with expression in the GI/DI/AID layers and it also has an image taken at higher resolution (x40) to show that viral vectors are expressed inside neurons. We also added histological maps with overlay of viral expression patterns of the ChR2, eYFP, and NpHR3 groups in Figure 3—figure supplement 1.

      (10) In Figure 5B, the distribution of terminals expressing ChR2 appears much denser in CM than in MD. This should be quantified across mice and if consistent with the representative image, the authors should refer to aIC-CM rather than aIC-MD terminals.

      Overall, we referred to the connection as aIC-medial thalamus, which collectively includes both the CM and the MD. With the microscopes available to us, we cannot determine whether terminals end in the CM or the MD, but the aIC projections seem to pass through the CM to reach the MD. The Allen Brain Institute’s Mouse Brain Connectivity Atlas (https://connectivity.brain-map.org/projection/experiment/272737914) for a B6 mouse, the strain used in our study, with tracer injected at a location similar to ours, also supports this interpretation and shows that aIC neuronal projections terminate more in the MD than in the CM. In addition, the power of light delivered for optogenetic manipulation decreases substantially with distance; therefore, MD-projecting terminals, which are closer to the optic fiber, are more likely to be activated than CM-projecting terminals. However, since we could not determine whether aIC projections terminate in the CM or the MD, we collectively referred to the connection as aIC-medial thalamus throughout the manuscript.

      Author response image 1.

      (11) Histological verifications for each in vivo electrophysiology, optogenetic, and tracing experiments need to include a representative image of the implantation/injection site, as well as a 40x zoom-in image focusing on the cell bodies or terminals right below the optic fiber (for optogenetic experiments). Moreover, an atlas map including all injection locations with the spread of the virus and fiber placement should be added in the Supplement Figures for each experiment (see Figure S1 Klein et al., 2021). Similarly, the authors need to add a representation of the spread of the retrograde tracers for each mouse used for this tracing experiment.

      As suggested, we added a histology sample showing the electrode recording location for in vivo electrophysiology in Figure 1 and added atlas maps for the optogenetic and tracing experiments in the supplementary figures. We also provide a 40x zoom-in image of the expression pattern for the optogenetic experiments (Figure 3B).

      (12) To target anterior insula neurons, authors mention coordinates that do not reach the insula on the Paxinos atlas (AP: +1.2 mm, ML: -3.4 mm, DV: -1.8 mm). If the DV was taken from the brain surface, this has to be specified, and if the other coordinates are from Bregma, this also needs to be specified. Finally, the authors cite a review from Maren & Fanselow (1996), for the anterior insula coordinates, but it remains unclear why.

      AP and ML coordinates were measured relative to bregma, and DV was measured from the brain surface. We have specified this in the Methods. We did not cite the review by Maren & Fanselow for the aIC coordinates.

      Minor comments:

      (1) A schematic of the microdrive and tetrodes, including the distance of each tetrode would also be helpful.

      We used handcrafted microdrives with four tetrodes. Because they were handcrafted, the relative orientation of the tetrodes varied, and tetrode recording locations had to be verified histologically. We did, however, ensure that the tetrodes were more than 200 μm apart so that distinct single units would be obtained from different tetrodes. We added this to the Methods as follows.

      [Lines 430-431] The distance between the tetrodes was greater than 200 μm to ensure that distinct single units would be obtained from different tetrodes.

      (2) Figure 2E: representation of the baseline firing (3-min period before the tone presentation) is missing.

      Figure 2E shows the 3-min period before tone presentation.

      (3) Figure 2: Averages Pearson's correlation r and p values should be stated on panels F, G, and H (positive cell r = 0.81, P < 0.05; negative cell r = -0.68, P < 0.05).

      These values were all stated in the original figures. With the reorganization of Figure 2, we now show a plot of Pearson’s correlation with r and p values in Figure 2F.

      (4) Figure 2I: Representation of the absolute value of the normalized firing is highly confusing. Indeed, as the 'negative cells' are inhibited during freezing, their firing should be represented as normalized and negative.

      To avoid confusion, we did not take an absolute value of the “negative cells”, which are now called the “freezing-inhibited cells”.

      (5) Figure 4E (retrograde tracing): representation of individual values is missing.

      Figure 4E now has individual values.

      References:

      London, T. D., Licholai, J. A., Szczot, I., Ali, M. A., LeBlanc, K. H., Fobbs, W. C., & Kravitz, A. V. (2018). Coordinated ramping of dorsal striatal pathways preceding food approach and consumption. Journal of Neuroscience, 38(14), 3547-3558.

      Trott, J. M., Hoffman, A. N., Zhuravka, I., & Fanselow, M. S. (2022). Conditional and unconditional components of aversively motivated freezing, flight and darting in mice. eLife, 11, e75663.

      Fanselow, M. S. (1980). Conditional and unconditional components of post-shock freezing. The Pavlovian Journal of Biological Science: Official Journal of the Pavlovian, 15(4), 177-182.

      Fanselow, M. S. (1986). Associative vs topographical accounts of the immediate shock-freezing deficit in rats: implications for the response selection rules governing species-specific defensive reactions. Learning and Motivation, 17(1), 16-39.

      Landeira-Fernandez, J., DeCola, J. P., Kim, J. J., & Fanselow, M. S. (2006). Immediate shock deficit in fear conditioning: effects of shock manipulations. Behavioral Neuroscience, 120(4), 873.

      Moita, M. A., Rosis, S., Zhou, Y., LeDoux, J. E., & Blair, H. T. (2003). Hippocampal place cells acquire location-specific responses to the conditioned stimulus during auditory fear conditioning. Neuron, 37(3), 485-497.

      Kochli, D. E., Thompson, E. C., Fricke, E. A., Postle, A. F., & Quinn, J. J. (2015). The amygdala is critical for trace, delay, and contextual fear conditioning. Learning & Memory, 22(2), 92-100.

      Ramos-Prats, A., Paradiso, E., Castaldi, F., Sadeghi, M., Mir, M. Y., Hörtnagl, H., ... & Ferraguti, F. (2022). VIP-expressing interneurons in the anterior insular cortex contribute to sensory processing to regulate adaptive behavior. Cell Reports, 39(9).

      Klein, A. S., Dolensek, N., Weiand, C., & Gogolla, N. (2021). Fear balance is maintained by bodily feedback to the insular cortex in mice. Science, 374(6570), 1010-1015.

      Nicolas, C., Ju, A., Wu, Y., Eldirdiri, H., Delcasso, S., Couderc, Y., ... & Beyeler, A. (2023). Linking emotional valence and anxiety in a mouse insula-amygdala circuit. Nature Communications, 14(1), 5073.

      Maren, S., & Fanselow, M. S. (1996). The amygdala and fear conditioning: has the nut been cracked? Neuron, 16(2), 237-240. https://doi.org/10.1016/s0896-6273(00)80041-0

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The main goal of the authors was to study the testis-specific role of the protein FBXO24 in the formation and function of the ribonucleoprotein granules (membraneless electron-dense structures rich in RNAs and proteins).

      We appreciate the summary comment of reviewer #1.

      Strengths:

      The wide variety of methods used to support their conclusions (including transgenic models)

      We appreciate the positive comment of reviewer #1.

      Weaknesses:

      The lack of specific antibodies against FBXO24. Some of the experiments showing a specific phenotype are descriptive and lack a logical explanation of the possible mechanism (e.g., the AR defect or the tail structure).

      Because we could not obtain specific antibodies against FBXO24, we generated Fbxo24-FLAG transgenic mice, which we used to show the interaction between FBXO24 and IPO5. Regarding the mechanism of the impaired acrosome reaction, we added results and discussion, as described in our response to question (1) of reviewer #1 (public review). Regarding the mechanism of the abnormal flagellar structure, we added new results and revised the manuscript, as described in our response to the major comments of reviewer #3 (recommendations for the authors).

      Questions:

      The paper is excellent and employs a wide variety of methods to substantiate the conclusions. I have very few questions to ask:

      (1) KO mice cannot undergo acrosome reaction (AR) even spontaneously. How do you account for this, given that no visible defects were observed in the acrosome?

      One possibility is that Fbxo24 KO spermatozoa cannot undergo capacitation; however, it is difficult to analyze capacitation status (for example, tyrosine phosphorylation) because most Fbxo24 KO spermatozoa are not alive (Figure S3A). Another possibility is that AR-related proteins are affected in Fbxo24 KO spermatozoa. Therefore, we analyzed the amounts of AR-related proteins with mass spectrometry (Figure S3C). Although previous studies indicate that assembly of the SNARE complex is a key event prior to AR [Hutt et al., 2005 (PMID: 15774481); Katafuchi et al., 2000 (PMID: 11066067); Schulz et al., 1997 (PMID: 9356173); Tomes et al., 2002 (PMID: 11884041)], no clear differences were detected for SNARE proteins (Figure S3C and D). PLCD4, which is important for AR [Fukami et al., 2001 (PMID: 11340203)], was also detected in Fbxo24 KO spermatozoa (Figure S3C). Although we could not find differences in the amounts of AR-related proteins, it is still possible that FER1L5, another AR-related protein [Morohoshi et al., 2023 (PMID: 36696506)] that was not detected in the mass spectrometry analyses, or AR-related proteins not yet identified are affected in Fbxo24 KO spermatozoa. We added these results and discussion (lines 160-166 and 305-312).

      (2) KO sperm are unable to migrate in the female tract, and, more intriguingly, they do not pass through the utero-tubal junction (UTJ). The levels of ADAM3 are normal, suggesting that the phenotype is influenced by other factors. The authors should investigate the levels of Ly6K since mice also exhibit the same phenotype but with normal levels of ADAM3.

      We detected LY6K in Fbxo24 KO spermatozoa by immunoblotting, but no difference was found. We added the results (Figure S3E and lines 172–175).

      (3) In Figure 4A, the authors assert that "RBGS Tg mice revealed that mitochondria were abnormally segmented in Fbxo24 KO spermatozoa." I am unable to discern this from the picture shown in that panel. Could you please provide a more detailed explanation or display the information more explicitly?

      We apologize for the ambiguous description of the morphology of the sperm mitochondrial sheath. Fbxo24 KO cauda epididymal spermatozoa show a disorganized mitochondrial sheath rather than a “segmented” one. We revised the sentence (lines 190-192) and added white arrowheads indicating the disorganized regions (Figure 4A).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kaneda et al "FBXO24 ensures male fertility by preventing abnormal accumulation of membraneless granules in sperm flagella" is a significant paper on the role of FBXO24 in murine male germ cell development and sperm ultrastructure and function. The body of experimental evidence that the authors present is extraordinarily strong in both breadth and depth. The authors investigate the protein's functions in male germ cells and sperm using a wide variety of approaches but focusing predominantly on their novel mouse model featuring deletion of the Fbxo24 gene and its product. Using this mouse, and a cross of it with another model that expresses reporters in the head and midpiece, they logically build from one experiment to the next. Together, their data show that this protein is involved in the regulation of membraneless electron-dense structures; loss of FBXO24 led to an accumulation of these materials and defects in the sperm flagellum and fertilizing ability. Interestingly, the authors found that several of the best-known components of electron-dense ribonucleoprotein granules that are found in the intermitochondrial cement and chromatoid body were not disrupted in the Fbxo24 knockout, suggesting that the electron-dense material and these structures are not all the same, and the biology is more complicated than some might have thought. They found evidence for the most changes in IPO5 and KPNB1, and biochemical evidence that FBXO24 and IPO5 could interact.

      We appreciate the summary comment of reviewer #2.

      Strengths:

      The authors are to be commended for the thoroughness of their experimental approaches and the extent to which they investigated impacts on sperm function and potential biochemical mechanisms. Very briefly, they start by showing that the Fbxo24 message is present in spermatids and that the protein can interact with SKP1, in a way that is dependent on its F-box domain. This points toward a potential function in protein degradation. To test this, they next made the knockout mouse, validated it, and found the males to be sterile, although capable of plugging a female. Looking at the sperm, they identified a number of ultrastructural and morphological abnormalities, which they looked at in high resolution using TEM. They also cross their model with RBGS mice so that they have reporters in both the acrosome and mitochondria. The authors test a variety of sperm functions, including motility parameters, ability to fertilize by IVF, cumulus-free IVF, zona-free-IVF, and ICSI. They found that ICSI could rescue the knockout but not other assisted reproductive technologies. Defects in male fertility likely resulted from motility disruption and failure to get through the utero-tubal junction but defects in acrosome exocytosis also were noted. The authors performed thorough investigations including both targeted and unbiased approaches such as mass spectrometry. These enabled them to show that although the loss of the FBXO24 protein led to more RNA and elevated levels of some proteins, it did not change others that were previously identified in the electron-dense RNP material.

      The manuscript will be highly significant in the field because the exact functions of the electron-dense RNP materials have remained somewhat elusive for decades. Much progress has been made in the past 15 years but this work shows that the situation is more complex than previously recognized. The results show critical impacts of protein degradation in the differentiation process that enables sperm to change from non-descript round cells into highly polarized and compartmentalized mature sperm, with an equally highly compartmentalized flagellum. This manuscript also sets a high bar for the field in terms of how thorough it is, which reveals wide-ranging impacts on processes such as mitochondrial compaction and arrangement in the midpiece, the correct building of the major cytoskeletal elements in the flagellum, etc.

      We appreciate the positive comment of reviewer #2.

      Weaknesses:

      There are no real weaknesses in the manuscript that result from anything in the control of the authors. They attempted to rescue the knockout by expressing a FLAG-tagged Fbxo24 transgene, but that did not rescue the phenotype, either because of inappropriate levels/timing/location of expression, or because of interference by the tag. They also could not make an anti-FBXO24 antibody that worked for coimmunoprecipitation experiments, so relied on the FLAG epitope, an approach that successfully showed co-IP with IPO5 and SKP1.

      We could not rescue the phenotype with the Fbxo24-FLAG transgene, but different Fbxo24 mutant mice show the same phenotypes (Figure S6G). Further, another group showed that Fbxo24 KO mice exhibit abnormal mitochondrial coiling [Li et al., 2024 (PMID: 38470475)], confirming that FBXO24 is involved in mitochondrial sheath formation.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility.

      We appreciate the summary comment of reviewer #3.

      Strengths:

      The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      We appreciate the positive comment of reviewer #3.

      Weaknesses:

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes; however, as reviewer #3 pointed out, we have no results showing a direct relationship between them. Therefore, we revised the title.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      On page 4, lines 152-154, the authors introduce the RBGS mouse model and use it in their experiments. However, they left out an obvious but helpful sentence that tells the reader that they crossed the Fbxo24-null mouse with the RBGS. As one continues reading it is clear, but best to avoid even slight confusion.

      We revised the explanation in the Results section (lines 150-153).

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility. The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes; however, as reviewer #3 pointed out, we have no results showing a direct relationship between them. Therefore, we revised the title.

      Major comments:

      In the title, abstract, introduction, and some sections such as lines 275-276, the authors conclude that FBXO24 prevents the accumulation of importins and RNP granules during spermiogenesis. However, the provided data do not substantiate this claim. To provide conclusive evidence to support the current title, the authors need to present evidence supporting: 1) direct degradation of IPO5 and KPNB1 by FBXO24; 2) the direct requirement of IPO5 for the formation of the membraneless granules, and 3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      (1) direct degradation of IPO5 and KPNB1 by FBXO24.

      To examine whether IPO5 can be degraded by FBXO24, we performed a ubiquitination assay using HEK293T cells. Ubiquitination of IPO5 was upregulated in the presence of WT FBXO24 but not with the mutant ΔF-box FBXO24, suggesting that IPO5 can be ubiquitinated by FBXO24. We did not examine the ubiquitination of KPNB1 because we failed to construct a plasmid vector expressing mouse KPNB1. We think that KPNB1 is unlikely to be a substrate because we did not detect an interaction between FBXO24 and KPNB1 (Figure 5E). We added the results of the ubiquitination assay (Figure 5F and lines 261-265) and mentioned them in the abstract (line 35).

      (2) the direct requirement of IPO5 for the formation of the membraneless granules.

      (3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      We showed that IPO5 aggregates under stress conditions in COS7 cells (Figure 6C and D); however, we did not examine whether IPO5 is required for the formation of the membraneless granules. We consider that protein degradation systems such as PROTAC or Trim-Away, used to knock down IPO5 at the protein level in Fbxo24 KO mice, could be a good way to test whether the membraneless granules are diminished and male fertility is rescued. However, applying these degradation systems in vivo takes time. Therefore, we would like to leave this rescue experiment for future studies. We revised the title and abstract (lines 37-38) and removed the last sentence of the introduction.

      Also, another group reported analyses of Fbxo24 KO mice [Li et al., 2024 (PMID: 38470475)] right after we submitted our manuscript to eLife. They reported not only disorganized flagellar structures but also abnormal head morphology, which may lead to male infertility. The differences from our study may be due to different mouse genetic backgrounds. We mentioned this in the Discussion section (lines 348-353).

      Minor comments:

      (1) The authors claimed a significant increase in the total amount of RNAs in Fbxo24 KO spermatozoa (lines 259-261), suggesting that the ...contain RNAs. More direct evidence supporting this claim should be provided.

      We show that the amounts of IPO5 and KPNB1 increased in Fbxo24 KO spermatozoa (Figure 5A and B) and that both could be incorporated into RNP granules in COS7 cells (Figure 6C and D), supporting the idea that the membraneless electron-dense structures may be RNP granules. However, because we did not show direct evidence that the electron-dense structures contain RNAs, we removed these sentences (lines 259-261 of the first-submission manuscript).

      (2) The author should provide an explanation for the absence of a FLAG band in the input Tg in Figure 5D and the larger size of the IPO5 band in the FLAG-IP group compared to the input. Similar observations are also noted in Figure 5E.

      The FLAG band is weak because the protein amount is low; it becomes visible when the contrast is increased. We added a high-contrast image (Figure 5D). Proteins sometimes migrate differently in SDS-PAGE after immunoprecipitation, likely because of differences in the protein composition of the sample. We explained this in the figure legend (lines 868-869).

      (3) In Line 526, clarify the procedure for sperm purification, and determine the potential for contamination from somatic cells.

      We did not perform sperm purification, but in spermatozoa collected from the cauda epididymis we rarely observed somatic cells or immature spermatogenic cells. We added images in Figure S7. We also added a detailed explanation of how spermatozoa were collected from the epididymis (lines 549-550).

      (4) Define the Y-axis in Figure 2E, F, and G.

      We have revised the figures.

    1. Author response:

      Reviewer #1 (Public Review):

      Using the UK Biobank, this study assessed the value of nuclear magnetic resonance measured metabolites as predictors of progression to diabetes. The authors identified a panel of 9 circulating metabolites that improved the ability in risk prediction of progression from prediabetes to diabetes. In general, this is a well-performed study, and the findings may provide a new approach to identifying those at high risk of developing diabetes. I have some comments that may improve the importance of this study.

      We deeply appreciate the reviewer's invaluable time dedicated to the review of this manuscript and the insightful comments to enhance its overall quality.

      (1) It is unclear why the authors only considered the top 20 variables in the metabolite selection and why they did not set a wider threshold.

      Thank you for the comment. We set the top 20 variables in the metabolite selection balancing the performance of the final diabetes risk prediction model and the clinical applicability due to measurement costs. We have added this explanation in the “Methods” section.

      “We chose the intersection set of the top 20 most important variables selected by the three machine learning models, after balancing the performance of the final diabetes risk prediction model and the clinical applicability associated with measurement costs of metabolites.”
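
      The selection step quoted above takes, for each of the three machine learning models, the 20 variables with the highest importance and keeps only those ranked in the top 20 by all three. A minimal sketch, assuming each model's importances are available as a hypothetical name-to-score mapping (the models, variable names, and scores below are illustrative only):

```python
def top_k(importances: dict[str, float], k: int = 20) -> set[str]:
    """Names of the k variables with the highest importance scores."""
    # sorted() is stable, so ties keep their input order.
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def consensus_features(rankings: list[dict[str, float]], k: int = 20) -> set[str]:
    """Variables that fall in the top k of every model's importance ranking."""
    return set.intersection(*(top_k(r, k) for r in rankings))


# Illustrative importances for three hypothetical models (k = 3 for brevity).
model_a = {"a": 5, "b": 4, "c": 3, "d": 1}
model_b = {"a": 1, "b": 9, "c": 8, "d": 0}
model_c = {"a": 2, "b": 3, "c": 7, "d": 6}
print(sorted(consensus_features([model_a, model_b, model_c], k=3)))  # ['b', 'c']
```

      Ties at the cutoff are broken by input order here; how ties were handled in the study is not stated.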

      (2) The methods section would benefit from a more detailed exposition of how parameter tuning was conducted and the range of parameters explored during the training of the RSF model.

      As suggested by the reviewer, we have added a more detailed description of parameter tuning and the range of parameters explored during training of the RSF model in “Method S2” of the Supplementary material.

      “The RSF model was fitted using the “randomForestSRC” package, and grid search was used for hyperparameter tuning. Specifically, grid search was used to tune the hyperparameters of the RSF model by minimizing the out-of-sample, or out-of-bag, error1. Each tree in the RSF is constructed from a random sample of the data, typically a bootstrap sample or 63.2% of the sample size (as in the present study). Consequently, not all observations are used to construct each tree. The observations that are not used in the construction of a tree are referred to as out-of-bag observations. In an RSF model, each tree is built from a different sample of the original data, so each observation is “out-of-bag” for some of the trees. The prediction for an observation can then be obtained using only those trees for which the observation was not used for construction. A prediction for each observation is obtained in this way, and the error rate can be estimated from these predictions. The resulting error rate is referred to as the out-of-bag error. The best hyperparameters were determined by calculating the out-of-bag error for each candidate combination.

      The hyperparameters tuned in the present study and their grid-search ranges were as follows: number of trees (50 to 1000, in steps of 50), number of variables randomly selected as split candidates at each node (3 to 6, in steps of 1), and minimum terminal node size (1 to 20, in steps of 1)2.”
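      The out-of-bag logic and the grid described in this passage can be illustrated with a plain-Python sketch. It does not call randomForestSRC (an R package); the `oob_error` callable is a hypothetical stand-in for fitting an RSF with one hyperparameter combination and returning its OOB error, and the toy error surface in the usage lines is invented purely to show the selection step.

```python
import itertools
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def oob_indices(n: int, rng: random.Random, bootstrap: bool = False) -> List[int]:
    """Indices of observations not used to grow one tree (out-of-bag)."""
    if bootstrap:
        # Draw n indices with replacement; on average about 63.2% are unique,
        # so roughly 36.8% of observations end up out-of-bag.
        in_bag = {rng.randrange(n) for _ in range(n)}
    else:
        # Subsample 63.2% of the observations without replacement.
        in_bag = set(rng.sample(range(n), round(0.632 * n)))
    return [i for i in range(n) if i not in in_bag]


# Grid reported in the response: trees 50-1000 by 50, candidate variables
# per split 3-6 by 1, minimum terminal node size 1-20 by 1.
GRID: Dict[str, Sequence[int]] = {
    "ntree": range(50, 1001, 50),
    "mtry": range(3, 7),
    "nodesize": range(1, 21),
}


def grid_search(
    oob_error: Callable[[Dict[str, int]], float],
    grid: Dict[str, Sequence[int]] = GRID,
) -> Tuple[Optional[Dict[str, int]], float]:
    """Evaluate every combination and keep the one with the lowest OOB error."""
    names = list(grid)
    best_params: Optional[Dict[str, int]] = None
    best_err = float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = oob_error(params)  # hypothetical: fit an RSF, return its OOB error
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


# Usage with an invented error surface whose minimum is known in advance.
def toy_error(p: Dict[str, int]) -> float:
    return abs(p["ntree"] - 500) / 1000 + abs(p["mtry"] - 4) + abs(p["nodesize"] - 15) / 20


best, err = grid_search(toy_error)
```

      The grid contains 20 × 4 × 20 = 1,600 combinations. Exhaustive search is cheap here only because the error function is a toy; with real RSF fits, each combination means growing up to 1,000 trees, which is one reason the OOB error (requiring no separate validation split) is attractive.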

      (3) It is hard to understand the meaning of the decision curve analysis and the clinical implications of the net benefit, which need to be clarified to establish the practical value of the models.

      Thank you for the comment. We have added more description and discussion about the decision curve analysis in the “Methods” and “Discussion” sections.

      “Furthermore, we used decision curve analysis (DCA) to assess the clinical usefulness of prediction model-based guidance for prediabetes management, which calculates a clinical “net benefit” for one or more prediction models in comparison to default strategies of treating all or no patients3.”

      “Most importantly, a model with good discrimination does not necessarily have high clinical value. Hence, DCA was used to compare the clinical utility of the model before and after adding the metabolites; the model with the metabolites showed a higher net benefit than the basic model, suggesting that adding the metabolites increased the clinical value of the prediction, i.e., its potential benefit in guiding the management of individuals with prediabetes3,4. These results provided novel evidence supporting the value of metabolic biomarkers in risk prediction and stratification for the progression from prediabetes to diabetes.”
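      The net benefit referred to above has a standard closed form: at a threshold probability pt, net benefit = TP/n minus (FP/n) multiplied by pt/(1 - pt), where TP and FP count individuals classified as high risk (predicted risk at or above pt) who do and do not progress. The sketch below is a simplified binary-outcome version that ignores censoring, which a time-to-event analysis such as this one would need to handle; the outcomes and risks are invented for illustration.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], risk: Sequence[float], pt: float) -> float:
    """Net benefit of intervening in everyone whose predicted risk >= pt."""
    if not 0 < pt < 1:
        raise ValueError("threshold probability must be strictly between 0 and 1")
    n = len(y_true)
    flagged = [y for y, r in zip(y_true, risk) if r >= pt]
    tp = sum(1 for y in flagged if y == 1)  # flagged and progressed
    fp = len(flagged) - tp                  # flagged but did not progress
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (pt / (1 - pt))


def net_benefit_treat_all(y_true: Sequence[int], pt: float) -> float:
    """Default strategy: intervene in everyone (treat-none always scores 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (pt / (1 - pt))


# Invented example: 10 individuals, 2 of whom progress to diabetes.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
risk = [0.8, 0.6, 0.4, 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.02]
nb_model = net_benefit(y, risk, pt=0.2)    # about 0.175
nb_all = net_benefit_treat_all(y, pt=0.2)  # about 0.0
```

      Evaluating these functions over a range of thresholds and plotting the results gives the decision curve; a model adds clinical value at thresholds where its curve lies above both the treat-all curve and the treat-none line at zero.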

      (4) Notably, the NMR platform utilized within the UK Biobank primarily focused on lipid species. This limitation should be discussed in the manuscript to provide context for interpreting the results and acknowledge the potential bias from the measuring platform.

      Thank you for the comment. We acknowledge the limitation that the NMR platform used in the UK Biobank primarily focuses on lipid species, as well as the potential bias arising from the measurement platform, and have added this to the “Discussion” section.

      “Third, the Nightingale metabolomics platform primarily focuses on lipids and lipoprotein sub-fractions; thus, the predictive value of other metabolites in the progression from prediabetes to diabetes warrants further research using an untargeted metabolomics approach.”

      (5) The manuscript should explain the potential influence of non-fasting status on the findings, particularly concerning lipoprotein particles and composition. There should be a detailed discussion of how non-fasting status may impact the measurement and the findings.

      According to the reviewer’s suggestion, we have added more details to explain the potential influence of non-fasting status on our findings in the “Discussion” section.

      “Additionally, the use of non-fasting blood samples might increase inter-individual variation in metabolic biomarker concentrations; however, fasting duration has been reported to account for only a small proportion of the variation in plasma metabolic biomarker concentrations5. Therefore, we believe the impact of non-fasting samples on our findings would be minor.”

      (6) Cross-platform standardization is an issue in metabolomics, and a more detailed description of quality control is recommended.

      Thank you for the comment. We have added more description of quality control in the “Method S1” section in the Supplementary material.

      “Metabolic biomarker profiling by Nightingale Health’s NMR platform provides consistent results over time and across spectrometers. Furthermore, sample preparation is minimal on Nightingale Health’s metabolic biomarker platform, which avoids all extraction steps. These aspects result in highly repeatable biomarker measurements. Pre-specified quality metrics were agreed upon by UK Biobank and Nightingale Health to ensure consistent results across the samples, and pilot measurements were conducted. Nightingale Health performed real-time monitoring of measurement consistency within and between spectrometers throughout the UK Biobank samples. Two control samples provided by Nightingale Health were included in each 96-well plate to track consistency across multiple spectrometers. Furthermore, two blind duplicate samples provided by UK Biobank were included in each well plate, with their positions revealed only after the results had been delivered. Coefficient of variation (CV) targets across the metabolic biomarker profile were pre-specified for both Nightingale Health’s internal control samples and UK Biobank’s blind duplicates. The targets were met for each consecutively measured batch of ~25,000 samples. For the majority of the metabolic biomarkers, the CVs were below 5% (https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=3000). Further, the distributions of measured biomarkers from 5 sample batches indicated an absence of batch effects (https://biobank.ctsu.ox.ac.uk/ukb/ukb/docs/nmrm_app1).”
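      The 5% CV target mentioned above can be made concrete with a short sketch. It assumes CV = sample standard deviation / mean × 100 computed over replicate measurements of the same sample; the response does not specify the exact formula used by Nightingale Health (for example, for duplicate pairs), and the replicate values are invented.

```python
import statistics
from typing import Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation (%) of repeated measurements of one sample."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) relative to the mean.
    return statistics.stdev(values) / mean * 100


def meets_target(values: Sequence[float], target: float = 5.0) -> bool:
    """True if the replicate CV is below the target (5% in the text above)."""
    return cv_percent(values) < target


# Invented replicate concentrations of one biomarker in a control sample.
replicates = [1.00, 1.02, 0.98, 1.00]
cv = cv_percent(replicates)  # about 1.63%
```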

      Reviewer #2 (Public Review):

      Deciphering the metabolic alterations characterizing the prediabetes-diabetes spectrum could provide early time windows for targeted preventive measures to extend precision medicine while avoiding disproportionate healthcare costs. The authors identified a panel of 9 circulating metabolites combined with basic clinical variables that significantly improved the prediction from prediabetes to diabetes. These findings provided insights into the integration of these metabolites into clinical and public health practice. However, the interpretation of these findings should take account of the following limitations.

      We appreciate the reviewer’s positive comments and encouragement.

      (1) First, the causal relationship between the identified metabolites and diabetes or prediabetes deserves further examination, particularly as prediabetic status was only partially defined. Some metabolites might be the result of prediabetes rather than causal factors for progression to diabetes.

      Thank you for your insightful comments. We agree that the panel of metabolites identified in this study might not be causal factors for progression from prediabetes to diabetes, which needs further validation in experimental studies. We have added this limitation to the “Discussion” section.

      “Fifth, owing to the observational design, we could not draw any conclusion about causality between the identified metabolites and the risk of progression to diabetes, which remains to be validated in further experimental studies.”

      (2) The blood samples were taken at random times (not all in a fasting state), so the findings are subject to greater variability. This should be discussed in the limitations.

      According to the reviewer’s suggestion, we have added more details to explain the potential influence of non-fasting status on our findings in the “Discussion” section.

      “Additionally, the use of non-fasting blood samples might increase inter-individual variation in metabolic biomarker concentrations; however, fasting duration has been reported to account for only a small proportion of the variation in plasma metabolic biomarker concentrations5. Therefore, we believe the impact of non-fasting samples on our findings would be minor.”

      (3) The strength of NMR in metabolic profiling compared to other techniques (i.e., mass spectrometry [MS], another commonly used metabolic profiling method) could be added in the Discussion section.

      According to the reviewer’s suggestion, we have added the strength of NMR in metabolic profiling compared to other techniques in the “Discussion” section.

      “Circulating metabolites were quantified via NMR-based metabolomic profiling within the UK Biobank, which offers metabolite quantification at relatively lower cost and with better reproducibility6.”

      (4) The applied platform focuses mostly on lipid species, which may also be a limitation.

      Thank you for the comment. We acknowledge the limitation that the NMR platform used in the UK Biobank primarily focuses on lipid species, as well as the potential bias arising from the measurement platform, and have added this to the “Discussion” section.

      “Third, the Nightingale metabolomics platform primarily focuses on lipids and lipoprotein sub-fractions; thus, the predictive value of other metabolites in the progression from prediabetes to diabetes warrants further research using an untargeted metabolomics approach.”

      (5) This is a very large group with prediabetes, but the results apply only to prediabetes and not to the general population. This should be made clear, although the authors have also validated the predictive value of these metabolites in the general population.

      Thank you for the comment. We agree with you that the results only apply to prediabetes and not to the general population, though they also showed potential predictive value among participants with normoglycemia. We have accordingly modified the relevant expressions in the “Conclusion” section to restrict these findings to participants with prediabetes.

      “In this large prospective study among individuals with prediabetes, we detected a panel of circulating metabolites that were associated with an increased risk of progressing to diabetes.”

      References

      (1) Janitza S, Hornung R. On the overestimation of random forest's out-of-bag error. PLoS One. 2018;13(8):e0201904.

      (2) Tian D, Yan HJ, Huang H, et al. Machine Learning-Based Prognostic Model for Patients After Lung Transplantation. JAMA Netw Open. 2023;6(5):e2312022.

      (3) Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3:18.

      (4) Li J, Xi F, Yu W, Sun C, Wang X. Real-Time Prediction of Sepsis in Critical Trauma Patients: Machine Learning-Based Modeling Study. JMIR Form Res. 2023;7:e42452.

      (5) Li-Gao R, Hughes DA, le Cessie S, et al. Assessment of reproducibility and biological variability of fasting and postprandial plasma metabolite concentrations using 1H NMR spectroscopy. PLoS One. 2019;14(6):e0218549.

      (6) Geng T-T, Chen J-X, Lu Q, et al. Nuclear Magnetic Resonance-Based Metabolomics and Risk of CKD. Am J Kidney Dis. 2023.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The current manuscript by Hajra et al deals with the role of the prominent sirtuins SIRT1 and SIRT3 during infection of macrophages with Salmonella Typhimurium (ST). Apparently, ST infection induces upregulation of host cell SRTs to aid its own metabolism during the intracellular lifestyle and to help reprogram macrophage polarization. The manuscript has two parts, one of which deals with Salmonella infection in cells, where RAW 264.7 murine macrophage-like cells, which share some features with primary macrophages, were employed. Infected RAW cells displayed a tendency to polarize towards wound-healing M2 rather than inflammatory M1 macrophages, which was dependent on SRT. Consequently, the inflammatory response in RAW cells was more robust in the absence of SRT. Moreover, loss of SRTs led to impaired bacterial proliferation in these cells, which was attributed to defects in the metabolic adaptation of the bacteria in the absence of SRT activity and to the increased M1 inflammatory response.

      Unfortunately, the line of argumentation remains incomplete because the corresponding assays in mice showed the opposite result compared with the experiments using RAW 264.7 cells, i.e., loss of SRTs leads to increased bacterial load in animals (versus impaired proliferation in RAW 264.7 cells). The authors cannot explain this discrepancy.

      Strengths:

      Extensive analysis of Salmonella infection in RAW macrophage-like cells and mice in the context of SRT1/3 function.

      Weaknesses:

      Lack of connection between the cell-based and organismic data, which are not supportive of each other.

      We are highly grateful for your valuable and insightful comments. Thank you for appreciating the merit of our manuscript. We agree that the findings in the RAW264.7 cell line (Fig. 2A) and in primary peritoneal macrophages (ex vivo; Fig. 2B) are opposite to those in the in vivo mouse model (Fig. 8). Infection of both RAW264.7 macrophages and peritoneal macrophages shows attenuated intracellular bacterial proliferation owing to the heightened proinflammatory burst. This is in sharp contrast to our in vivo mouse model of infection, which shows increased organ burden and bacterial dissemination. The higher bacterial load in organs including the spleen (Fig. 8B) is attributed to the increased pro-inflammatory cytokine burst and ROS production (Fig. 8F-H, Fig. S9), which trigger bacterial dissemination. The pro-inflammatory arsenal, including IL-6, IL-1β and ROS, that limits bacterial proliferation within macrophages (F4/80+ macrophages within the spleen, RAW264.7 macrophages, or primary peritoneal macrophages) facilitates bacterial dissemination into the blood and other organs (Fig. 8I-L, Fig. S3F-G). This is in line with the following previous findings:

      Klebsiella pneumoniae infection triggers an inflammatory response via secretion of IL-6 upon HIF-1α activation that induces bacterial dissemination (Holden VI, Breen P, Houle S, Dozois CM, Bachman MA. Klebsiella pneumoniae Siderophores Induce Inflammation, Bacterial Dissemination, and HIF-1α Stabilization during Pneumonia. mBio. 2016 Sep 13;7(5):e01397-16. doi: 10.1128/mBio.01397-16. PMID: 27624128; PMCID: PMC5021805.).

      Correlation analysis of immune responses to Salmonella infection revealed that an increased innate immune “cassette” opposes the adaptive immune arm, leading to increased bacterial load in mice (Hotson AN, Gopinath S, Nicolau M, Khasanova A, Finck R, Monack D, et al. Coordinate actions of innate immune responses oppose those of the adaptive immune system during Salmonella infection of mice. Science Signaling. 2016;9(410):ra4).

      In our revised manuscript, we have assessed additional splenic populations, including CD45+, Ly6C+, and CD11c+ populations. Our results indicate that the CD45+ splenic population shows increased bacterial loads, similar to the total splenic population, within the SIRT1/3-inhibited cohorts. However, CD45+ monocytes and the Ly6C+ splenic population exhibit a reduced burden within the SIRT1/3-inhibited cohorts. Moreover, within the CD11c+ population, CD45+ granulocytes or lymphocytes show organ loads comparable to those of the vehicle control or SIRT1 activator-treated mouse groups (Fig. 8M-S, Fig. S8). Overall, our data suggest heterogeneous bacterial burdens across diverse splenic populations.

      Reviewer #2 (Public Review):

      Dipasree Hajra et al demonstrated that Salmonella was able to modulate the expression of Sirtuins (Sirt1 and Sirt3) and regulate the metabolic switch in both host and Salmonella, promoting its pathogenesis. The authors found Salmonella infection induced high levels of Sirt1 and Sirt3 in macrophages, which were skewed toward the M2 phenotype allowing Salmonella to hyper-proliferate. Mechanistically, Sirt1 and Sirt3 regulated the acetylation of HIF-1alpha and PDHA1, therefore mediating Salmonella-induced host metabolic shift in the infected macrophages. Interestingly, Sirt1 and Sirt3-driven host metabolic switch also had an effect on the metabolic profile of Salmonella. Counterintuitively, inhibition of Sirt1/3 led to increased pathogen burdens in an in vivo mouse model. Overall, this is a well-designed study. There are a few comments below that would further strengthen the current study.

      Major comments:

      In the in vivo study (lines 436-446), the authors noticed increased pathogen burden in the EX-527- or 3TYP-treated mouse cohorts but decreased pathogen burden within the F4/80+ macrophage population. What other cell types have an increased pathogen burden in splenocytes from EX-527- or 3TYP-treated mice? Can this be further explored and explained?

      While the authors indicated that IL-6 cytokine storm and elevated ROS production could result in bacterial dissemination in vivo, one could also argue that Sirt1/3 inhibitors might have an impact on gut function and/or gut microbiota (PMID: 22115311). Did Sirt1/3 inhibitors also lead to increased pathogen burdens in the gut? If so, the potential effect of these in vivo treatments on gut microbiota/colonization resistance should be discussed.

      Minor comment:

      Sirt1 has been shown to be degraded during Salmonella infection (PMID: 28192515), which is different from the current study. An explanation should be provided for this.

      We thank you for your encouraging and gracious comments. We deeply appreciate your time and effort in providing constructive feedback to improve our work. As per your suggestions, we have assessed additional splenic populations, including CD45+, Ly6C+, and CD11c+ populations, apart from the F4/80+ macrophage population. Our analysis suggests that the CD45+ splenic population shows increased bacterial loads, similar to the total splenic population, within the SIRT1/3-inhibited cohorts. However, CD45+ monocytes and the Ly6C+ splenic population exhibit a reduced burden within the SIRT1/3-inhibited cohorts. Moreover, the CD11c+ population, CD45+ granulocytes, or lymphocytes show organ loads comparable to those of the vehicle control or SIRT1 activator-treated mouse groups (Fig. 8M-S). Overall, our data suggest heterogeneous bacterial burdens across diverse splenic populations.

      We greatly appreciate the reviewer’s insightful question about the effect of SIRT1/3 on the gut. We observed increased pathogen loads within the mesenteric lymph nodes in the SIRT1/3 inhibitor-treated mouse groups (Fig. 8B). In our revised manuscript, we evaluated gut inflammation by measuring IL-1β in the ileal tissues of the mice and observed heightened IL-1β production in the inhibitor-treated cohorts compared with the vehicle control (Fig. S3G). We have also examined gut epithelial pathology via haematoxylin and eosin (H&E) staining of ileal sections to address the effect of the in vivo treatments on gut microbiota and colonization resistance; these results are shown here. However, the crosstalk with the gut microbiota and its effect on colonization resistance are being examined in detail in a separate ongoing study. Therefore, these H&E data have not been incorporated into the revised manuscript.

      Author response image 1.

      Consistent with the study cited (PMID: 28192515), in which SIRT1 was shown to be degraded at later time points of Salmonella infection, our study also shows that both SIRT1 mRNA (Fig. 1A) and protein levels (Fig. S1A) are elevated at 2h and 6h post-infection and decrease at 16h relative to the 6h time point. However, SIRT3 expression remains elevated even at later time points of infection. Therefore, we speculate that SIRT1 and SIRT3 share a role in mediating the phenotypes reported in our study.

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Hajra et al have attempted to identify the role of Sirt1 and Sirt3 in regulating metabolic reprogramming and macrophage host defense. They have performed gene knockdown experiments in RAW macrophage cell lines to show that depletion of Sirt1 or Sirt3 enhances the ability of macrophages to eliminate Salmonella Typhimurium. However, in mice, inhibition of Sirt1 resulted in dissemination of the bacteria but the bacterial burden was still reduced in macrophages. They suggest that the effect they have observed is due to increased inflammation and ROS production by macrophages. They also try to establish a weak link with metabolism. They present data to show that the switch in metabolism from glycolysis to fatty acid oxidation is regulated by acetylation of Hif1a, and PDHA1.

      Strengths:

      The strength of the manuscript is that the role of Sirtuins in host-pathogen interactions has not been previously explored in-depth making the study interesting. It is also interesting to see that depletion of either Sirt1 or Sirt3 results in a similar outcome.

      Weaknesses:

      The major weakness of the paper is the low quality of data, making it harder to substantiate the claims. Also, there are too many pathways and mechanisms being investigated. It would have been better if the authors had focussed on either Sirt1 or Sirt3 and elucidated how it reprograms metabolism to eventually modulate host response against Salmonella Typhimurium. Experimental evidence is also lacking to prove the proposed mechanisms. For instance, they show correlative data that the knockdown of Sirt1-mediated shift in metabolism is due to HIF1a acetylation but this needs to be proven with further experiments.

      We appreciate the reviewer’s critical analysis of our work. In the revised manuscript, we have aimed to replace the low-quality datasets with better and more conclusive data, as directed in the recommendations for the authors. We agree with the reviewer that including both SIRT1 and SIRT3 has resulted in many pathways and mechanisms, and that focusing on one SIRT and its mechanism of metabolic reprogramming and immune modulation would have been a less complicated approach. However, as rightly pointed out, our work demonstrates the shared and partly overlapping roles of the two sirtuins, SIRT1 and SIRT3, which together mediate the immune-metabolic switch upon Salmonella infection. As per the reviewer’s suggestion, we have performed additional experiments with HIF-1α inhibitor treatment in the revised manuscript to substantiate our correlative findings on SIRT1-mediated regulation of host glycolysis (Fig. 7G).

      Reviewer #1 (Recommendations For The Authors):

      The authors state "SIRT1 and SIRT3 inhibition resulted in increased pathogen loads in organs and triggered enhanced bacterial dissemination, together leading to increased susceptibility of the mice to S. Typhimurium infection owing to increased ROS and IL-6 production." How can this be reconciled? To the reviewer, this is not a convincing explanation. The reviewer is not a mouse pathologist, so maybe did not understand the argument in full.

      However, to clarify whether these phenomena can be placed in context and explained by, for instance, cell-autonomous (in RAW macrophages) versus non-autonomous (in mice) mechanisms, it would be necessary to relate the organismic phenotype to a cellular phenotype using more physiological primary macrophages.

      (1) The authors show in Figure 8 that, in general, SRT inhibition leads to increased infection whereas SRT activation results in decreased infection. This is even true for the spleen (e.g., Figure 8B), which should be full of macrophages upon infection.

      (2) Only Figure 8L implies that endogenous primary splenic macrophages show a higher infection rate upon pharmacologic SRT activation, which would potentially mirror the RAW results. This is, however, not supportive of the authors’ own explanation: who would produce more ROS and IL-6 if these macrophages are more supportive of intracellular ST? Is there a difference in the roles of SRTs between different types of macrophages and/or neutrophils? And between macrophages and somatic cells concerning ST infection? The reviewer tends to believe that RAW cells display a defective killing response (such as ROS production) as they are highly transformed cells. Therefore, the authors should use cultured peritoneal macrophages or BMDMs in addition to RAW264.7 cells.

      The literature cited by the authors also implies that the inflammatory response in mice is higher in the absence of SRTs. This is in line with a role for SRTs in (negatively) regulating M1 inflammatory polarization but probably not with increased bacterial burden in mice. If it were, then increased dissemination could be explained by increased tissue damage. However, the flow cytometry experiments on infected organs do not confirm this, as the infection of individual cells is higher upon SRT inhibition. Thus, there seems to be a broad gap between the role of SRTs in ST infection in RAW264.7 cells and in non-transformed cells.

      I would not discard the RAW results, as I am convinced that they contain valuable data. However, it needs to be clarified what aspect of the host response RAW 264.7 cells represent. Primary macrophages might likely be more aggressive towards the bacteria. Finally, the question arises: what is the role of the metabolic switch in the in vivo setting?

      The reviewer recommends repeating some key experiments by in-vitro-infecting BMDMs or isolated peritoneal macrophages (after some days of culturing) to bridge between the present RAW-derived data and the mouse data. How is the bacterial load with and without SRT inhibitor/activator in primary macrophages, when infected outside of the body? Can ex-vivo infection also affect polarization of e.g. peritoneal macrophages or the metabolic switch? If it is possible to find a conclusive explanation for their data, then this story might really add to our understanding of another aspect of how ST manipulates the host to survive.

      In case the reviewer understands the mouse experiments correctly, all assays on peritoneal cells were performed after in-vivo-infection and/or treatment.

      Together, RAW 264.7 murine macrophage-like cells might not be the right model to understand the phenotypes in full. As far as the reviewer knows, these cells are not capable of killing bacteria as effectively as activated primary macrophages or neutrophils.

      A few of the key findings in RAW264.7 macrophages have been replicated in primary peritoneal macrophages (Fig. 2B, S3E-F, S6B, S7B-D). We would like to clarify that the peritoneal macrophage experiments were performed ex vivo: peritoneal macrophages isolated from mice were subjected to SIRT1/3 inhibitor treatment and Salmonella infection, rather than being collected after in vivo treatment or infection. In the ex vivo setting, we examined the effect of SIRTs on the metabolic switch during Salmonella infection (Fig. S7B-D), which resembled our RAW264.7 macrophage data. Additionally, in the in vivo setting, we analyzed the transcript-level expression of host metabolic genes and the corresponding bacterial metabolic genes in the liver and spleen of infected mice under SIRT1/3 inhibitor treatment (Fig. S7E-F, Fig. 6C-D). Our primary peritoneal macrophage data mirror the RAW264.7 macrophage findings, showing attenuated intracellular bacterial proliferation owing to the heightened proinflammatory burst upon SIRT1/3 knockdown or inhibition (Fig. 2A-B). This is opposite to our in vivo mouse model of infection, which shows increased organ burden and bacterial dissemination (Fig. 8A-H). The pro-inflammatory arsenal that limits bacterial proliferation within macrophages (F4/80+ macrophages within the spleen, RAW264.7 macrophages, or primary peritoneal macrophages) facilitates bacterial dissemination into the blood and other organs owing to tissue damage (Fig. 8E-L). This is in line with the following previous findings:

      Klebsiella pneumoniae infection triggers an inflammatory response via secretion of IL-6 upon HIF-1α activation that induces bacterial dissemination (Holden VI, Breen P, Houle S, Dozois CM, Bachman MA. Klebsiella pneumoniae Siderophores Induce Inflammation, Bacterial Dissemination, and HIF-1α Stabilization during Pneumonia. mBio. 2016 Sep 13;7(5):e01397-16. doi: 10.1128/mBio.01397-16. PMID: 27624128; PMCID: PMC5021805.).

      Correlation analysis of immune responses to Salmonella infection revealed that an increased innate immune “cassette” opposes the adaptive immune arm, leading to increased bacterial load in mice (Hotson AN, Gopinath S, Nicolau M, Khasanova A, Finck R, Monack D, et al. Coordinate actions of innate immune responses oppose those of the adaptive immune system during Salmonella infection of mice. Science Signaling. 2016;9(410):ra4).

      As per the reviewer’s suggestions, we have analyzed other populations apart from F4/80+ macrophages and have observed that the CD45+ splenic population shows increased bacterial loads, similar to the total splenic population, within the SIRT1/3-inhibited cohorts. However, CD45+ monocytes and the Ly6C+ splenic population exhibit a reduced burden within the SIRT1/3-inhibited cohorts. Moreover, the CD11c+ population, CD45+ granulocytes, or lymphocytes show organ loads comparable to those of the vehicle control or SIRT1 activator-treated mouse groups (Fig. 8M-S, Fig. S8). Overall, our data suggest heterogeneous bacterial burdens across diverse splenic populations.

      Reviewer #3 (Recommendations For The Authors):

      Abstract

      The authors state that perturbing Sirt1 and Sirt3 results in a shift in Salmonella's metabolism. On the contrary, the data reflects the metabolism in the host cell and not the bacteria. This statement is wrong. They only show increased expression of some of the glycolytic genes in Salmonella, which is not sufficient to make the claim that the switch to fatty acid oxidation in macrophages is due to utilisation of glucose by the bacteria.

      We value the reviewer’s comment and have reframed the corresponding sentence in the abstract accordingly (Lines 24-25).

      Fig 1: Expression of Sirt1 - The data needs to be supported with a western blot for Sirt1 and Sirt3 but the Western blots shown in the supplementary figure are of very poor quality and do not support the authors' claim.

      We have repeated the western blot and have supplemented the previous blot with an alternative blot in Fig. S1A, as per your valuable input.

      Why haven't the authors shown any representative blots for Sirt1 and Sirt3 upon infection with Salmonella mutants? They need to italicize the genes when they describe mRNA expression.

      Previously, we had only assessed transcript-level expression of Sirt1 and Sirt3 upon infection with Salmonella mutants; therefore, no representative blot image was available. The gene names have now been italicized when describing mRNA expression (Lines 126-154). We regret the inconvenience caused. As per the reviewer’s suggestion, we have performed western blotting to assess the protein expression profile upon infection with Salmonella mutants, and the representative blot image has been added to the revised manuscript (Fig. S1B).

      What is the rationale for examining Sirt1 and Sirt3 mRNA in M1 and M2 macrophages? Salmonella infection on its own will polarise the macrophages towards M1. How long were these macrophages infected? The time points are missing.

      The rationale for examining Sirt1 and Sirt3 mRNA in M1- and M2-polarized macrophages was to determine whether M1-polarized macrophages exhibit decreased expression of Sirt1 or Sirt3, and whether M2-polarized macrophages show upregulation of Sirt1 and Sirt3, upon Salmonella infection. After confirming these findings in this preliminary experiment, we hypothesized that Salmonella infection on its own might polarize macrophages toward an immunosuppressive M2 state at later time points of infection, as infection induces SIRT expression, and that this is mediated by Sirt1 and Sirt3 (Fig. 3). We apologize for not mentioning the 16h time point in the figure; the missing time point has now been added to the revised manuscript (Line 155).

      Fig S2 knockdown of Sirt1 and Sirt3 are not convincing.

      We apologize for the inconclusive knockdown blot. An alternative blot has been provided in the revised manuscript (Fig. S2C-D).

      In Fig 2A and 2B, the time point post-infection has not been mentioned. Although it is stated that 2h and 16h post-infection samples were analysed, only one time point has been shown.

      We apologize for the confusion. Fig. 2A and 2B show fold proliferation, calculated as the CFU at 16 h divided by the CFU at 2 h, as described in the Materials and Methods section under the heading "Intracellular proliferation or gentamicin protection assay".

      Fold proliferation = [CFU at 16 h] / [CFU at 2 h]

      The cytokine data are intriguing in that the increase in IL-6 relative to control is seen only at 2h and 20h but not at 6h. IL-6 at 20h in untransfected cells is comparable to uninfected cells. Did the authors investigate cell death? Salmonella induces various forms of cell death which could account for the decreased cytokine production at later time points.

      We investigated cell death upon Salmonella infection using an MTT assay. At later time points of infection, we indeed observed an approximately 16% decrease in cell survival compared with the initial 2 h time point. The results are appended here, and they support the reviewer's reasoning for the decreased cytokine production at later time points.

      Author response image 2.
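      The response does not state how survival was normalized. A common convention, assumed here purely for illustration, is to subtract a blank (medium-only) absorbance and express each sample relative to a reference time point. The function name and all absorbance values below are hypothetical and are not data from the study.

```python
def relative_survival(a_sample: float, a_reference: float, a_blank: float = 0.0) -> float:
    """Percent survival of a sample relative to a reference, from blank-corrected MTT absorbance.

    Assumption: survival % = (A_sample - A_blank) / (A_reference - A_blank) * 100.
    """
    ref = a_reference - a_blank
    if ref <= 0:
        raise ValueError("reference absorbance must exceed the blank")
    return 100.0 * (a_sample - a_blank) / ref


# Hypothetical absorbance readings chosen for illustration only (not study data)
a_blank, a_2h, a_late = 0.05, 0.80, 0.68
survival = relative_survival(a_late, a_2h, a_blank)
print(f"survival vs 2 h: {survival:.1f}%, decrease: {100 - survival:.1f}%")
```

      Expressing the later time point against the 2 h reading within the same experiment makes the comparison independent of the absolute absorbance scale of a given plate.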

      Additional cytokines such as IL-1b would be helpful. Also, not sure how uninfected macrophages produce nearly 200pg of IL-10.

      As per the reviewer's suggestion, we have assessed IL-1b production at 16 h post-infection in RAW264.7 macrophages and peritoneal macrophages, and in mouse serum samples on the 5th day post-infection (Fig. S3C, S3E-F). Our results indicate increased production of IL-1b in infected SIRT1/3-knockdown RAW264.7 macrophages, in SIRT1/3 inhibitor-treated peritoneal macrophages, and in serum from SIRT1/3 inhibitor-treated mice, compared with the vehicle control. Additionally, we have quantified IL-1b in the ileal tissues of mice under SIRT1/3 inhibitor treatment (Fig. S3G) and observed heightened intestinal IL-1b production in the inhibitor-treated cohorts. We thank the reviewer for raising the concern about 200 pg of IL-10 in uninfected macrophages. We have repeated the experiment and provided an alternative representative graph in which IL-10 levels in the uninfected cohorts range between 20 and 40 pg/ml (Fig. S3B).

      It is surprising that the authors have found increased Sirt1 binding to NFkB, however there is no change in acetylated NFkB upon infection (Fig 4B). Acetylated p65 is equally high in uninfected Scrambled siRNA, UI shSirt1, STM Scr, and STM shSirt1. Furthermore, increased binding of Sirt1 with NFkb would mean decreased acetylation hence decreased inflammation. However, Salmonella induces profound inflammation.

      We thank the reviewer for this insightful and critical question. We acknowledge that, owing to oversaturation, there was no apparent change in acetylated p65 among the different sample sets. Therefore, in the revised manuscript we have provided a lower-exposure image in which changes in the acetylation of the p65 subunit are apparent. Like other pathogens, Salmonella induces acute inflammatory responses upon challenge, and this heightened acute inflammation in the initial phase of infection subsides at a later phase. Here, we assessed the interaction of Sirt1 with NFκB at 16 h post-infection, when increased binding of Sirt1 to NFκB facilitates the resolution of Salmonella-induced acute inflammation. This is in line with previous reports suggesting that SIRT1 suppresses acute inflammation by promoting p65 deacetylation and inhibiting NFκB activity (Yang H, Zhang W, Pan H, et al. SIRT1 activators suppress inflammatory responses through promotion of p65 deacetylation and inhibition of NF-κB activity. PLoS One. 2012;7(9):e46364. doi:10.1371/journal.pone.0046364; Liu TF, Yoza BK, El Gazzar M, Vachharajani VT, McCall CE. NAD+-dependent SIRT1 deacetylase participates in epigenetic reprogramming during endotoxin tolerance. J Biol Chem. 2011;286(11):9856–64; Liu TF, Vachharajani V, Millet P, Bharadwaj MS, Molina AJ, McCall CE. Sequential actions of SIRT1-RELB-SIRT3 coordinate nuclear-mitochondrial communication during immunometabolic adaptation to acute inflammation and sepsis. J Biol Chem. 2015;290(1):396–408).

      Please explain how the acetylated p65 was analysed.

      Total endogenous p65 was immunoprecipitated using an anti-NFκB p65 antibody, and the immunoprecipitated fraction was probed with an anti-acetylated-lysine antibody to assess acetylated p65.

      An increase in ROS production is seen in a relatively small percentage of cells- not more than 4% of cells. How does this contribute to such a significant difference in intracellular bacterial burden? Also, it is not clear how the authors calculated the fold change in proliferation. It is better to show the actual bacterial burden logarithmically.

      We agree with the reviewer's concerns and have reanalyzed the flow cytometry data set. The revised data are presented in Fig. S5, which shows a considerable increase in the DCFDA-positive population. For instance, the infected scrambled control shows around 2.44% ROS-producing cells, whereas knockdown of SIRT1 and SIRT3 increases the proportion of ROS-producing cells to 27.34% and 28.64%, respectively.

      Fold proliferation was calculated as the CFU at 16 h divided by the CFU at 2 h, as described in the Materials and Methods section under the heading "Intracellular proliferation or gentamicin protection assay". We used fold proliferation instead of absolute CFU values to normalize for differences in bacterial phagocytosis by macrophages among the samples.

      Fold proliferation = [CFU at 16 h] / [CFU at 2 h]
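      To illustrate why the ratio cancels differences in uptake, the sketch below compares two hypothetical wells in which one macrophage population phagocytosed twice as many bacteria as the other but intracellular replication was identical. It also shows the log10 form the reviewer suggested, using the identity log10(fold) = log10(CFU at 16 h) - log10(CFU at 2 h). The well values and the function name are illustrative assumptions, not data or code from the study.

```python
import math


def fold_proliferation(cfu_2h: float, cfu_16h: float) -> float:
    """Fold proliferation = [CFU at 16 h] / [CFU at 2 h]."""
    if cfu_2h <= 0 or cfu_16h < 0:
        raise ValueError("CFU at 2 h must be positive and CFU at 16 h non-negative")
    return cfu_16h / cfu_2h


# Hypothetical wells: B took up twice as many bacteria as A at 2 h,
# but the intracellular bacteria replicated to the same extent.
well_a = {"cfu_2h": 1.0e4, "cfu_16h": 8.0e4}
well_b = {"cfu_2h": 2.0e4, "cfu_16h": 1.6e5}

for name, w in (("A", well_a), ("B", well_b)):
    fold = fold_proliferation(w["cfu_2h"], w["cfu_16h"])
    # On a log10 scale the ratio becomes a difference of log CFU values.
    log_fold = math.log10(w["cfu_16h"]) - math.log10(w["cfu_2h"])
    print(f"{name}: CFU at 16 h = {w['cfu_16h']:.1e}, fold = {fold:.1f}, log10 fold = {log_fold:.2f}")
```

      The absolute CFU at 16 h differs twofold between the wells, while the fold proliferation is the same, which is the normalization the response describes. Plotting log10 CFU at each time point would show the same information on the logarithmic scale the reviewer requested.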

      An increase in metabolic genes is not sufficient to show that the macrophages are metabolically reprogrammed.

      We thank the reviewer for the valuable comment. We agree that an increase in metabolic gene expression is not sufficient to claim metabolic reprogramming. Therefore, in addition to the metabolic gene profile, we have measured lactate production (the end-product of glycolysis) as an indicator of glycolysis (Fig. 5C-E) and fatty acid β-oxidation activity (Fig. 5G-H) to support our claims.

      Figure 5F the band intensities do not visually match the bands shown for PFK. For instance, shSIRT1 STM (1.00) and shSIRT3 STM (0.81).

      We apologize for the erroneous band intensity value for shSIRT3. Upon reanalysis of the band intensities, we have corrected the value for shSIRT3 to 2.28 (Fig. 5F).

      It is surprising that HADHA is not expressed in uninfected samples.

      We apologize for the inappropriate representative blot. The discrepancy may have arisen from the use of old antibody. We have provided an alternative blot for HADHA, probed with fresh antibody solution, which shows expression even in the uninfected samples (Fig. 5F).

      Figure 6A - What is the significance of PFA fixed samples (PI) compared to SI samples? This has not been discussed.

      PFA-fixed samples are paraformaldehyde-treated bacteria that retain immune signals, or pathogen-associated molecular patterns (PAMPs). The rationale for using PI in addition to SI samples was to determine whether the phenomenon is driven by live, metabolically active pathogens or is mediated by PAMPs.

      I understand that the hypothesis is that during the later phase of infection, there is an increase in fatty acid oxidation which correlates with a decrease in inflammation. However, at 6h there is no increase in genes regulating fatty acid oxidation. Why did the authors choose 6h when the previous experiments have been done at 16h?

      We agree with the reviewer's understanding of our hypothesis that fatty acid oxidation increases as infection progresses, which correlates with a decrease in inflammation. Salmonella intracellular replication has been reported to commence at 6 h post-internalization, when SPI-2 effector expression is fully established (Helaine S, Thompson JA, Watson KG, Liu M, Boyle C, Holden DW. Dynamics of intracellular bacterial replication at the single cell level. Proc Natl Acad Sci U S A. 2010;107(8):3746-3751. doi:10.1073/pnas.1000041107). Therefore, we assessed the 6 h time point post-infection in addition to the initial and later time points of 2 h and 16 h, respectively. Additionally, the NanoString gene profiling data of both host and bacterial genes indicate the onset of modulation of both metabolic (Fig. 5A, 6A) and immune genes (Fig. 3A) at 6 h post-infection. We validated these results by qPCR and observed upregulation of the transcript levels of fatty acid oxidation genes in RAW264.7 macrophages (Fig. S7A).

      Line 355 it is mentioned that Sirt1 and Sirt3 abrogate metabolic shift by reducing glycolytic flux. This is incorrect as experiments such as carbon chase assays have not been performed to investigate glycolytic flux.

      As per the reviewer's valuable suggestion, we have removed the word 'flux' from the above-mentioned statement (Lines 351 and 353).

      Lines 392-393: "We immunoprecipitated PDHA1 and checked for its interaction with SIRT3 or SIRT1 under knockdown condition of SIRT3 or upon SIRT3 inhibitor treatment (Fig.7 G-H)"

      What is the rationale for checking PDHA1 interaction with Sirt under Sirt knockdown conditions?

      We thank the reviewer for the critical comments. The rationale for checking the interaction of PDHA1 with SIRT was to confirm that SIRT indeed interacts with PDHA1 during S. Typhimurium infection and that abrogating either SIRT expression (knockdown) or its enzymatic activity (inhibitor treatment) diminishes the interaction.

      Moreover, the blots are very confusing and do not represent the authors' claims.

      (1) In the input blot I do not see Sirt3 depletion in shSirt3 knockdown sample.

      As per the reviewer's suggestion, the knockdown has been quantified in the input blot. We obtained 40% knockdown in the uninfected set and 47.1% knockdown in the infected set at 16 h post-infection (Fig. 7H).
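      The response does not specify how percent knockdown was derived from the input blot. A common densitometry approach, assumed here for illustration only, normalizes each target band to its own lane's loading control and then compares the knockdown lane with the scrambled-control lane. The function name and intensity values are hypothetical and are not the values behind Fig. 7H.

```python
def percent_knockdown(target_kd: float, loading_kd: float,
                      target_ctrl: float, loading_ctrl: float) -> float:
    """Percent knockdown = (1 - normalized knockdown signal / normalized control signal) * 100.

    Assumption: each target band is first divided by the loading-control band of the same lane.
    """
    if min(loading_kd, loading_ctrl, target_ctrl) <= 0:
        raise ValueError("loading-control and control-lane target intensities must be positive")
    norm_kd = target_kd / loading_kd
    norm_ctrl = target_ctrl / loading_ctrl
    return 100.0 * (1.0 - norm_kd / norm_ctrl)


# Hypothetical densitometry values (arbitrary units), for illustration only
kd = percent_knockdown(target_kd=0.45, loading_kd=0.90, target_ctrl=1.20, loading_ctrl=1.00)
print(f"knockdown: {kd:.1f}%")
```

      Normalizing to the loading control within each lane keeps unequal protein loading from being read as a change in target expression.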

      (2) Why does Sirt1 interact with PDHA1 similarly to Sirt3? Do both proteins bind PDHA1 at the same time or competitively? If so, do they both deacetylate it?

      In the literature, Sirt3 has been shown to interact with and deacetylate PDHA1. However, an interaction of Sirt1 with PDHA1 has not been reported previously, so we are unable to comment on the exact dynamics of this interaction; future studies are needed to explore it in depth. Nevertheless, the SIRT1 agonist SRT1720 has been shown to affect PDH phosphorylation and activity (Han Y, Sun W, Ren D, Zhang J, He Z, Fedorova J, Sun X, Han F, Li J. SIRT1 agonism modulates cardiac NLRP3 inflammasome through pyruvate dehydrogenase during ischemia and reperfusion. Redox Biol. 2020 Jul;34:101538).

      (3) Figure 7I in the IP: IgG samples Sirt3 seem to bind to IgG non-specifically, which questions the specificity of Sirt3 binding to PDHA1.

      We thank the reviewer for pointing out this concern. We have repeated the immunoprecipitation experiment and added the result to the revised manuscript; we observe no non-specific binding of Sirt3 to IgG.

      (4) In Figure 7I all the bands Ac PDHA1, PDHA1, and Sirt3 look similar with double bands, which has not been seen in other blots. How is this possible?

      This cannot explain the increase in beta-oxidation observed.

      We thank the reviewer for raising this concern. We have repeated the experiment and provided the alternative blot as per the reviewer’s suggestion.

      The rationale for this experiment was to show that SIRT plays an important role in activating the downstream TCA cycle via PDHA1 deacetylation during Salmonella infection. Deacetylation of PDHA1 has previously been reported to cause transcriptional activation of the downstream TCA cycle and oxidative phosphorylation (Zhang Y, Wen P, Luo J, et al. Cell Death Dis. 2021). Additionally, PDHA1 hyperacetylation has been reported to cause lactate overproduction (An, S., Yao, Y., Hu, H. et al. PDHA1 hyperacetylation-mediated lactate overproduction promotes sepsis-induced acute kidney injury via Fis1 lactylation. Cell Death Dis 14, 457 (2023)). In our study, increased lactate production and PDHA1 hyperacetylation were observed under SIRT3 inhibition during Salmonella infection.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors investigate the impact of fecal microbiota transfer (FMT) on intestinal recovery from enterotoxigenic E. coli infection following antibiotic treatment. Using a piglet model of intestinal infection, the authors demonstrate that FMT reduces weight loss and diarrhea and enhances the expression of tight junction proteins. Sequencing analysis of the intestinal microbiota following FMT showed significant increases in Akkermansia muciniphila and Bacteroides fragilis. Using additional mouse and organoid models, the authors examine the impact of these microbes on intestinal recovery and modulation of the Wnt signaling pathway. Overall, the data support the notion that FMT following ETEC infection is beneficial, however, additional investigation is required to fully elucidate the mechanisms involved.

      Strengths:

      Initial experiments used a piglet model of infection to test the value of FMT on recovery from E. coli. The FMT treatment was beneficial and the authors provide solid evidence that the treatment increased the diversity of the microbiota and enhanced the recovery of the intestinal epithelium. Sequencing data highlighted an increase in Akkermansia muciniphila and Bacteroides fragilis after FMT.

      The mouse data are consistent with the observations in pigs, and reveal that daily gavage with A. muciniphila or B. fragilis enhances intestinal recovery based on histological analysis, expression of tight junction proteins, and analysis of intestinal barrier function.

      The authors demonstrate the benefit of probiotic treatment following infection using a range of model systems.

      Weaknesses:

      Without sequencing the pre-infection pig microbiota or the FMT input material itself, it's challenging to firmly say that the observed bloom in Akkermansia muciniphila and Bacteroides fragilis stemmed from the FMT.

      Response: We have determined the abundance of each bacterium in the fecal bacterial suspension, following Hu et al. (2018). The absolute abundances of Akkermansia muciniphila and Bacteroides fragilis in the FMT material were 1.3 × 10^3 ± 2.6 × 10^3 and 4.5 × 10^3 ± 6.1 × 10^3, respectively.

      Reference:

      Hu LS, Geng SJ, Li Y, et al. Exogenous Fecal Microbiota Transplantation from Local Adult Pigs to Crossbred Newborn Piglets. Front. Microbiol. 2018, 8.

      The lack of details for the murine infection model, such as weight loss and quantification of bacterial loads over time, make it challenging for a reader to fully appreciate how treatment with Akkermansia muciniphila and Bacteroides fragilis is altering the course of infection. Bacterial loads of E. coli were only quantified at one time point, and the mice that received A. muciniphila and B. fragilis had very low levels of E. coli. Therefore, it is not clear if all mice were subjected to the same level of infection in the first place. The reduced translocation of E. coli to the organs and enhanced barrier function may just reflect the low level of infection in these mice. Further, the authors' conclusion that the effect is specific to A. muciniphila or B. fragilis would be more convincing if the experiments included an inert control bacterium, to demonstrate that gavage with any commensal microbe would not elicit a similar effect.

      Weight loss data have been added to Figure S2A. All mice were subjected to the same level of infection.

      Many of the conclusions in the study are drawn from the microscopy results. However, the methods describing both light microscopy and electron microscopy lack sufficient detail. For example, it is not clear how many sections and fields of view were imaged or how the SEM samples were prepared and dehydrated. The mucus layer does not appear to be well preserved, which would make it challenging to accurately measure the thickness of the mucus layer.

      For light microscopy, 3-4 fields were selected from each mouse, and approximately 30 crypts were counted. The electron microscopy method has been expanded on lines 263-270. We have removed the mucus layer data.

      Gene expression data appears to vary across the different models, for example, Wnt3 expression in mice versus organoids. Additional experiments may be required to clarify the mechanisms involved. Considering that both of the bacteria tested elicited similar changes in Wnt signaling, this pathway might be broadly modulated by the microbiota.

      The difference in Wnt3 expression between mice and porcine intestinal organoids may be caused by the different durations of ETEC infection in vivo and in vitro. Furthermore, in vivo, the intestinal stem cell niche is regulated not only by intestinal epithelial cells but also by mesenchymal cells in the connective tissue (Luo et al., 2022). In in vitro models, however, the stem cell niche is regulated only by epithelial secretory factors, which may also account for the differences between the in vitro and in vivo results.

      It has been reported that B. fragilis pretreatment significantly increased the relative abundance of A. muciniphila in the intestines of mice with C. difficile infection (CDI), and that the growth and maintenance of A. muciniphila were involved in restoring intestinal barrier integrity after CDI, suggesting a possible metabolic symbiosis between A. muciniphila and B. fragilis (Deng et al., 2018).

      References:

      Luo HM, Li MX, Wang F, et al. The role of intestinal stem cell within gut homeostasis: Focusing on its interplay with gut microbiota and the regulating pathways. Int. J. Biol. Sci. 2022, 18(13): 5185-5206.

      Deng HM, Yang SQ, Zhang YC, et al. Bacteroides fragilis Prevents Clostridium difficile Infection in a Mouse Model by Restoring Gut Barrier and Microbiome Regulation. Front. Microbiol. 2018, 9.

      The unconventional choice to not include references in the results section makes it challenging for the reader to put the results in context with what is known in the field. Similarly, there is a lack of discussion acknowledging that B. fragilis is a potential pathogen, associated with intestinal inflammation and cancer (Haghi et al. BMC Cancer 19, 879 (2019) ), and how this would impact its utility as a potential probiotic.

      Bacteroides fragilis is a symbiotic anaerobe of the mammalian gut and also an opportunistic pathogen that is often isolated from clinical specimens. It was first isolated from infection sites and was considered a pathogen. However, as research has progressed, it has become clear that, over long-term evolution, gut-colonizing B. fragilis has established a beneficial relationship with the host and is an essential component for maintaining host health, especially in the context of obesity, diabetes and immune deficiency diseases. We have expanded the discussion on lines 598-603.

      Reviewer #2 (Public Review):

      Ma X. et al proposed that A. muciniphila was a key strain that promotes the proliferation and differentiation of intestinal stem cells by acting on the Wnt/β-catenin signaling pathway. They used various models, such as the piglet model, mouse model, and intestinal organoids to address how A. muciniphila and B. fragilis offer protection against ETEC infection. They showed that FMT with fecal samples, A. muciniphila or B. fragilis protected piglets and/or mice from ETEC infection, and this protection is manifested as reduced intestinal inflammation/bacterial colonization, increased tight junction/Muc2 proteins, as well as proper Treg/Th17 cells. Additionally, they demonstrated that A. muciniphila protected basal-out and/or apical-out intestinal organoids against ETEC infection via Wnt signaling. While a large body of work has been performed in this study, there are quite a few questions to be addressed.

      Major comments:

      - The similar protective effect of FMT with fecal samples, A. muciniphila or B. fragilis is perhaps not that surprising, considering that FMT likely restores microbiota-mediated colonization resistance against ETEC infection. While FMT with fecal samples increases SCFAs, it is unclear whether/how FMT with A. muciniphila or B. fragilis alter the microbiota composition/abundance as well as metabolites in the current models in a way that offers protection.

      We examined changes in the gut microbiota of mice treated with A. muciniphila and B. fragilis by 16S rRNA gene sequencing. Both A. muciniphila and B. fragilis improved the alpha and beta diversity of the microbiota, although these results were not included in this manuscript.

      - Does ETEC infection in piglets/mice cause histological damage in the intestines? These data should be shown.

      Scanning electron microscopy (Figure 3A) showed intestinal damage in piglets after ETEC infection, and H&E staining and transmission electron microscopy (Figure 5A and 5B) showed intestinal damage in mice after ETEC infection.

      - Line 447, "ETEC adheres to intestinal epithelial cells". However, there is no data showing the adherence (or invasion) of ETEC to intestinal epithelial cells, irrespective of piglets/mouse/organoids.

      Scanning electron microscopy (Figure 3A, bottom) showed obvious adhesion of rod-shaped bacteria to the microvillus surface in ETEC K88-infected piglets. Figure 2C shows colonization of ETEC K88 in the jejunum and colon of piglets, and Figure S2A shows E. coli colonization in the intestines and other tissues of mice.

      - In both basal-out and apical-out intestinal organoid models, A. muciniphila protects organoids against ETEC infection. Did ETEC enter into intestinal epithelial cells at all after only one hour of infection? Is the protection through certain A. muciniphila metabolites?

      It has been reported that co-culture for studying host-microbiota cross-talk in the apical-out organoid model lasts 1 hour (Poletti et al., 2021). In addition, Co et al. (2019) used an apical-out organoid model to study host-pathogen interactions, with Salmonella enterica serovar Typhimurium or Listeria monocytogenes invading organoids for one hour.

      References:

      Poletti M, Arnauts K, Ferrante M, et al. Organoid-based Models to Study the Role of Host-microbiota Interactions in IBD. J. Crohns Colitis. 2021, 15(7): 1222-1235.

      Co JY, Margalef-Catala M, Li XN, et al. Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions. Cell Reports. 2019, 26(9): 2509-2520.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow-up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      The major weakness is that, as presented, the manuscript is quite difficult to follow, even for someone familiar with the field. The lack of detail in figure legends, organization of the text, and frequent use of non-intuitive abbreviated group names without a clear key (ex. EP/EF, or C E A B) make comprehension challenging. The results section is perhaps too succinct and does not provide sufficient information to understand experimental design and interpretation without reading the methods section first or skipping to the discussion (as an example: WNT-c59 treatment). Extensive revisions could be encouraged to aid in communicating the potentially exciting findings.

      The abbreviations of the experimental groups are first defined in the Methods and Materials, and we have added descriptions of the experimental design to the Results section on lines 397-399, 439-442 and 516-520.

      The bioinformatics section of the methods requires revision and may indicate issues in the pipeline. Merging the forward and reverse reads may represent a problem for denoising. Also since these were sequenced on a NovaSeq, the error learning would have to be modified or the diversity estimates would be inappropriately multiplied. "Alpha diversity and beta diversity were calculated by normalized to the same sequence randomly." Not sure what this means, does this mean subsampled? "Blast was used for sequence alignment", does this mean the taxonomic alignment? This would need to be elaborated on and database versions should be included. The methods, including if any form of multiple testing was included, for LEFSE was also not included.

      Denoising was conducted using UNOISE3 to correct for sequencing errors. All subsequent alpha and beta diversity analyses were performed on the normalized output data. Multiple sequence alignment of all OTU sequences was performed using MUSCLE (v3.8.31) to obtain their phylogenetic relationships. We have added the multiple-testing method on lines 323-328.
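      The reviewer asked whether "normalized to the same sequence randomly" means subsampling. The response does not say which normalization was used; the sketch below assumes rarefaction, that is, random subsampling of reads without replacement to the smallest library size, followed by a Shannon index computed with the natural logarithm. The OTU tables and function names are hypothetical and are not the authors' pipeline.

```python
import math
import random
from collections import Counter


def rarefy(counts: dict, depth: int, seed: int = 0) -> Counter:
    """Randomly subsample reads without replacement down to a fixed depth."""
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the sample's read count")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return Counter(rng.sample(reads, depth))


def shannon(counts) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


# Hypothetical OTU tables for two samples sequenced to unequal depth
sample_a = {"OTU1": 500, "OTU2": 300, "OTU3": 200}
sample_b = {"OTU1": 5000, "OTU2": 3000, "OTU3": 1500, "OTU4": 500}
depth = min(sum(sample_a.values()), sum(sample_b.values()))

for name, sample in (("A", sample_a), ("B", sample_b)):
    rarefied = rarefy(sample, depth)
    print(name, sum(rarefied.values()), round(shannon(rarefied), 3))
```

      Subsampling every library to the same depth keeps differences in sequencing effort from inflating richness or diversity estimates in the deeper sample; stating the chosen depth and random seed in the Methods would make such a step reproducible.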

      Reviewer #1 (Recommendations For The Authors):

      At some points, the rationale for using both porcine and murine models was unclear, and it would be helpful for the reader to elaborate on the benefits of these models and why they were used in the introduction. Similarly, it would be helpful to describe the benefits of basal-in organoids versus injecting standard organoids with bacteria.

      The piglet was the main model in this study, and the mouse model was used for validation. Interpretation of measurements from organoid microinjection experiments must account for multiple confounding variables, such as heterogeneous exposure concentrations and durations, as well as the impact of disrupting the organoid wall. We have added this description to the Introduction on lines 88-90.

      Line 165 -- The number of piglets used seems high, is it correct approximately 100 pigs were used?

      Nine litters were selected for the experiment, but only 18 piglets were ultimately slaughtered.

      There is very little discussion of the preliminary experiment that the authors used to determine how much bacteria to use. I recommend either discussing the data and how the doses were chosen or omitting it. It was not clear if the authors used pasteurized or live bacteria in the experiments. It would also be interesting to include a discussion of the observation that relatively low levels of Akkermansia (10^6 CFU) appeared more beneficial than the higher doses, typically used in these types of experiments.

      We removed these results. The experiments used live bacteria.

      Microscopy methods for both light microscopy and EM would be stronger with added details including how many sections and fields of view were imaged and how the numbers of goblet cells normalized across samples. Without having a clear cross-section of a crypt, it is not clear to me how the images can be used to accurately quantify the number of cells per crypt. Additional details in the methods on how many total crypts were counted should also be included.

      For light microscopy, 3-4 fields were selected from each mouse, and approximately 30 crypts were counted. We have removed the mucus layer and goblet cell data.

      Line 236 -- missing which gene was used.

      The GenBank accession number has been added on lines 232-233.

      Line 310 -- OTU nomenclature.

      We have added the OTU nomenclature on line 314.

      Line 413 -- This line seems inconsistent with the data analysis described in the methods section. The authors may need to expand their description of the 16S data analysis to be clear and reproducible.

      We have revised the description of the 16S data analysis on lines 312-328.

      Line 413 -- it is not surprising that 16s analysis did not capture species, it will have limited resolution beyond the genus level.

      We deleted this sentence.

      Methods are missing some details on the data analysis, eg. methods/programs and statistical analysis of PCoA and NMDS, LefSe.

      The methods and statistical analyses for PCoA, NMDS and LEfSe have been added on lines 323-328.

      Fig 4C -- The images do not clearly capture the mucus layer or how it was analyzed. The sections appear to be cut at a slight angle, with multiple partial sections of crypts. I think this might make it challenging to count goblet cells, especially if the counts are normalized over the number of crypts or villi. The mucus layer does not appear well preserved. For example, I would expect to see an intact mucus layer lining the colon in the PBS control group. Re-cutting sections with a clean cross-section through the tissue will make data analysis easier.

      We have removed the mucus layer data.

      Fig 4D -- The images appear to be of the mouse proximal colon, whereas the mucus layer and most muc2 will be in the distal colon. If the authors have tissue sections of the distal colon, this may give a clearer image of the mucus layer and might be more consistent with the TEM images in Fig. 4B.

      We apologize for the absence of the distal colon sections.

      To fully preserve the mucus layer, in addition to fixing in Carnoy's solution, the embedding process must be run without the standard washes in 70% ethanol (see: Johansson and Hansson. Methods Mol Biol. (2012) 229; doi: 10.1007/978-1-61779-513-8_13). The mucus will wash away during standard paraffin embedding if the tissue is washed with 70% ethanol, and I wonder if that has occurred in these samples.

      The tissue was not washed with 70% ethanol.

      Fig 6A and 6B -- Although the legend indicates that the data is representative of two independent experiments, it is not clear how many fields of view or cells were imaged. In the bar graphs, it is not clear how many crypts were analyzed and from how many fields of view.

      3-4 fields were selected from each mouse, and approximately 30 crypts were counted.

      For all of the bar graphs, this could be addressed by displaying all of the data points, rather than just the mean, to give the reader a sense of how many cells were counted (as was done in Fig 7B).

      We have revised the bar graphs to show individual data points.

      498-501 -- The text says that the gene expression patterns in the organoids are consistent with the in vivo data, but the data patterns of gene expression appear to be different. For example, patterns for Wnt3 and B-catenin expression in mice, appear to be the opposite of what was observed in the organoid?

      Lines 509-512 state that the expression patterns in organoids and in mice in vivo are consistent. Figure 7C was incorrectly cited as Figure 8C; we have corrected this.

      Since Akkermansia does not grow under aerobic conditions, it should be made clear that the organoid co-culture treatment does not involve actively growing bacterial cultures.

      Reunanen et al. found that Akkermansia can tolerate oxygen: more than 90% of Akkermansia cells survived for 1 h under oxic, 5% CO2 conditions.

      Reference:

      Reunanen J, Kainulainen V, Huuskonen L, et al. Akkermansia muciniphila Adheres to Enterocytes and Strengthens the Integrity of the Epithelial Cell Layer. Appl. Environ. Microbiol. 2015, 81(11): 3655-3662.

      Minor points

      Line 50 -"evidence".

      We have changed to “evidence” on line 49.

      Line 64, 422 - italicize, check italics throughout.

      We have checked italics throughout the manuscript.

      Line 64 - may need to be reworded.

      We have changed to “Clostridioides difficile” on line 66.

      Line 77 - pathogen.

      We have changed to “pathogen” on line 77.

      Line 161 - the.

      We have removed “the” on line 161.

      Line 178 - mouse.

      We have changed to “mouse” on line 179.

      Line 313 -- wording is confusing.

      We have changed the description on lines 319-320.

      Line 318 -- Silva version #.

      The version is Silva 132. We have added it on line 316.

      Line 334 - Manufacturer for Live/Dead cell stain?

      The Live/Dead cell stain used was FVS510 (BD Biosciences). We have added this information on line 345.

      Line 433 -- FD4 not defined until here.

      We have defined FD4 on lines 218-219.

      Line 512 -- but did not promote.

      We have changed to “but did not promote” on line 526.

      Line 517 -- Looks like this should be "basal-in organoids" instead of basal-out?

      We have changed "basal-out" to "apical-out" on line 531.

      Line 546 -- induced neonatal should be protected?

      They are in separate pens.

      Jumps from Fig 7B to Fig 8C in the text.

      We apologize for this error and have corrected it.

      Reviewer #2 (Recommendations for The Authors):

      The title itself is a bit misleading. Please consider changing it. The authors meant that A. muciniphila prevents pathogen invasion, not that it functions in pathogen invasion.

      We have changed the title.

      Major comments:

      - Figures 4A, 4D, and 6B should include presentation of cross-section pictures.

      We have provided the cross-section images to the journal.

      - Figures 7, 8, and 9 should indicate clearly whether mouse or piglet organoids are used. For instance, in the main text, line 490, it indicates piglet organoids, but in Figure 7A legend, it indicates mouse tissue.

      We apologize for the error and have changed it to “mice” on lines 501-502.

      - In Figure 7A, the 3rd row, 2nd panel, crypts formed into spherical organoids; whereas in Figure 8, ETEC infection of basal-out organoids formed budding organoids. This needs to be better explained.

      Mouse intestinal organoids were cultured ex vivo from crypts isolated from mice infected with ETEC, while porcine intestinal organoids were co-cultured with ETEC in vitro.

      Minor comments:

      - In the results section, the numbering of Figures and supplementary Figures is problematic, i.e., it should start with Figure 1..., Figure S1, rather than jumping directly to Figure S2A, etc.

      Figure 1 appears in the Materials and Methods.

      - Line 458, please add the gating strategy used in the flow cytometry study.

      The gating strategy was added on line 351-356.

      - The effect of A. muciniphila on the proliferation of intestinal epithelium through the Wnt/β-catenin signaling pathway is well known (such as PMID: 32138776). The authors should discuss this in detail.

      We have expanded the discussion on lines 637-639.

      Reviewer #3 (Recommendations For The Authors):

      It is somewhat unusual that the results from the piglets are in the supplement as this is a major strength of the manuscript (Fig S2).

      We have put these results into Figure 2 of the manuscript.

      "Collectively, our results may provide theoretical basis that FMT is a promising mitigation method for pathogenic bacteria infection and a new strategy for precise application of FMT in clinical and livestock production"- This is somewhat of an odd statement as the introduction of the manuscript completely skips over most of what is known about FMTs in the context of C. difficile. Also if anything, does the authors' own data not point mostly at using A. muciniphila on its own? Clinical trials are well underway in humans.

      We have changed the sentences to “Collectively, our results may provide theoretical basis that A. muciniphila is a promising method to repair intestinal barrier damage and a new strategy for the precise application of A. muciniphila in livestock production.” on line 98-100.

      Line 26: I am not sure probiotic is the right word here given its strict scientific definition. Perhaps beneficial or protective would be more appropriate.

      We have changed “probiotic” to “beneficial” on line 25.

      Line 27: I believe AIMD is antibiotic-induced microbiome-depletion in most usages which may be more accurate and informative than dysregulated.

      The type, dose, and timing of the antibiotic treatment we used were chosen to induce microbiota disorder.

      It would appear that there are issues in the reference formatting where a number of journal names are missing.

      We have revised the reference formatting.

      Line 64- I believe eLife requires the standard practice of italicizing genus and species names. Also Clostridium difficile should now be referred to as Clostridioides difficile.

      We have changed to “Clostridioides difficile” and italicized it on lines 66 and 569. The italicization of genus and species names was checked throughout the manuscript.

      Figure S2C: it is not clear why the melt curve was included here; the legend should make it clearer what is being shown. I assume this is to provide evidence of specificity?

      The melting curve was used to demonstrate that only ETEC K88 was amplified by the primers we used. We have added an explanation to the figure legend.

      Figure 2D: there should be a quantitative analysis done on the staining of Muc2.

      We have quantified the staining of MUC2 in Figure 3D.

      Figure 3: The legends are not sufficient. For example: it is not clear what Figure 3A actually shows as the y-axis is not labelled and it is not clear what the relationship is between this and the anosim which is a function for permanova.

      ANOSIM was performed in R using the anosim function, which is based on the rank order of Bray-Curtis distances, to test the significance of differences between groups. The y-axis shows the rank of the distances between samples.
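      To make the ANOSIM description above concrete, the following is a minimal Python sketch of how the R statistic and its permutation p-value can be computed from rank-ordered Bray-Curtis distances. It is an illustrative reimplementation, not the authors' analysis script or the R anosim function itself; the function names (bray_curtis, average_ranks, anosim_r, anosim_pvalue) and the default of 999 permutations are assumptions chosen for illustration.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j become 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(samples, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[a], samples[b]) for a, b in pairs])
    between = [r for r, (a, b) in zip(ranks, pairs) if groups[a] != groups[b]]
    within = [r for r, (a, b) in zip(ranks, pairs) if groups[a] == groups[b]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim_pvalue(samples, groups, n_perm=999, seed=0):
    """Permutation test: shuffle group labels and count R values >= observed R."""
    observed = anosim_r(samples, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if anosim_r(samples, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

      R ranges from -1 to 1; values near 1 mean that between-group distances consistently rank higher than within-group distances, which is what the rank-of-distance values on the y-axis summarize.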

      Line 416- OTU not OUT.

      We have changed to “OTU” on line 428.

      Figure 4: the naming key needs to be included in the figure legend. C, E, A, and B are not immediately obvious.

      The naming key was included in the figure legend.

      Methods: additional information on the flow cytometry gating strategy/controls should be included.

      The gating strategy was added on line 351-356.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Recent studies have used optical or electrophysiological techniques to chronically measure receptive field properties of sensory cortical neurons over long time periods, i.e., days to weeks, to ask whether sensory receptive fields are stable properties. Akritas et al expand on prior studies by investigating whether nonlinear contextual sensitivity, a property not previously investigated in the context of so-called 'representational drift,' remains stable over days or weeks of recording. They performed chronic tetrode recordings of auditory cortical neurons over at least five recording days while also performing daily measurements of both the linear spectro-temporal receptive field (principal receptive field, PRF) and non-linear 'contextual gain field' (CGF), which captures the neuron's sensitivity to acoustic context. They found that spike waveforms could be reliably matched even when recorded weeks apart. In well-matched units, comparing the correlation of tuning within one day's session with that across sessions on different days showed that both PRFs and CGFs were remarkably stable over time, even when recordings spanned weeks. Meanwhile, behavioral and brain state, measured with locomotion and pupil diameter, respectively, produced small but significant shifts in the ability of the PRF/CGF model to predict fluctuations in the neuronal response over time.
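      The within-day versus across-day comparison described above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes that each day's receptive field (PRF or CGF) was estimated separately from two halves of the session and flattened into a vector, and the function names are invented for illustration. Stability is suggested when the mean across-day correlation approaches the mean within-day correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_vs_across_day(rf_by_day):
    """Compare within-day and across-day receptive-field correlations.

    rf_by_day: one (half1, half2) pair of flattened receptive-field
    estimates per recording day (hypothetical data layout).
    Returns (mean within-day r, mean across-day r).
    """
    # Reliability: agreement between the two halves of the same session.
    within = [pearson(h1, h2) for h1, h2 in rf_by_day]
    # Stability: agreement between halves drawn from different days.
    across = []
    for (a1, a2), (b1, b2) in combinations(rf_by_day, 2):
        across.extend([pearson(a1, b2), pearson(a2, b1)])
    return sum(within) / len(within), sum(across) / len(across)
```

      Pairing halves from different days, rather than whole-session estimates, keeps the amount of data behind each across-day correlation equal to that behind each within-day correlation, so the two means are directly comparable.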

      Strengths:

      The study addresses a fundamental question, which is whether the neural underpinnings of sensory perception, which encompasses both sensory events and their context, are stable across relevant timescales over which our experiences must be stable, despite biological turnover. Although two-photon calcium imaging is ideal for identifying neurons stably regardless of their activity levels and tuning, it lacks temporal precision and is therefore limited in its ability to capture the complexity of sensory responses. Akritas et al performed painstaking chronic extracellular recordings in the auditory cortex with the temporal resolution to investigate complex receptive field properties, such as neural sensitivities to acoustic context. Prior studies, particularly in the auditory cortex, focused on basic tuning properties or sensory responsivity, but Akritas et al expand on this work by showing that even the nonlinear, contextual elements of sensory neurons' responses can remain stable, providing a mechanism for the stability of our complex perception. This work is both novel and broadly applicable to those investigating cortical stability across sensory modalities.

      Weaknesses:

      Apart from some aspects such as single-unit versus multi-unit, the study largely treats their dataset as a monolith rather than showing how factors such as firing rate, depth, and cell type could define more or less stable subpopulations. It is likely that their methodology did not enable an even sampling over these qualities, and the authors should discuss these biases to put their findings more in context with related studies.

      We did, in fact, investigate whether firing rate and other physiological response properties of units might differentiate subpopulations with different stability. This analysis is shown in Figure 7B-D. There was no apparent relationship between stability of nonlinear contextual gain fields and physiological properties such as mean evoked firing rate, signal-to-noise ratio for evoked firing, or predictive power of the context model (a measure of model goodness-of-fit).

      The reviewer is correct, however, that we did not address possible differences between units recorded at different cortical depths or of different cell types, due to limitations of our methodology and sampling.

      Reviewer #2 (Public Review):

      Summary:

      This study explores the fundamental neuroscience question of the stability of neuronal representation. The concept of 'representational-drift' has been put forward after observations made using 2-photon imaging of neuronal activity over many days revealed that neurons contribute in a time-limited manner to population representation of stimuli or experiences. The authors contribute to the still contested concept of 'drifts' by measuring representation across days using electrophysiology and thus with sufficient temporal resolution to characterize the receptive fields of neurons in timescales relevant to the stimuli used. The data obtained from chronic recordings over days combined with nonlinear stimulus-response estimation allows the authors to conclude that both the spectrotemporal receptive fields as well as contextual gain fields dependent on combination sensitivity to complex stimuli were stable over time. This suggests that when a neuron is responsive to experimental parameters across long periods of time (days), its sensory receptive field is stable.

      Strengths:

      The strength of this study lies in the capacity to draw novel conclusions on auditory cortex representation based on the experimentally difficult combination of stable recordings of neuronal activity, behavior, and pupil over days and state-of-the-art analysis of receptive fields.

      Weaknesses:

      It would have been desirable, but too ambitious in the current setting, to assess what proportion, if any, of the neurons drop out or in, in order to draw a closer parallel with the 2-photon studies.

      We certainly agree that this comparison would have been desirable in principle. In practice, however, it was technically infeasible and would have been likely to produce misleading results. Our criteria for spike waveform matching across days were extremely conservative, to minimise the potential for a false positive match (which could artifactually decrease apparent stability of unit responses). Therefore, we were likely to have missed some neurons that did in fact remain active over days, due to small changes in extracellular waveform or just noise (which could artifactually decrease apparent stability of population representations). Two-photon imaging is more appropriate for analysing population stability, because cell identity is determined by spatial location. However, as we mention in the paper, electrophysiology is more appropriate for analysing receptive-field stability, because the temporal resolution is sufficient to resolve structure at the millisecond timescales relevant to auditory perception.

      Reviewer #3 (Public Review):

      Summary:

      In their study on "Nonlinear sensitivity to acoustic context is a stable feature of neuronal responses to complex sounds in auditory cortex of awake mice", Akritas et al. investigate the stability of the response properties of neurons in the auditory cortex of mice. They estimate a model with restricted non-linearities for individual neurons and compare the model properties between recordings on the same day and subsequent days. They find that both the linear and nonlinear components of the model stay rather constant over this period and conclude that on the level of the tuning properties, there is no evidence for representational drift on this time scale.

      Strengths:

      - The study has a clear analytical approach that goes beyond linear models and investigates this in a rigorous way, in particular comparing across-day variability to within-day variability.

      - The use of tetrodes is a rather reliable way in electrophysiological recordings to assess neuron identity over multiple days.

      - The comparison with pupil and motion activity was useful and insightful.

      - The presentation of the study is very logical and pretty much flawless on the writing level.

      Weaknesses:

      - The stability results across cells show a good amount of variability, which is only partially addressed.

      - In particular, no attempt is made to localize the cells in space, in order to check whether these differences could be layer or area-dependent.

      - The full context model also includes the possibility to estimate the input non-linearity, which was not done here, but could have been insightful.

      We agree with these comments and acknowledge these limitations, which arise from technological constraints. In particular, the tangential trajectory of our chronic tetrode implant, used to maximise stability of chronic recordings, limited our ability to sample cells from different cortical layers/areas and to explore how these factors might relate to variability in stability across units. Estimating input nonlinearities would have been valuable but also would have increased the number of parameters in the model and the data required to obtain reliable, predictive model fits.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors explored how galanin affects whole-brain activity in larval zebrafish using wide-field Ca2+ imaging, genetic modifications, and drugs that increase brain activity. The authors conclude that galanin has a sedative effect on the brain under normal conditions and during seizures, mainly through the galanin receptor 1a (galr1a). However, acute "stressors(?)" like pentylenetetrazole (PTZ) reduce galanin's effects, leading to increased brain activity and more seizures. The authors claim that galanin can reduce seizure severity while increasing seizure occurrence, speculated to occur through different receptor subtypes. This study confirms galanin's complex role in brain activity, supporting its potential impact on epilepsy.

      Strengths:

      The overall strength of the study lies primarily in its methodological approach using whole-brain Calcium imaging facilitated by the transparency of zebrafish larvae. Additionally, the use of transgenic zebrafish models is an advantage, as it enables genetic manipulations to investigate specific aspects of galanin signaling. This combination of advanced imaging and genetic tools allows for addressing galanin's role in regulating brain activity.

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain Calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially one based on Calcium imaging with its inherent limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of Galanin's role in epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. Authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      Indeed, our observation of the unexpected hypoactivity in EAAT2a mutants, described in our description of this mutant (Hotz et al., 2022), prompted us to initiate this study formulating the hypothesis that the observed upregulation of galanin is a neuroprotective response to epilepsy.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We have performed a transcriptome analysis that we are still evaluating. We can already state that AMPA receptor genes are not significantly altered in the mutant.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin, there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expressions might be playing a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      We agree that upregulation of galanin transcripts is at best one of a suite of regulatory mechanisms that lead to hypoactivity in EAAT2 zebrafish mutants.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study and hinder understanding of Galanin's role in epilepsy and stress regulation.

      Reviewer #2 (Public Review):

      Summary:

      This study is an investigation of galanin and galanin receptor signaling on whole-brain activity in the context of recurrent seizure activity or under homeostatic basal conditions. The authors primarily use calcium imaging to observe whole-brain neuronal activity accompanied by galanin qPCR to determine how manipulations of galanin or the galr1a receptor affect the activity of the whole-brain under non-ictal or seizure event conditions. The authors' Eaat2a-/- model (introduced in their Glia 2022 paper, PMID 34716961) that shows recurrent seizure activity alongside suppression of neuronal activity and locomotion in the time periods lacking seizures is used in this paper in comparison to the well-known pentylenetetrazole (PTZ) pharmacological model of epilepsy in zebrafish. Given the literature cited in their Introduction, the authors reasonably hypothesize that galanin will exert a net inhibitory effect on brain activity in models of epilepsy and at homeostatic baseline, but were surprised to find that this hypothesis was only moderately supported in their Eaat2a-/- model. In contrast, under PTZ challenge, fish with galanin overexpression showed increased seizure number and reduced duration while fish with galanin KO showed reduced seizure number and increased duration. These results would have been greatly enriched by the inclusion of behavioral analyses of seizure activity and locomotion (similar to the authors' 2022 Glia paper and/or PMIDs 15730879, 24002024). In addition, the authors have not accounted for sex as a biological variable, though they did note that sex sorting zebrafish larvae precludes sex selection at the younger ages used. It would be helpful to include smaller experiments taken from pilot experiments in older, sex-balanced groups of the relevant zebrafish to increase confidence in the findings' robustness across sexes. 
A possible major caveat is that all of the various genetic manipulations are non-conditional as performed, meaning that developmental impacts of galanin overexpression or galanin or galr1a knockout on the observed results have not been controlled for and may have had a confounding influence on the authors' findings. Overall, this study is important and solid (yet limited), and carries clear value for understanding the multifaceted functions that neuronal galanin can have under homeostatic and disease conditions.

      Strengths:

      - The authors convincingly show that galanin is upregulated across multiple contexts that feature seizure activity or hyperexcitability in zebrafish, and appears to reduce neuronal activity overall, with key identified exceptions (PTZ model).

      - The authors use both genetic and pharmacological models to answer their question, and through this diverse approach, find serendipitous results that suggest novel underexplored functions of galanin and its receptors in basal and disease conditions. Their question is well-informed by the cited literature, though the authors should cite and consider their findings in the context of Mazarati et al., 1998 (PMID:982276). The authors' Discussion places their findings in context, allowing for multiple interpretations and suggesting some convincing explanations.

      - Sample sizes are robust and the methods used are well-characterized, with a few exceptions (as the paper is currently written).

      - Use of a glutamatergic signaling-based genetic model of epilepsy (Eaat2a-/-) is likely the most appropriate selection to test how galanin signaling can alter seizure activity, as galanin is known to reduce glutamatergic release as an inhibitory mechanism in rodent hippocampal neurons via GalR1a (alongside GIRK activation effects). Given that PTZ instead acts through GABAergic signaling pathways, it is reasonable and useful to note that their glutamate-based genetic model showed different effects than did their GABAergic-based model of seizure activity.

      Weaknesses:

      - The authors do not include behavioral assessments of seizure or locomotor activity that would be expected in this paper given their characterizations of their Eaat2a-/- model in the Glia 2022 paper that showed these behavioral data for this zebrafish model. These data would inform the reader of the behavioral phenotypes to expect under the various conditions and would likely further support the authors' findings if obtained and reported.

      We agree that a thorough behavioral assessment would have strengthened the study, but we deemed it outside of the scope of this study.

      - No assessment of sex as a biological variable is included, though it is understood that these specific studied ages of the larvae may preclude sex sorting for experimental balancing as stated by the authors.

      The study was done on larval zebrafish (5 days post fertilization). The first signs of sexual differentiation become apparent at about 17 days post fertilization (reviewed in Ye and Chen, 2020). Hence, sex is not a biological variable at the stage studied.

      - The reported results may have been influenced by the loss or overexpression of galanin, or the loss of galr1a, during developmental stages. The authors did attempt to use the hsp70l system to overexpress galanin, but noted that the heat shock induction step led to reduced brain activity on its own (Supplementary Figure 1). Their hsp70l:gal model shows galanin overexpression (8-fold) regardless of heat induction, so this model is still useful as a way to overexpress galanin, but it should be noted that this galanin overexpression is not restricted to post-developmental timepoints and is present during development.

      The developmental perspective is an important point to consider. Because zebrafish develop rapidly, it is not trivial to disentangle these effects. In zebrafish, we first observe epileptic seizures as early as 3 days post fertilization (dpf), when the brain is clearly not yet well developed (e.g., behavioral responses to light are still minimal). Even the 5 dpf stage, at which most of our experiments were conducted, cannot by any means be considered post-developmental.

      Reviewer #3 (Public Review):

      Summary:

      The neuropeptide galanin is primarily expressed in the hypothalamus and has been shown to play critical roles in homeostatic functions such as arousal, sleep, stress, and brain disorders such as epilepsy. Previous work in rodents using galanin analogs and receptor-specific knockout has provided convincing evidence for the anti-convulsant effects of galanin.

      In the present study, the authors sought to determine the relationship between galanin expression and whole-brain activity. The authors took advantage of the transparent nature of larval zebrafish to perform whole-brain neural activity measurements via widefield calcium imaging. Two models of seizures were used (eaat2a-/- and pentylenetetrazol; PTZ). In the eaat2a-/- model, spontaneous seizures occur and the authors found that galanin transcript levels were significantly increased and associated with a reduced frequency of calcium events. Similarly, two hours after PTZ galanin transcript levels roughly doubled and the frequency and amplitude of calcium events were reduced. The authors also used a heat shock protein line (hsp70I:gal) where galanin transcript levels are induced by activation of heat shock protein, but this line also shows higher basal transcript levels of galanin. Again, the higher level of galanin in hsp70I:gal larval zebrafish resulted in a reduction of calcium events and a reduction in the amplitude of events. In contrast, galanin knockout (gal-/-) increased calcium activity, indicated by an increased number of calcium events, but a reduction in amplitude and duration. Knockout of the galanin receptor subtype galr1a via crispants also increased the frequency of calcium events.

      In subsequent experiments, eaat2a-/- mutants were crossed with hsp70I:gal or gal-/- to increase or decrease galanin expression, respectively. These experiments showed modest effects, with eaat2a-/- x gal-/- knockouts showing an increased normalized area under the curve and seizure amplitude.

      Lastly, the authors attempted to study the relationship between galanin and brain activity during a PTZ challenge. The hsp70I:gal larvae showed an increased number of seizures and reduced seizure duration during PTZ. In contrast, gal-/- mutants showed an increased normalized area under the curve and a stark reduction in the number of detected seizures, a reduction in seizure amplitude, but an increase in seizure duration. The authors then ruled out the role of Galr1a in modulating this effect during PTZ, since the number of seizures was unaffected, whereas the amplitude and duration of seizures were increased.

      Strengths:

      (1) The gain- and loss-of function galanin manipulations provided convincing evidence that galanin influences brain activity (via calcium imaging) during interictal and/or seizure-free periods. In particular, the relationship between galanin transcript levels and brain activity in Figures 1 & 2 was convincing.

      (2) The authors use two models of epilepsy (eaat2a-/- and PTZ).

      (3) Focus on the galanin receptor subtype galr1a provided good evidence for the important role of this receptor in controlling brain activity during interictal and/or seizure-free periods.

      Weaknesses:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the manuscript currently lacks mechanistic insight into the role of galanin during seizure-like activity induced by PTZ.

      We completely agree and concede that this study constitutes only a first attempt to understand the (at least for us) perplexing complexity of galanin function on the brain.

      (2) Calcium imaging is the primary data for the paper, but there are no representative time-series images or movies of GCaMP signal in the various mutants used.

      We are in the process of preparing some time series images and will include them in the next revision.

      (3) For Figure 3, the authors suggest that hsp70I:gal x eaat2a-/-mutants would further increase galanin transcript levels, which were hypothesized to further reduce brain activity. However, the authors failed to measure galanin transcript levels in this cross to show that galanin is actually increased more than the eaat2a-/- mutant or the hsp70I:gal mutant alone.

      This is an excellent suggestion. We will perform the necessary qPCR experiments and will include the data in the next revision.

      (4) Similarly, transcript levels of galanin are not provided in Figure 2 for Gal-/- mutants and galr1a KOs. Transcript levels would help validate the knockout and any potential compensatory effects of subtype-specific knockout.

      (5) The authors very heavily rely on calcium imaging of different mutant lines. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Again, we agree and concede that a number of additional approaches are needed to gain more insight into the complex role of galanin in regulating overall brain activity. These include, among others, behavioral analyses, multiple single-cell recordings, and pharmacological interventions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript addresses a fundamental question about how different types of communication signals differentially affect brain state and neurochemistry. In addition, their manuscript highlights the various processes that modulate brain responses to communication signals, including prior experience, sex, and hormonal status. Overall, the manuscript is well-written and the research is appropriately contextualized.

      That being said, it remains important for the authors to think more about their analytical approaches, in particular the effect of normalization and the explicit outlining and interpretation of statistical models. As mentioned in the original review, the normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis, and by normalizing all data to the baseline data and including this baseline data in the repeated-measures analysis, one artificially creates a baseline period with minimal variation that dramatically differs in variance from other periods (akin to heteroscedasticity). If the authors want to analyze how a stimulus changes neurochemical concentrations, they could analyze the raw data but depict normalized data in their figures (similar to other papers). Or they could analyze group differences in the normalized data of the two stimulus periods (i.e., excluding the baseline period used for normalization).

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose the latter of the two approaches suggested by the reviewer. Specifically, we reran the analysis to exclude the baseline and focus only on the playback windows and the group differences. The text in the results, the significance signs in the figures, and the discussion are corrected accordingly. Despite these changes, our major conclusions remain as before.

      We also followed this reviewer’s suggestions to clarify the statistical model in studying the experience effect. After further consultation with our statistician, we reran the analysis on experience effect, including all the groups of EXP and INEXP animals together. We have corrected text in the figure captions, results, discussion, and data analysis sections of the manuscript related to the effect of experience and its interactions. This has not changed the conclusion made related to the experience effect in the dataset.

      It would also be useful for the authors to provide further discussion of the potential contributions of different types of experiences (mating vs. restraint) to the change in behavior and neurochemical responses to the vocalization playbacks and to try to disentangle sensory and motor contributions to neurochemical changes.

      We have acknowledged in the Discussion that previous studies suggest that the effect of experience involving stress could be generalized. We believe that this is an important area of future research. Our Discussion acknowledges that the relationship between sensory and motor contributions to neurochemical changes remains an area of interest. We further point out that the time resolution of microdialysis data renders the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Reviewer #3 (Public Review):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of ACh, DA, and 5HIAA is excellent. My previous concerns have been adequately addressed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the authors' responses to my previous queries (and to the comments by other reviewers). The introduction does a better job contextualizing the data, and the additional details in the Results and Methods sections help readers digest the material. I continue to think the topic is interesting and the manuscript is potentially impactful. However, I continue to be concerned about their analytical approaches and other aspects of the revised manuscript.

      (a) Normalization

      In my original review I wrote: "The normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis and could be problematic; by normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) that could inflate statistical power." I continue to feel that an analysis of normalized data that includes the baseline data is inappropriate because of the minimal variation in the normalized data for the baseline period. When the normalized data for the baseline period are included in the analysis, there is clearly variation in the extent of variability within each of the time periods (no variability at baseline, variability during periods 1 & 2; analogous to heteroscedasticity). For example, when analyzing the RAW DATA about the change in ACh release in experienced males listening to restraint vocalizations (thank you for releasing the raw data), there was a non-significant effect of time (baseline, period 1, and period 2; linear mixed effects model; F(2,12)=3.2, p=0.0793). However, when the normalized data for this dataset were analyzed (with baseline values being set at 100% for each mouse), there was a statistically significant effect (F(2,12)=4.5, p=0.0352). This example is just to illustrate how normalization can affect (e.g., inflate) statistical power.
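      The variance problem described above can be illustrated with a minimal sketch. The Python below uses made-up concentrations for four hypothetical animals (not data from this study; the function name normalize_to_baseline is also an illustrative assumption) and expresses each animal's values as a percentage of its own baseline. After this normalization every baseline value is exactly 100%, so the baseline window has zero variance while the two playback windows do not.

```python
import statistics


def normalize_to_baseline(series):
    """Express each period as a percentage of that animal's own baseline (first value)."""
    baseline = series[0]
    return [100.0 * value / baseline for value in series]


# Hypothetical raw concentrations (arbitrary units) for four animals.
# Columns: baseline, playback period 1, playback period 2.
raw = [
    [12.0, 15.0, 13.5],
    [30.0, 33.0, 36.0],
    [8.0, 9.6, 8.8],
    [20.0, 26.0, 22.0],
]

normalized = [normalize_to_baseline(animal) for animal in raw]

# Compare the between-animal spread in each window before and after normalization.
for period, label in enumerate(["baseline", "period 1", "period 2"]):
    raw_values = [animal[period] for animal in raw]
    norm_values = [animal[period] for animal in normalized]
    print(f"{label:9s} raw SD = {statistics.stdev(raw_values):6.2f}  "
          f"normalized SD = {statistics.stdev(norm_values):6.2f}")
```

      Excluding the baseline column and comparing only the two playback windows, as the reviewer suggests, removes the zero-variance window from the model.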

      That being said, I do think that it is reasonable to analyze normalized data if the period used for normalization is NOT included in the analysis (see Figure 3 of one of the papers the authors listed in their response to reviewers: Galvez-Marquez et al., 2022). However, from my reading of this manuscript, it does seem like normalized baseline data are analyzed to assess how stimuli affect neurochemical concentrations.

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose one of the two approaches suggested by the reviewer. Specifically, we reran the analysis to exclude the baseline and focus only on the playback windows and the group differences. The text in the results, the significance signs in the figures, and the discussion are corrected accordingly. Despite these changes, our major conclusions remain as before. We have included some descriptive statistics in the text because we think these are informative.

      We decided to take this approach because the inter-individual variability in the raw data levels, caused by non-experimental factors, is too great to be useful. As we have stated before, these values are affected by probe placement, collection process, or differences in the HPLC or LC/MS runs. These effects are widely recognized in the field.

      It is worth pointing out a few things about the papers listed by the authors. Li et al. (2023) does depict normalized microdialysis data, but it isn't clear that any analysis of the normalized data is conducted. The same can be said about Holly et al. (2016). Further, in Bagley et al. (2011), the authors depict normalized data in the figures but conduct analyses on the raw data ("After chronic morphine treatment, systemic naloxone injection increased GABA outflow in PAG by 41% (from 24.6 ± 2.9 nM to a peak of 34.8 ± 3.8 nM, n = 6, P = 0.016), but did not alter GABA levels after vehicle treatment (39.8 ± 8.3 to 38.6 ± 7.4 nM with naloxone at matched peak time, n = 4; Fig. 3a)"). This latter approach (analyzing raw data in a repeated-measures manner and depicting normalized data) seems reasonable for the authors of the current study.

      (b) Clarification and modification of statistical models

      When analyzing the effect of experience on neuromodulator release, the authors analyze the experienced and inexperienced mice independently (e.g., Figure 3 vs. 6). The ideal way to assess the effects of experience is to create a factorial model. For example, one could analyze a full factorial model with experience (exp vs. inexp), stimulus type (mating vs. restraint) and time (baseline, period 1 vs period 2, assuming raw data are used). If one wanted to exclude the baseline period because group differences in baseline are not informative, conducting a factorial analysis of normalized data with just the data from periods 1 and 2 seems fine. I believe an analysis like this will help increase the legitimacy of the analysis. For example, when analyzing the normalized data (periods 1 and 2) of experienced and inexperienced males in response to mating or restraint vocalizations, you find a significant interaction between experience and stimulus type. Finding an effect of experience in an analysis that includes both experienced and inexperienced mice is ideal from an analytical framework.

      In Figure 6, it is not clear what the statistical model is and what the interactions mean. For example, in the figure legend for Figure 6, the authors report time*context and time*sex interactions. However, in this analysis there are two groups of inexperienced males (males that are listening to restraint vocalizations, males that are listening to mating vocalizations) and one group of females (females that are listening to mating vocalizations); in other words, this is an unbalanced analysis. So, when the authors indicate a time*context interaction, does that mean they are comparing the male-restraint group to the combination of males and females listening to mating vocalizations? And when they talk about a time*sex interaction, are they analyzing how males listening to either mating or restraint vocalizations differ from females listening to a mating vocalization? This all seems peculiar to me.

      - A similar set of questions could be raised about interaction effects depicted in Figure 4.

      Overall, I would like this manuscript to be reviewed by a statistician to provide additional input on how best to analyze the data.

      We followed the reviewer’s suggestions to clarify the statistical model in studying the experience effect. After further consultation with the statistician, we reran the analysis on experience effect, including all the groups of EXP and INEXP animals together.

      Design: Intercept + Sex + Context + Experience + Sex*Experience + Context*Experience.

      The model is not full factorial, as recommended by the statistician, because we have no females in the restraint group, which makes the design unbalanced. Therefore, running a GLM based on the above model and factors, as advised by the statistician, is the best way to approach the analysis of the current dataset.
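      Why the full factorial model cannot be fitted here can be checked with a small sketch. It assumes a hypothetical 0/1 coding of the three factors (Sex, Context, Experience); the coding and the helper names design_row and matrix_rank are illustrative, not the authors' analysis code. Only the groups that exist in this design are listed. The reduced model written above has full column rank, whereas a Sex*Context column would be all zeros, so that term (and any higher-order term containing it) is not estimable from these groups.

```python
from fractions import Fraction
from itertools import product

# Hypothetical coding: sex (1 = female), context (1 = restraint), exp (1 = experienced).
# There are no females in the restraint condition, so those cells are absent.
groups = [
    (sex, context, exp)
    for sex, context, exp in product((0, 1), repeat=3)
    if not (sex == 1 and context == 1)
]


def design_row(sex, context, exp):
    """Intercept + Sex + Context + Experience + Sex*Experience + Context*Experience."""
    return [1, sex, context, exp, sex * exp, context * exp]


def matrix_rank(rows):
    """Rank by exact Gaussian elimination over rationals (avoids floating-point tolerance issues)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


design = [design_row(*g) for g in groups]
sex_by_context = [sex * context for sex, context, _ in groups]

print(len(groups), "groups; reduced-model rank:", matrix_rank(design))
print("Sex*Context column:", sex_by_context)
```

      With six observed groups and six columns, the reduced model is fully identified; appending the Sex*Context column adds a seventh column without increasing the rank.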

      We have corrected text in the figure captions, results, discussion, and data analysis sections of the manuscript related to the effect of experience and its interactions. The GLM models are clarified for all the figures in the “data analysis” section of the manuscript. We have clarified that the major effect of experience on neuromodulators was seen in the ACh data.

      (c) Analysis of post-stimulus period

      I agree with Reviewer 3 that analyzing the post-stimulus period would be useful. As mentioned in the original review, these data could serve as an opportunity to show that the neurochemical levels returned to baseline and add further support for the model described in Figure 6. In addition, these data could help reveal the link between neurochemical release, auditory responses, and behavior. If neurochemical changes reflect auditory responses, then levels should return to baseline during the post-stimulus period. In addition, if behavioral variation (e.g., between mice hearing mating vs. restraint stimuli) persists following the termination of playback, then one could similarly assess whether neurochemical variation persists following playback. If the latter is the case, then the neurochemical release could be more related to the behavior than to the playback stimulus itself.

      We did not change this analysis. Our response to Reviewer 3’s comment is shown below.

      “We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback. We agree that the general question is of interest to the field, but we don’t think our study is best designed to answer that question.”

      This was accepted by Reviewer 3. We also note that release patterns have multiple time courses (e.g., Aitta-aho et al., 2018 for ACh), and thus may not support an assumption that levels should return to baseline shortly after playback offset.

      Minor comments:

      Page 7, line 15: I suggest changing "vocalization-dependent" to "stimulus-dependent" because the former could connote patterns of release related to the animal itself vocalizing.

      Changed to: “There were also distinct patterns of ACh and DA release into the BLA depending on the type of vocalization playback (Fig 3C,D).”

      Discussion section: The authors should point out a few caveats with their experiments in the Discussion section. First, experienced animals received both mating (social) and restraint experiences, and it is not clear to what degree each type of experience affected neural and behavioral responses (i.e., specificity of experience effects). For example, mating experience can lead to a wide range of physiological changes, including a resilience to stress (e.g., Leuner et al., PLoS One, 2010; Arnold et al., Hormones and Behavior, 2019), so it is possible that mating experiences by themselves could have induced these changes. Or it could be that experiencing restraint stress affects responses to mating stimuli. This could be added to the first paragraph in page 16. (The authors could also discuss which aspects of the sexual encounters might be most important for the behavioral and neural plasticity.)

      We have added text to raise this issue, stating that it is unknown whether the experience effects are specific and citing the above references concerning the generalized effects of certain experiences.

      Discussion section: It would also be useful for the authors to discuss the extent to which behavior might be driving the neurochemical changes. Some of the analyses suggest that the release is independent of the behavior (e.g., reflects a sensory response), but this could be emphasized more in the Discussion.

      We believe that we have addressed this issue sufficiently in our previous response to related issues raised by this reviewer. As we note, there are limitations in the time resolution of microdialysis data that render the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Figure 2, legend: Please note that the text above the images describes the stimulus played back to these animals and their hormonal state, and not the type of experience they underwent (i.e., clarify the titles).

      Changed as requested.

      I also agree with Reviewer 3 that "mating experience" is a misnomer for this manuscript. "Social experience with a female" is a more accurate descriptor. If they wanted to specifically provide mating experience, males should have only been tested with estrous (receptive) females. I don't think this wording change detracts from their findings.

      We have not changed this term. As noted in our previous response to Reviewer #3, we stated: “In the mating experience, mounting or attempted mounting was required for the animal to be included in subsequent testing.” Due to this requirement, the term “mating behavior” is informative and appropriate. In our view, “Social experience with a female” does not adequately describe our inclusion criterion or the experience.

      Reviewer #3 (Recommendations For The Authors):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of ACh, DA, and 5HIAA is excellent. My previous concerns have been adequately addressed. I only have a few minor suggestions for the text and one figure.

      Minor suggestions:

      Page 2, Ln 9: add adult before male and female mice

      Changed as requested

      Page 4, Ln 10: add a period after Tsukano et al., 2019)

      Changed as requested

      Page 6, Ln 9: what did you mean by "their interaction"? Being more specific, but concise, would help the readers.

      We revised the wording to clarify that the neuromodulatory systems interact in the emission of positive and negative vocalizations.

      Page 6, Ln 17: You mention Stim 1 and Stim 2, but the stimuli are not defined at this point. The clear explanation is provided in the following paragraph. Maybe consider switching the order and define the stimuli before you describe the liquid chromatography/mass spectrometry technique.

      We have revised and merged these paragraphs so that Stim 1 and Stim 2 are defined on first use. We also revised our description of the depiction and analysis of neurochemical data.

      Page 11, Ln 12: replace well-proven with well-documented

      Changed as requested

      Figure 2: There are two arrows pointing towards a single track. I assume one of the arrows is a duplicate. If so, delete one of the arrows. If not, please explain what the second arrow represents.

      Arrow removed

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors have studied the effects of platelets on OPC biology and remyelination. For this, they used mutant mice with lower platelet levels in a demyelination/remyelination model, as well as a model with large numbers of circulating platelets.

      Strengths:

      -The work is very focused, with defined objectives.

      -The work is properly done.

      Weaknesses:

      -There is no clear effect on a single cell type and/or mechanism involved.

      We appreciate the reviewer’s feedback. We understand that from our in vivo studies we are unable to distinguish whether the effects of platelets are directly exerted on OPCs or indirectly through a different cell type. However, data obtained from the platelet depleted model as well as the new data provided in this revised version in CALRHet mice indicate that, at least, macrophages / microglia do not contribute to the observed effects in OPCs. In addition to this, in vitro data support the direct effects of platelets on OPC function.

      Reviewer #2 (Public Review):

      Summary:

      This paper examined whether circulating platelets regulate oligodendrocyte progenitor cell (OPC) differentiation, given a possible link with multiple sclerosis (MS). They found that interaction with platelets enhances OPC differentiation, although persistent contact inhibits the process in the long term. The mouse model with increased platelet levels in the blood showed fewer mature oligodendrocytes, while how platelets might regulate OPC differentiation is not yet clear.

      Strengths:

      The use of both partial platelet depletion and thrombocytosis mouse models gives in vivo evidence. The presentation of platelet accumulation in a time-course manner is rigorous. The in vitro co-culture model tested the role of platelets in OPC differentiation, which was supportive of in vivo observations.

      Weaknesses:

      How platelets regulate OPC differentiation is not clear. The significance of platelets in MS progression is also not clear.

      We thank the reviewer for their view and assessment of our manuscript. We understand both of the reviewer’s concerns. Firstly, we performed additional in vitro studies and have confirmed that platelet-contained factors are, at least in part, responsible for modulating OPC differentiation and, thus, direct cell-cell contact is not essential. Secondly, in this revised version, we added references showing that the plasma levels of platelet microparticles and platelet-specific factors correlate with MS progression and severity.

      Reviewer #1 (Recommendations For the Authors):

      To improve the quality of their work and make it suitable for publication in eLife, I strongly suggest that the authors:

      (1) Co-culture platelets and OPCs in vitro to check the effects on the biology of the latter cell type.

      Response: We have performed in vitro studies in which OPCs were co-cultured with washed platelets (WP). We observed that OPC differentiation was boosted after a short exposure to WP; however, prolonged exposure to WP suppressed this effect (revised Figure 3A and B). Also, our new data using platelet lysate (PL) indicate that platelet-contained molecules are responsible for this effect (revised Figure 3C and D). Finally, we showed that by removing PL after sustained exposure (6 DIV) the ability of platelets to promote OPC differentiation is rescued (revised Figure 3E and F).

      (2) In the CALR model, can the authors check the effect of chronic exposure to large numbers of platelets? Is this affecting macrophages (including their polarization)?

      Response: Yes. Compared with wild-type mice, in the CALRHET model we confirmed the presence of a larger number of platelets within demyelinated lesions (Figure 4A and C). Also, in this revised version we added data showing that in the CALRHET model thrombocytosis does not affect macrophage / microglia numbers and polarization (revised Supplementary Figure 2).

      (3) Some aspects of the Introduction section seem too old-fashioned (e.g., the use of bFGF instead of FGF2 to refer to Fibroblast Growth Factor 2), and it would be convenient to include more recent references on the role of FGF2 and PDGFa in OPC biology.

      Response: We agree with the reviewer. In this revised version we have replaced bFGF with FGF2 and added more recent references addressing the role of FGF2 and PDGFa in OPC biology.

      (4) There are some constructions and typos that could be corrected. 

      Response: We have checked language constructions as well as typos, and these have been corrected.

      (5) Please revise the spelling of some names in the bibliography list (e.g., the correct surname is ffrench-Constant, not Ffrench-Constant).

      We have revised the spelling of names within the bibliography, and we have corrected them accordingly.

      Reviewer #2 (Recommendations For the Authors):

      Mechanisms of platelet-OPC interactions 

      -  transwell co-culture assay will examine if the OPC phenotype is through direct or indirect interactions with platelets; 

      We have performed additional in vitro studies in which OPCs were exposed to platelet lysate (PL). New results indicate that a short exposure to PL can promote OPC differentiation (revised Figure 3C and D), while a sustained exposure suppresses this effect (revised Figure 3E and F). These findings indicate that platelet-contained factors are, at least in part, responsible for modulating OPC differentiation and, thus, direct cell-cell contact is not essential for such an effect.

      -  Can you revert the phenotype of OPCs co-cultured long-term with platelets (maturation blocked) by removing the platelets (do the OPCs then differentiate again?) to see whether the phenotype is reversible?

      We would like to thank the reviewer for bringing up this interesting question. We have performed additional in vitro studies to address this issue. We found that removing PL after 6 days of sustained exposure rescues the ability of platelets to promote OPC differentiation (revised Figure 3E and F). These findings indicate that the suppressing effect of prolonged exposure to platelets on OPC differentiation is reversible.

      Clinical correlation 

      -  How many MS patients have an abnormal number of, or exposure to, platelets?

      We have added new information in the introduction section. Indeed, previous studies have shown that MS patients display higher levels of circulating platelet microparticles (PMPs) (Marcos-Ramiro et al., 2014) as well as increased plasma levels of platelet-specific factors such as P-selectin and PF4 (Cananzi et al., 1987; Kuenz et al., 2005).

      Does platelet number correlate with disease severity?

      We have added new information in the introduction section. Indeed, PMPs are indicative of the clinical status of the disease (Saenz-Cuesta et al., 2014). Also, plasma levels of P-selectin and PF4 correlate with disease course and severity, respectively (Cananzi et al., 1987; Kuenz et al., 2005).

      Image quantification 

      -  please state how many sections were counted. How many animals were used per condition. Is the practice of blinded observers done for each dataset?

      We added this information in the figure legends and in the Methods section. We counted 3-5 sections per lesion. We used 3 to 6 animals per experimental group, and data were analysed by blinded observers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The primary weakness of the paper concerns its conclusion of having generated "homogenous mature microglia", partly based on the RNAseq analysis. However, the comparison of gene profiles was carried out only between "hiPSC-derived mature microglia" and the proliferating myeloid progenitors. While the transcriptome profiles revealed a trend of enrichment of microglia-like gene expression in "hiPSC-derived mature microglia" compared to proliferating myeloid progenitors, this is not sufficient to claim they are "mature microglia". It is important that one carries out a comparative analysis of the RNAseq data with those of primary human microglia, which may be done by leveraging public databases. To convincingly claim these cells are mature microglia, questions need to be addressed, including how similar the molecular signatures of these cells are to those of fully differentiated primary microglia, whether they remain progenitor-like or take on mosaic properties, and how they are distinguished from macrophages.

      We greatly appreciate the insightful comments and suggestions from the reviewers, which were instrumental in enhancing our data analysis and organization. In response to the feedback, we have updated the terminology from “mature microglia” to simply “microglia” while clarifying in our text that these are fully differentiated microglia under single-type cell culture conditions.

      Guided by the reviewer's advice, we incorporated RNA-seq data from human brain microglia studies conducted by Dr. Poon and Dr. Blurton-Jones' Lab (Abud et al., Neuron, 2017) and Dr. Huitinga's Lab (van der Poel et al., Nat Commun, 2019). We then conducted a comparative analysis of the gene expression profiles between our fully differentiated hiPSC-derived microglia and those from fetal/adult brain microglia (see Fig.2. Suppl. B, C and D; Suppl. table 1 and table 2). The correlation analysis revealed that our hiPSC-derived microglia closely resemble fetal and adult brain microglia, distinguishing them significantly from monocytes and inflammatory monocytes.

      (2) While the authors attempted to demonstrate the functional property of "hiPSC-derived mature microglia" in culture, they used LPS challenge, which is an inappropriate assay. This is because human microglia respond poorly to LPS alone but need to be activated by a combination of LPS with other factors, such as IFNγ. Their data that "hiPSC-derived mature microglia" showed robust responses to LPS indeed implicates that these cells do not behave like mature human microglia.

      We appreciate the feedback received. In response, we cultured hiPSC-derived microglia cells and subjected them to treatments with IFNγ, LPS, and a combination of both IFNγ+LPS, as illustrated in Figure 3 suppl. Our findings revealed that the IFNγ+LPS combination notably enhanced the expression of IL1a, IL1b, TNFa, CCL8, and CXCL10, whereas IL6 and CCL2 levels remained unchanged. Treatment with IFNγ alone significantly elevated the expression of TNFa, CCL8, CXCL10, and CCL2. These outcomes align with the findings reported by Rustenhoven et al. (Sci Rep, 2016), suggesting that the functionality of our hiPSC-derived microglia cells closely mirrors that of primary human adult microglia cells.

      (3) The resolution of Figs. 4 - 6 is so low that even some of the text and labels are hardly readable. Based on the morphology shown in Fig. 4 and the statement in line 147, these hiPSC-derived "cells altered their morphology to a rounded shape within an hour of incubation and rapidly internalized the fluorescent-labeled particles". This is a peculiar response. Usually, microglia do not turn into a rounded shape within an hour when they internalize fluorescent-labeled zymosan. Such behavior usually implies weak phagocytic capacity.

      Thank you for your insightful comments. During submission, the main text's PDF version was converted online, resulting in low-quality output. We have since updated this with a high-resolution version. The observed alterations in cell morphology following zymosan phagocytosis may be attributed to the high zymosan concentration used (2 mg/ml). We assessed the impact of zymosan concentration on the morphology of hiPSC-derived microglial cells, as shown in Figure 4 suppl B. Our findings indicate that microglia cells adopt an amoeboid, rounded shape at zymosan concentrations exceeding 20 µg/ml. To clarify this point, we have amended the text to read: "The cells altered their morphology and rapidly internalized the fluorescent-labeled particles."

      (4) Data presented in Fig. 5 are not very convincing to support that transplanted cells were immunopositive for "human CD11b (Fig.5C), as well as microglia signature markers P2ry12 and TMEM119 (Fig.5D)" (line 167). The resolution and magnification of Fig. 5D is too low to tell the colocalization of tdT and human microglial marker immunolabeling. In the flat-mount images (C, I), hCD11b immunolabeling is not visible in the GCL or barely visible in the IPL. This should be discussed.

      We are grateful for the reviewer's comments. As previously mentioned, the low quality of the images was due to the online conversion of the PDF version. We have now submitted both high-quality PDF and Word versions for the reviewer's assessment. In these high-quality versions, the colocalization of tdT with human P2ry12 and TMEM119 is distinctly visible. Additionally, we have updated the hTMEM119 staining images in Figure 5D. The results from hCD11b staining align with those observed in mouse CD11b staining, notably showing more effective staining in the outer plexiform layer (OPL) microglia cells. The reason for this—whether it pertains to a staining issue, a variance in CD11b expression among microglia cells in the OPL and ganglion layer (GL), or differences in the samples due to varying conditions—is not yet clear and warrants further investigation.

      (5) Microglia respond to injury by becoming active and lose their expression of the resting state microglial marker, such as P2ry12, which is used in Fig. 6 for detection of migrated microglia. To confirm that these cells indeed respond to injury like native microglia, one should check for activated microglial markers and induction of pro-inflammatory cytokines in the sodium iodate-injury model.

      The reviewer's insights are spot-on. We utilized preserved retinas to extract mRNA, which was then reverse-transcribed to cDNA for conducting qRT-PCR using human-specific primers, as detailed in the updated Table 5. The findings revealed that following retinal pigment epithelium (RPE) injury for 3 days, the transplanted hiPSC-derived microglial cells exhibited an increase in the production of inflammatory cytokines and upregulated genes related to phagocytosis, migration, and adhesion. Conversely, there was a decrease in the expression of microglia-specific signature genes and neurotrophic factors, as demonstrated in Figure 7 suppl.

      Reviewer #1 (Recommendations For The Authors):

      Line 52: "Microglia cell repopulation research suggests that: 1) if no injury or infection occurs, retinal microglia cells can sustain their homeostasis indefinitely" - this statement is too strong or delivers a confusing message; it needs clarification or supporting evidence. Recent single-cell RNA sequencing analyses suggest that even under normal conditions, resident microglia do not form a single homeostatic cell cluster; rather, a subpopulation of activated inflammatory microglia is constantly detectable in the normal retina. This is likely because normal retinal neurons can be stressed for various reasons, such as the gradual accumulation of misfolded proteins, exposure to strong light, or ageing.

      We appreciate the comments. We changed the sentence to read, "Microglia cell repopulation research suggests that: 1) retinal resident microglia cells can sustain their population with the local dividing and migration if any perturbations do not exceed the threshold of the recovery speed by local neighbor microglia cells."

      Line 83: "we applied an appropriate protocol for culturing human iPSC-derived microglia cells" - it would be more appropriate if the word "appropriate" can be replaced by either "unique" or a phrase like "we adopted a (previously published) protocol...".

      Thanks! We changed it to "We modified a previously published protocol to culture human iPSC-derived microglia cells."

      Fig. 1F,G: A method of flow cytometry will provide more comprehensive cell quantification for percentages of positively labeled cells than cell counts under high magnification confocal images.

      Thanks for the comments! We agree with the reviewer. Given the experimental resources available, quantification of the confocal images provided a reasonable assessment. We will perform flow cytometry analysis in future experiments.

      Reviewer #2 (Public review):

      Weaknesses:

      Gene expression analysis of mature microglia cells should be better interpreted, and it would be beneficial to compare the iPSC-derived microglia gene set with a human microglial cell line (for example, HMC3) instead of myeloid progenitor cells.

      The way that the manuscript has been written, unfortunately, is not optimal. I recommend that the entire manuscript be edited and proofread in English. The text contains spelling and grammar mistakes, and the manuscript is inconsistent in several parts. The manuscript should also be revised to follow a standard scientific paper format.

      We appreciate the reviewer's comments and have taken them into consideration along with similar inquiries from Reviewer 1. Following the suggestions, we compared the gene expression profiles of our hiPSC-derived microglia with those of fetal and adult brain microglia, as shown in the updated Fig. 2 Suppl. B, C and D, as well as in Suppl. Tables 1 and 2. The correlation analysis demonstrated that the hiPSC-derived microglia closely resemble fetal and adult brain microglia and differ significantly from monocytes and inflammatory monocytes. Additionally, we have revised the manuscript to adhere more closely to the conventional scientific format.

      Reviewer #2 (Recommendations For The Authors):

      Specific suggestions for improvement:

      - Regarding the characterization of human iPSC-derived microglia, P2RY12 is a general hematopoietic cell marker. One cannot judge the maturity of microglia only by P2RY12 expression (for example, line 261). The expression of more specific markers such as TMEM119 and PROS1 should be studied and discussed.

      We are thankful for the reviewer's valuable feedback. In response:

      We have removed the term "mature" and clarified that the hiPSC-derived microglia we studied are fully differentiated within single-type cell culture conditions.

      We performed a comparative analysis of the gene expression profiles of our hiPSC-derived microglia and microglia from human brains, as illustrated in the updated Fig. 2 Suppl. B, C and D. The results confirm that hiPSC-derived microglia closely resemble human fetal and adult microglia.

      We noted that the expression of TMEM119 in hiPSC-derived microglia under in vitro single-type cell culture conditions is notably low, as shown in Author response image 1A. This suggests that the stimulatory factors in our single-type cell culture might not be sufficient to induce TMEM119 expression in microglia. The apparent requirement for a retinal environment, or for interaction with neuronal and/or other glial cells, for TMEM119 expression mirrors the behavior of infiltrating peripheral monocytes in pathological conditions, which initially lack TMEM119 but later differentiate into microglia-like macrophages that express TMEM119, as reported by Ma et al. in Sci Rep (2017).

      Additionally, our findings suggest that PROS1 is not uniquely characteristic of microglia but is expressed across a variety of cell types. Within our specific culture conditions, we noted a higher expression of PROS1 in microglial progenitor cells, as shown in Author response image 1B and C.

      Author response image 1.

      - In Figure 2, Part E, the names of the genes or pathways in the figure are not clear, and are these genes the set that are the most differentially expressed between iPSCs-derived microglia and MPC? The analysis needs more explanation.

      We regret any confusion caused by our previous explanation. To clarify, we compiled a list of microglia-enriched genes from the study by the Barres lab (Bennett et al., Proc Natl Acad Sci U S A, 2016) and from our own RNA sequencing data of mouse retinal microglia, identifying a total of 130 genes predominantly expressed in microglia (Suppl. Table 3). We then applied this gene list to our hiPSC-derived microglia RNA sequencing data, which identified 71 microglia-specific genes. These 71 genes were subjected to Ingenuity Pathway Analysis (IPA) to visualize the signaling pathways involved. Details of these microglia genes are provided in the updated Suppl. Table 3.
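      The filtering step described above (applying a curated list of microglia-enriched genes to an RNA-seq expression table) can be sketched as a simple set intersection. This is a minimal, hypothetical illustration: the gene names, FPKM values and the `min_fpkm` detection threshold are assumptions, since the response does not state the criterion used to call a gene "identified".

```python
# Hypothetical sketch: intersect a curated microglia-enriched gene list with
# genes detected in an RNA-seq expression table. Gene names, values and the
# FPKM detection threshold are illustrative assumptions, not the authors' data.

def detected_markers(marker_genes, expression, min_fpkm=1.0):
    """Return curated markers whose expression is at least min_fpkm, sorted by name.

    marker_genes: iterable of gene symbols (the curated list).
    expression: dict mapping gene symbol -> FPKM; missing genes count as 0.
    """
    return sorted(g for g in marker_genes if expression.get(g, 0.0) >= min_fpkm)

curated = {"P2RY12", "TMEM119", "CX3CR1", "TREM2", "GPR34"}
expression = {"P2RY12": 85.2, "CX3CR1": 40.1, "TREM2": 120.0, "TMEM119": 0.3}
print(detected_markers(curated, expression))  # ['CX3CR1', 'P2RY12', 'TREM2']
```

      In this toy example, GPR34 is absent from the table and TMEM119 falls below the assumed threshold, so neither is retained; the resulting subset would then be the input to a pathway tool such as IPA.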

      - Lines 124 to 128 mention that high expression of Stat3, IL1b, and IL6 and their central role in pathway analysis emphasize the efficiency of the maturation protocol. Regarding the fact that Stat3, IL1b, and IL6 are contributors to proinflammatory pathways, it is not convincing that the high expression of these genes in iPSC-derived microglia demonstrates the efficiency of the maturation protocol, given that microglia are not stimulated.

      Thanks for the comments! We have added sentences describing the comparison between hiPSC-derived microglia and human brain microglia. We have also replaced "mature" with "functional." The sentence now reads, "Thus, our method of obtaining differentiated microglia is a reliable method to generate a large number of homogenous functional microglia cells."

      - Statistical analysis is missing for some graphs, for example, figures 1-3 and 5.

      We appreciate the comments. We have added the statistical results in the revised version.

      - The legend for Figure 3 needs to be rewritten. The graphs or applied assays should be explained in the legend, not the interpretation of the data.

      The legend was rewritten.

      - There is no Figure 3 in the supplement figures file.

      We added Figure 3. Suppl.

      - hTMEM119 staining in Figure 5, Part D, is mostly background. Please provide another image.

      The images were unclear after online conversion because of their low pixel count. We have replaced them with new hTMEM119 staining images in Figure 5D.

      - In line 176, Figure 5I is not cited.

      Thank you very much! We added 5I.

      - Lines 241 to 244 state that more than 50% of the AMD-associated genes are highly expressed in retinal microglia according to Fig. discussion suppl A & B. It is not clear that the gene set that was used for analysis is from a healthy retinal microglia or AMD-related ones. Please explain precisely.

      Thank you for your feedback. The gene list we referenced originates from a Genome-Wide Association Study (GWAS) that compared patients with Age-related Macular Degeneration (AMD) to healthy cohorts. We did not directly utilize this list in our experiments but referred to it to underscore the importance of microglia cells in the context of AMD.

      Some of the English proofreading and manuscript format comments:

      Line 805: Iba1 is written in lowercase. Is it human IBA1? It is not consistent with the way it is written in the text (in line 117, for example).

      Thank you for pointing out the error. We have standardized all instances to "Iba1". The Iba1 antibody used throughout is from Wako (#019–19741) and labels both mouse and human microglial cells.

      Line 814: microglia-enriched gene expression instead of microglia-enrich gene expression

      Thank you! We changed it.

      Line 345: Starting a sentence with lower case letter.

      Thank you! We changed it.

      Line 342: Myeloid lineage instead of myeloid cell linage.

      Thank you! We changed it.

      Line 815: What does FPKM stand for? The abbreviations should be explained.

      FPKM stands for Fragments Per Kilobase of transcript per Million mapped reads. We have defined it in the text.
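      As a worked illustration of the definition above, FPKM normalizes a transcript's fragment count by both transcript length (in kilobases) and sequencing depth (in millions of mapped fragments). The numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped reads.

    fragments: fragments mapped to this transcript.
    transcript_length_bp: transcript length in base pairs.
    total_mapped_fragments: all mapped fragments in the library.
    """
    per_kb = transcript_length_bp / 1_000               # length in kilobases
    per_million = total_mapped_fragments / 1_000_000    # depth in millions
    return fragments / (per_kb * per_million)

# Hypothetical example: 500 fragments on a 2 kb transcript in a 10 M library.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```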

      Line 309: The manuscript occasionally refers to PLX-5622 without a hyphen. Please follow a uniform format.

      We changed all “PLX5622” to “PLX-5622”.

      Lines 327-331: should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340: should be rewritten.

      The mentioned sentence was rewritten.

      Line 135: "qRT-PCR" instead of "QPCR", as it is also written in the Materials and Methods. The correction also applies to all instances of QPCR in the text.

      We replaced "QPCR" with "qRT-PCR".

      Figure 3: Graph B should be placed to the right of graph A.

      Image labels: It would be better to place the image labels on the left side of each image, for example, the GL, IPL and OPL labels in Figure 5B.

      Thanks for the suggestion. We changed the image organization as per the reviewer’s advice.

      Lines 258 to 260 in the discussion have also been repeated with the same words in the introduction.

      The mentioned paragraph was rewritten.

      Lines 327-331 should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340 should be rewritten.

      The mentioned paragraph was rewritten.

    1. Author response:

      Reviewer #1 (Public Review): 

      Summary: 

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a more general pangenome graph approach to investigate structural variants also in non-coding regions. The two main results of the study are that (1) the MTBC has a small pangenome with few accessory genes, and that (2) pangenome evolution is driven by deletions in sublineage-specific regions of difference. Combining the gene-based approach with a pangenome graph is innovative, and the former analysis is largely sound apart from a lack of information about the data set used. The graph part, however, requires more work and currently fails to support the second main result. Problems include the omission of important information and the confusing analysis of structural variants in terms of "regions of difference", which unnecessarily introduces reference bias. Overall, I very much like the direction taken in this article, but think that it needs more work: on the one hand by simply telling the reader what exactly was done, on the other by taking advantage of the information contained in the pangenome graph. 

      Thank you for your constructive feedback. We hope we have addressed all your concerns. Please see our detailed responses below.

      Strengths: 

      The authors put together a large data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, covering a large geographic area. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes in pangenome analysis. 

      Thank you for your positive feedback. We are pleased that you found these aspects of our work noteworthy and valuable.

      Weaknesses: 

      The study does not quite live up to the expectations raised in the introduction. Firstly, while the importance of using a curated data set is emphasized, little information is given about the data set apart from the geographic origin of the samples (Figure 1). A BUSCO analysis is conducted to filter for assembly quality, but no results are reported. It is also not clear whether the authors assembled genomes themselves in the cases where, according to Supplementary Table 1, only the reads were published but not the assemblies. In the end, we simply have to trust that single-contig assemblies based on long-reads are reliable. 

      The BUSCO results for all genomes are reported in Supplementary Table S1. Genome assemblies were obtained from public databases and from the studies that generated them. We did not assemble any of the public datasets; we assembled only the 11 genomes we sequenced ourselves, for which we included the assembly statistics. The public genomes from NCBI were marked as closed by the NCBI pipelines, so they had already passed additional quality checks before we included them in our analysis. Marin et al. (2024, bioRxiv) also performed additional checks on the vast majority of these genomes before they were included here. We are confident that these genomes represent the highest-quality M. tuberculosis dataset possible, but we will check that all genomes are present in the GTDB list, which applies additional tests including CheckM, to add another layer of confidence. Accessions for some of the final genomes were not included because the corresponding papers had not yet been released; they will be added in the next version. Supplementary Table S1 will be updated to include the assembly information for each genome.

      One issue with long-read assemblies could be that high rates of sequencing errors result in artificial indels when coverage is low, which in turn could affect gene annotation and pangenome inference (e.g. Watson & Warr 2019, https://doi.org/10.1038/s41587-018-0004-z). Some of the older long-read data used by the authors could well be problematic (PacBio RSII), but so could their own Nanopore assemblies, six of which have a mean coverage below 50 (Wick et al. 2023 recommend 200x for ONT, https://doi.org/10.1371/journal.pcbi.1010905). Could the results be affected by such assembly errors? Are there lineages, for example, for which there is an increased proportion of RSII data? Given the large heterogeneity in data quality on the NCBI, I think more information about the reads and the assemblies should be provided. 

      We have shown elsewhere (Marin et al., 2024, bioRxiv) that short-read sequencing is significantly more prone to these types of problems. For this reason, we included only closed genomes, which we believe reduces the potential for such errors. However, we agree that older sequencing technologies, such as PacBio RSII, can introduce errors into the assemblies and subsequent downstream analyses. We will test for a correlation between sequencing platform and accessory gene presence/absence to see whether the sequencing technology influences the results.

      Wick et al. (2023) recommend a coverage of 200x for ONT sequencing; however, newer analyses from Wick have shown that with modern basecalling and sequencing very low error rates can be achieved with much lower coverage (see https://rrwick.github.io/2023/10/24/ont-only-accuracy-update.html). We are quite confident that gene presence/absence patterns should be robust to this in our analysis but will confirm with some additional analysis on our sequenced genomes.

      The part of the paper I struggled most with is the pangenome graph analysis and the interpretation of structural variants in terms of "regions of difference". To start with, the method section states that "multiple whole genomes were aligned into a graph using PanGraph" (l.159/160), without stating which genomes were for what reason. From Figure 5 I understand that you included all genomes, and that Figure 6 summarizes the information at the sublineage level. This should be stated clearly, at present the reader has to figure out what was done.

      All genomes were included in the pangenome graph construction and in the search for regions of difference. We then grouped the genomes into sub-lineages for the additional analyses, because there are not enough genomes per sub-sub-lineage (or lower levels) for robust analyses. We will make this clearer in the next version, likely with a flowchart of the analyses.

      It was also not clear to me why the authors focus on the sublineage level: a minority of accessory genes (107 of 506) are "specific to certain lineages or sublineages" (l. 240), so why conclude that the pangenome is "driven by sublineage-specific regions of difference", as the title states? What does "driven by" mean? Instead of cutting the phylogeny arbitrarily at the sublineage level, polymorphisms could be described more generally by their frequencies. 

      We acknowledge the importance of polymorphisms, but our study primarily aimed to investigate the presence and absence of genes/genomic regions, as highlighted in our focus on structural differences rather than SNPs (L67-69). We attempted to clarify our goal of exploring gene content variation both between and within lineages (L69) to avoid confusion.

      Our focus on the sub-lineage level addresses a gap in understanding how gene content is distributed below the broad lineage level, where previous pangenome studies have concentrated. Because too few genomes are available to represent all sub-sub-lineages and lower levels of classification, the sub-lineage is the finest level at which we could investigate gene content differences, allowing a more detailed exploration of gene content variation within the MTBC.

      I fully agree that pangenome graphs are the way to go and that the non-coding part of the genome deserves as much attention as the coding part, as stated in the introduction. Here, however, the analysis of the pangenome graph consists of extracting variants from the graph and blasting them against the reference genome H37Rv in order to identify genes and "regions of difference" (RDs) that are variable. It is not clear what the authors do with structural variants that yield no blast hit against H37Rv. Are they ignored? Are they included as new "regions of difference"? How many of them are there? etc. The key advantage of pangenome graphs is that they allow a reference-free, full representation of genetic variation in a sample. Here reference bias is reintroduced in the first analysis step. 

      Genomic analysis of Mycobacterium tuberculosis is H37Rv reference-centric, meaning that RDs are typically defined based on their presence or absence relative to the reference strain. Our approach comparing variants to the H37Rv reference was primarily to identify and name the known regions of differences (RDs). For structural variants that did not yield a BLAST hit against H37Rv, we assigned them as new RDs in Supplementary Table S4 to provide a reference-free approach for investigating gene content differences. Further clarifications on the definition and identification of RDs will be added.

      Along similar lines, I find the interpretation of structural variants in terms of "regions of difference" confusing, and probably many people outside the TB field will do so. For one thing, it is not clear where these RDs and their names come from. Did the authors use an annotation of RDs in the reference genome H37Rv from previously published work (e.g. Bespiatykh et al. 2021)? This is important basic information, its lack makes it difficult to judge the validity of the results. The Bespiatykh et al. study uses a large short-read data (721 strains) set to characterize diversity in RDs and specifically focuses on the sublineage-specific variants. While the authors cite the paper, it would be relevant to compare the results of the two studies in more detail. 

      Indeed, the term regions of difference (RDs) is somewhat M. tuberculosis specific. RDs are large polymorphisms that are differentially present in clades (primarily lineages) of M. tuberculosis. Their annotation and naming are based on Bespiatykh et al. (2021) and the RDscan tool, which identifies RDs based on H37Rv genomic coordinates. We obtained the corresponding Rv loci for RDs by matching their genomic coordinates on the H37Rv genome and confirmed the RDs using the BED file from RDscan. We have used their names where our findings overlap, and any new RDs we report are not found in their data. We will make this clearer in the next version.
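      The naming step described above (matching a variant's H37Rv coordinates against annotated RD coordinates, and treating variants without a BLAST hit as new RDs) can be sketched as an interval-overlap lookup. This is a hypothetical illustration, not the RDscan implementation: the RD names and coordinates are placeholders, intervals are treated as closed and 1-based (BED files are 0-based half-open and would need converting), and the 50% overlap cut-off, measured against RD length, is an assumption.

```python
# Hypothetical sketch: label a structural variant by comparing its H37Rv hit
# coordinates (if any) with an annotated RD table. Names, coordinates and the
# overlap cut-off are illustrative placeholders, not values from RDscan.

def overlap(a_start, a_end, b_start, b_end):
    """Length of overlap between two closed intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

def label_variant(hit, rd_table, min_fraction=0.5):
    """Return the best-overlapping known RD name, or a 'novel' label.

    hit: (start, end) on H37Rv, or None when BLAST found no hit.
    rd_table: list of (name, start, end) with closed, 1-based coordinates.
    min_fraction: assumed minimum fraction of the RD covered by the hit.
    """
    if hit is None:
        return "novel (no H37Rv hit)"
    start, end = hit
    best_name, best_frac = None, 0.0
    for name, rd_start, rd_end in rd_table:
        frac = overlap(start, end, rd_start, rd_end) / (rd_end - rd_start + 1)
        if frac > best_frac:
            best_name, best_frac = name, frac
    return best_name if best_frac >= min_fraction else "novel"

RD_TABLE = [("RD_X", 1000, 1999), ("RD_Y", 5000, 5499)]  # placeholder RDs
print(label_variant((1100, 2100), RD_TABLE))  # RD_X
```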

      As far as I understand, "regions of difference" have been used in the tuberculosis field to describe structural variants relative to the reference genome H37Rv. Colloquially, regions present in H37Rv but absent in another strain have been called "deletions". Whether these polymorphisms have indeed originated through deletion or through insertion in H37Rv or its ancestors requires a comparison with additional strains. While the pangenome graph does contain this information, the authors do not attempt to categorize structural variants into insertions and deletions but simply seem to assume that "regions of difference" are deletions. This, as well as the neglect of paralogs in the "classical" pangenome analysis, puts a question mark behind their conclusion that deletion drives pangenome evolution in the MTBC. 

      The term regions of difference or RDs has traditionally been used to describe structural variants relative to the H37Rv genome, often interpreted as deletions. Consistent with our study, Bespiatykh et al. (2021) observed two types of deletions: those associated with repeat sequences or mobile genetic elements, and conserved RDs that are phylogenetically informative deletions inherited by all descendants of a strain.

      In our study, we employed a phylogenetic approach to identify deletions. If RDs are present in genomes both upstream and downstream of a phylogenetic branch but are absent in one specific branch, we interpret this as evidence of gene deletion (Figure 5B). This method was systematically applied to all RDs identified as deletions in our study; we will clarify this better in the next version.
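      The phylogenetic reasoning above can be illustrated with a simplified sketch that places a deletion on the branch above each maximal clade whose members all lack a region. It assumes that the common ancestor carried the region and that lost regions are not regained (consistent with the absence of horizontal gene transfer in modern MTBC evolution); it does not reproduce the authors' exact procedure, and the tree and strain names are hypothetical.

```python
# Simplified sketch (assumption): infer branches carrying a deletion of one
# region, assuming the root ancestor carried it and that regions are never
# regained. Trees are nested tuples; leaves are hypothetical strain names.

def tips(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))

def deletion_clades(node, present, parent_all_absent=False):
    """Yield maximal clades whose tips all lack the region.

    present: set of strains in which the region is present.
    Each yielded clade marks one inferred deletion on the branch above it.
    """
    all_absent = not (tips(node) & present)
    if all_absent and not parent_all_absent:
        yield frozenset(tips(node))
    if not isinstance(node, str):
        for child in node:
            yield from deletion_clades(child, present, all_absent)

tree = ((("A", "B"), "C"), ("D", "E"))
present = {"C", "D", "E"}  # region absent only in strains A and B
print(list(deletion_clades(tree, present)))
```

      Here the region is present in the sister strain C and in the D/E clade but absent in A and B, so a single deletion is placed on the branch leading to the A/B clade rather than two independent losses.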

      We acknowledge the importance of considering paralogs in pangenome analysis. While genome evolution is driven by duplication, loss and transfer, transfer is not a mechanism in modern MTBC evolution, and we have focussed here on loss. Duplication (paralog) analysis from annotations remains difficult to quantify because paralogy is hard to confirm reliably. We have addressed the effect of different Panaroo options, including merging paralogs, on the genomic diversity and pangenome estimation of the MTBC in our associated paper (Marin et al. 2024). That study showed that most structural variation in Mycobacterium tuberculosis is attributable to rearrangements of existing sequences rather than novel sequence content. For example, the transposable element IS6110 accounts for a significant portion of sequence variation. This suggests that paralogs contribute little to gene content differences in the MTBC.

      However, we will attempt to build on this by looking at Panaroo outputs without merged paralogs and looking for potentially duplicated genomic stretches in the Pangraph analyses. This will hopefully show more robustly that the MTBC diversity is primarily deletion driven.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the MTBC pangenome size was reported to vary in size by previous reports. 

      Strengths: 

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions which was a focus of previous papers and analyses which investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole genomes sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated the limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. 

      Thank you for recognising the strengths of our work.

      Weaknesses: 

      The only major weakness was the limited number of isolates from certain lineages and the over-representation others, which was also acknowledged by the authors. However, since the case is made that the MTBC has a closed pangenome, the inclusion of additional genomes would not result in the identification of any new genes. This is a strong statement without an illustration/statistical analysis to support this. 

      The language around open and closed pangenomes is difficult to convey and indeed we will improve this for the next version. We aimed to show that with a set of highly curated genomes that span the breadth of known diversity within the MTBC, we see no evidence for a large, open pangenome as has been previously suggested. We instead suggest that adding new genomes is unlikely to bring large additions to the accessory genome, therefore showing that the MTBC pangenome tends towards being closed. We will add additional visualisations such as gene accumulation plots to better support this argument.
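      A gene accumulation curve of the kind mentioned above is typically built by adding genomes in random order and recording the cumulative number of distinct genes, averaged over many permutations; a curve that flattens as genomes are added is commonly read as evidence that the pangenome tends towards being closed. The sketch below is a generic illustration on toy presence/absence data, not the authors' pipeline; the gene identifiers, permutation count and seed are assumptions.

```python
import random

def accumulation_curve(genomes, permutations=100, seed=0):
    """Mean number of distinct genes after adding 1..n genomes in random order.

    genomes: list of sets of gene identifiers (hypothetical presence/absence data).
    """
    rng = random.Random(seed)
    n = len(genomes)
    totals = [0.0] * n
    for _ in range(permutations):
        order = genomes[:]          # shuffle a copy, keep the input intact
        rng.shuffle(order)
        seen = set()
        for i, gene_set in enumerate(order):
            seen |= gene_set        # pangenome size after i + 1 genomes
            totals[i] += len(seen)
    return [t / permutations for t in totals]

# Toy data: a shared core of 3 genes plus two rare accessory genes.
core = {"g1", "g2", "g3"}
toy = [core | {"a1"}, core, core | {"a2"}, core, core]
curve = accumulation_curve(toy)
print([round(x, 2) for x in curve])
```

      With a large core and few accessory genes, the curve starts close to its final value and quickly plateaus, which is the pattern expected for a small, nearly closed pangenome.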

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show that these genes are upregulated in C2C12 myoblasts treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modifies Cys184 of HRas, which is required for its activation, as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through its electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutant). They also showed that expressing WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarifications and additional experiments would help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers for inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (Inflammatory cells vs senescent cells). The expression of beta Gal and other markers of senescence in the tissue sections will help to delineate these.

      As the reviewer points out, Doxo induces systemic inflammation in addition to DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence. Specifically, we observed increased mRNA levels of the cell-cycle and senescence-associated genes p16 and p21 (Fig. 1C), increased nuclear accumulation of p21 (Fig. 1A), and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that the viability of cells was decreased after treatment with 15d-PGJ2 (10 µM) but not with 15d-PGJ2 (1 µM, 2 µM, 4 µM, or 5 µM) (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The effect of Doxorubicin treatment is irreversible: the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death.

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observed a ~30% increase in Erk phosphorylation after treatment with 15d-PGJ2 in 0.2% serum medium compared with vehicle (DMSO). This increase was reproducible and statistically significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi than WT. However, this could very well be due to the defect in palmitoylation rather than to modification by 15d-PGJ2.

      Our data do not suggest higher levels of the C184S mutant in the Golgi compared with WT (Fig. S4A). The ratio of HRas levels in the Golgi to those in the plasma membrane was similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph, columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. Inhibiting C181 palmitoylation, either by mutation (C181S) or by treatment with the palmitoyl transferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could therefore inhibit palmitoylation of HRas at C181. However, our data do not support this hypothesis: modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, as would be expected if palmitoylation were blocked, as it is by the C181S mutation.

      To test whether the inhibition of myoblast differentiation depends on HRas, the authors overexpressed HRas and its mutants in C2C12 cells. However, this experiment does not take endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment would be to knock down or knock out HRas (or introduce C184 knock-in mutations) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs, which is why we observe only a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, because the constructs are expressed from the strong CMV promoter and lack other regulatory elements, they exert a dominant effect over endogenous HRas, revealing a cysteine-mutant-dependent inhibition of myoblast differentiation after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing each HRas construct and treated with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the same construct and treated with vehicle (DMSO). Vehicle-treated values are not shown because they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the aim was to study the effect of HRas cysteine mutations on myoblast differentiation after 15d-PGJ2 treatment, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to general toxicity of 15d-PGJ2. We show that this inhibition depends on modification of HRas at C184: the C184S mutant, which is not modified by 15d-PGJ2 (Fig. 3A) and is therefore not activated by it (Fig. 3B), rescues the inhibition of C2C12 differentiation (Fig. 4D and E). This dissociates the inhibition of myoblast differentiation by 15d-PGJ2 from its general toxic effects on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024); on the other hand, it exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and by inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      (1) I still think the novelty is limited by previously published findings. The authors themselves noted that the accumulation of 15d-PGJ2 in senescent cells has been reported in various cell types, including human fibroblasts, HEPG2 hepatocellular carcinoma cells, and HUVEC endothelial cells (PMCID: PMC8501892). Although the current study observed similar activation of 15d-PGJ2 in myoblasts, the finding appears to be additive rather than fundamentally novel. The covalent adduct of 15d-PGJ2 with Cys-184 of H-Ras was reported over 20 years ago (PMID: 12684535), and the biochemical principles of this interaction are likely universal across different cell types. The regulation of myogenesis by both HRas and 15d-PGJ2 has also been extensively reported (PMID: 2654809, 1714463, 17412879, 20109525, 11477074). The main conceptual novelty may lie in connecting these points in myoblasts. But, as discussed in another comment, the use of C2C12 cells as a model for senescence studies is questionable because they lack the key regulator p16. Findings in C2C12 cells may not accurately represent physiologically relevant myoblasts. It is recommended that these findings be validated in primary myoblasts to strengthen the study's conclusions.

      This is the first study to show a molecular mechanism in which activation of HRas signaling in skeletal myoblasts, due to covalent modification by 15d-PGJ2 at C184 of HRas, inhibits the differentiation of skeletal myoblasts.

      (2) The C2C12 cell line is not an ideal model for senescence study.

      C2C12 cells are a well-established model for studying myogenesis. However, their suitability as a model for senescence studies is questionable. C2C12 cells are immortalized and do not undergo normal senescence like primary cells as C2C12 cells are known to have a deleted p16/p19 locus, a crucial regulator of senescence (PMID: 20682446). The use of C2C12 cells in published studies does not inherently validate them as a suitable senescence model. These studies may have limitations, and the appropriateness of the C2C12 model depends on the specific research goals.

      Several reports have shown that cells can undergo senescence independently of p16 expression. MCF7 human breast adenocarcinoma cells, despite homozygous deletion of the p16 locus (ISBN 9780124375512, Chapter 17, Table 2), undergo DNA damage-mediated senescence after treatment with Doxorubicin (PMCID: PMC7025418) and oncogene-induced senescence after expression of constitutively active HRas (PMID: 17135242), through upregulation of the cell cycle inhibitor p21. In this study, we observe an increase in senescence markers in C2C12 cells after treatment with Doxo (Fig. 1). We also observed an increase in markers of DNA damage-mediated senescence in MCF7 cells after treatment with Doxo (data will be included in the revised manuscript). Based on these observations, we conclude that C2C12 cells undergo senescence despite lacking the p16/p19 locus.

      In the study by Moustogiannis et al. (PMID: 33918414), they claimed to have aged C2C12 cells through multiple population doublings. However, the SA-β-gal staining in their data, which is often used to confirm senescence, showed almost fully confluent "aged" C2C12 cells. This confluent state could artificially increase SA-β-gal positivity, suggesting that these cells may not truly represent senescence. Moreover, the "aged" C2C12 cells exhibited normal proliferation, which contradicts the definition of senescence. Similar findings were reported in another study of C2C12 cells subjected to 58 population doublings (PMID: 21826704), where even at this late stage the cells were still dividing every 2 or 3 days, similar to younger cells at early passages. More importantly, I do not know how p16 was detected in that paper, since the locus was already mutated. In terms of p21, there was no difference in the proliferative C2C12 cells at day 0.

      In the study by Moiseeva et al. in 2023 (PMID: 36544018), C2C12 cells were used to model senescence in siRNA transfection experiments. However, the most significant findings were obtained using primary satellite cells or confirmed with complementary data.

      In conclusion, while molecular changes observed in studies using C2C12 cells may be valid, the use of primary myoblasts is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      (3) Regarding the source of increased PGD in the conditioned medium, I want to emphasize that it is unclear whether PGD or its metabolites increase in response to DNA damage or to the senescent state. Thus, using a different senescence model to exclude the possibility of a DNA damage-induced increase will be crucial.

      Although senescence can be induced by several stress stimuli, such as DNA damage, oncogene expression, ROS, and mitochondrial dysfunction, DNA damage remains critical for the induction of the SASP (reviewed in PMID: 20078217). Moreover, other models of senescence, such as oncogene-induced senescence (reviewed in PMID: 17671427), ROS-induced senescence (PMID: 24934860), and mitochondrial dysfunction-associated senescence (MiDAS) (PMID: 26686024), show upregulation of DNA damage-associated signaling pathways. In this study, we have explored the SASP of cells undergoing senescence upon DNA damage mediated by the chemotherapy drug Doxorubicin.

      (4) Similarly for the in vivo Doxorubicin (Doxo) injection, both reviewers have raised concerns about the potential side effects of Doxo, including inflammation, DNA damage, and ROS generation. These effects could potentially confound the results of the study. The physiological significance of this study will heavily rely on the in vivo data. However, the in vivo senescence component is confounded by the side effects of Doxo.

      We concur that this is a limitation of this study and the subsequent work will demonstrate the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (5) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of conditioned medium. The author took it for granted that the conditioned medium from senescent cells would inhibit myogenesis, relying on previous publications (PMID: 37468473). However, that study was conducted in the context of myotonic dystrophy type 1. To support the inhibitory effect in the current experimental settings, direct evidence is required. It would be necessary to include another control with conditioned medium from normal, proliferative C2C12 cells.

      Conditioned medium from several types of senescent cells, such as senescent myoblasts in DM1 (PMID: 37468473) and adipocytes undergoing H2O2-induced, insulin resistance-associated, or replicative senescence (PMID: 37321332), has been shown to inhibit the differentiation of myoblasts. Therefore, in this study, we measured the effect of prostaglandin PGD2 and its metabolites on myoblast differentiation by inhibiting PGD2 biosynthesis in senescent myoblasts with AT-56 and then collecting the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as the control for the experiment.

      (6) Statistical analyses problems.

      Only the t-test was used throughout the study, even when there are more than two groups. Please have a statistician evaluate the replicates and statistical analyses used.

      In experiments with more than two groups, the t-test was used for column-wise comparison of the experiment samples to the control sample. Multiple sample comparisons using one-way or two-way ANOVA were avoided as experimental samples were individually compared to the control sample.

      For the 15d-PGJ2/cell concentration measurements in Figure 1F, there were only two replicates, which were provided in the supplementary table only upon request. Was that experiment repeated with more biological replicates?

      Additional replicates of the experiment will be included in the revised manuscript.

      For Figures 1C, 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, and 4E, please include each data point in the bar graphs, as in Fig. 1D, or at least state how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      Many control groups have no error bars (Fig. 2C, 2E, 3E, 3F, 4E, S4B).

      There are no error bars for the control groups in Figures 2C, 2E, 3E, 3F, 4E, and S4B because the experimental samples of each replicate were normalized to the corresponding control sample, which sets the control value of every replicate to 1.
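      The effect described above can be reproduced with a few lines of arithmetic. In this sketch, each replicate is divided by its own control, and then the mean and sample SD are computed across replicates. The data layout, the function names, and the numbers are hypothetical illustrations, not the authors' data or analysis code.

```python
from statistics import mean, stdev


def normalize_to_control(replicates):
    """Divide every value in a replicate by that replicate's own control value.

    `replicates` is a hypothetical layout: a list of dicts mapping
    condition name -> raw measurement, each containing a "control" key.
    """
    return [{cond: value / rep["control"] for cond, value in rep.items()} for rep in replicates]


def summarize(normalized):
    """Return (mean, sample SD) per condition across replicates."""
    return {
        cond: (mean([r[cond] for r in normalized]), stdev([r[cond] for r in normalized]))
        for cond in normalized[0]
    }


# Hypothetical raw values from three independent replicates
raw = [
    {"control": 2.0, "treated": 1.2},
    {"control": 4.0, "treated": 2.8},
    {"control": 3.0, "treated": 1.5},
]
summary = summarize(normalize_to_control(raw))
print(summary["control"])  # (1.0, 0.0): every control is exactly 1, so its SD is 0
```

      Between-replicate variability in the raw control values is therefore carried entirely by the treated ratios rather than shown on the control bar.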

      For the qPCR data in Figure 1C, the authors responded that the data were plotted using 2^(-ΔCT) instead of 2^(-ΔΔCT) to show the variability in the expression of mRNAs isolated from animals treated with saline. This statement does not align with the Methods section. Please revise.

      Appropriate revisions will be made to the method sections of the revised manuscript.
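      The two quantities contrasted above differ only in whether a calibrator ΔCT is subtracted. The sketch below assumes a single reference gene and ~100% amplification efficiency, which are the assumptions of the Livak 2^(-ΔΔCT) method. The function names and Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^(-ΔCT): target expression relative to the reference gene in the same sample."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def fold_change(ct_target: float, ct_reference: float,
                ct_target_cal: float, ct_reference_cal: float) -> float:
    """2^(-ΔΔCT): the sample's ΔCT compared with a calibrator's ΔCT (e.g. a saline-group mean)."""
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values for one treated animal and a saline calibrator
print(relative_expression(26.0, 20.0))      # 0.015625  (ΔCT = 6)
print(fold_change(26.0, 20.0, 28.0, 20.0))  # 4.0       (ΔΔCT = 6 - 8 = -2)
```

      Whichever form is plotted, the Methods should name it explicitly. Values computed as 2^(-ΔCT) are not scaled to the saline group, whereas values computed as 2^(-ΔΔCT) are.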

      (7) For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.

      Recommendations for the authors:

      After careful review, the editors advise you to carefully address the following concerns.

      (1) There were concerns that, in the revised manuscript, the DMSO and Doxo experiments depicted in Figure 1H appeared quite homogeneous despite the authors' description to the contrary. This raises concerns about the type of statistics employed and the possibly low number of replicates in the experiments shown in Fig. 1.

      (2) Experiments in Figures 1F, 1I, and 1J had as few as n=2 replicates. For Figures 1C, 1D, 1F, 1G, and 1J, the statistics used a two-tailed Student's t-test; for all other experiments, statistics were marked N/A. Using a t-test for multi-group comparisons (as indicated in the figure legend) and relying on only two replicates for many experiments are not appropriate.

      Additional replicates for the experiments shown in figures 1F, 1I, and 1J have been done and the data will be revised along with updated statistical tests during the revision of the manuscript.

      (3) In several experiments, the difference between technical replicates is too high.

      Reviewer #1 (Recommendations For The Authors):

      Most of my concerns were addressed in the revised manuscript.

      We thank the reviewer for their time in reviewing the manuscript and for considering the authors' responses to their comments during the previous round of review.

      Reviewer #2 (Recommendations For The Authors):

      Validating the findings in a primary myoblast is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Validate the finding in a different senescent model to exclude the possibility of DNA damage-response.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For Fig 2A, add another control with a conditioned medium from normal, proliferative C2C12 cells.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Please have a statistician to evaluate the replicates and statistical analyses used.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For the barplots (figure 1C, Fig 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, 4E), please include each data points, or at least provide how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides useful information about the lipid metabolite 15d-PGJ2 as a potential regulator of myoblast senescence. The authors provide experimental evidence that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas. However, the manuscript is incomplete in its current form, as it lacks robust support from the data regarding the main conclusions related to senescence and technical concerns related to the senescence models used in this study.

      We are grateful to the editors and the reviewers for their time and comments in sharpening the science and the writing of the manuscript. We have attached a detailed response to emphasize that the manuscript does include robust evidence supporting the claims, which may have been missed during the review process. We have now provided better context for these points.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show that these genes are upregulated in C2C12 myoblasts treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modifies Cys184 of HRas, which is required for its activation, as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through its electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutants). They also showed that expressing WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and that this defect depends on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and additional experiments will help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers of inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs. senescent cells). Staining tissue sections for β-galactosidase (β-Gal) and other markers of senescence will help to delineate these.

      As pointed out, Doxo induces systemic inflammation in addition to DNA damage-mediated senescence. Accordingly, alongside the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence, namely the cell-cycle- and senescence-associated proteins p16 and p21 (Fig. 1C). We also observed increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that the viability of cells was decreased after treatment with 15d-PGJ2 (10 µM) but not with 15d-PGJ2 (1 µM, 2 µM, 4 µM, or 5 µM) (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The effect of Doxorubicin treatment is irreversible: the senescent phenotype persisted even 20 days after Doxorubicin withdrawal.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death. 

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observed a ~30% increase in Erk phosphorylation after treatment with 15d-PGJ2 in 0.2% serum medium compared with vehicle (DMSO). This increase was reproducible and statistically significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi than WT. However, this could very well be due to the defect in palmitoylation rather than to modification by 15d-PGJ2.

      Our data do not suggest higher levels of the C184S mutant in the Golgi compared with WT (Fig. S4A). The ratio of HRas levels in the Golgi to those in the plasma membrane was similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph, columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. Inhibiting C181 palmitoylation, either by mutation (C181S) or by treatment with the palmitoyl transferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could therefore inhibit palmitoylation of HRas at C181. However, our data do not support this hypothesis: modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, as would be expected if palmitoylation were blocked, as it is by the C181S mutation.

      To test whether the inhibition of myoblast differentiation depends on HRas, the authors overexpressed HRas and its mutants in C2C12 cells. However, this experiment does not take endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment would be to knock down or knock out HRas (or introduce C184 knock-in mutations) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs, which is why we observe only a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, because the constructs are expressed from the strong CMV promoter and lack other regulatory elements, they exert a dominant effect over endogenous HRas, revealing a cysteine-mutant-dependent inhibition of myoblast differentiation after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing each HRas construct and treated with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the same construct and treated with vehicle (DMSO). Vehicle-treated values are not shown because they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the aim was to study the effect of HRas cysteine mutations on myoblast differentiation after 15d-PGJ2 treatment, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to general toxicity of 15d-PGJ2. We show that this inhibition depends on modification of HRas at C184: the C184S mutant, which is not modified by 15d-PGJ2 (Fig. 3A) and is therefore not activated by it (Fig. 3B), rescues the inhibition of C2C12 differentiation (Fig. 4D and E). This dissociates the inhibition of myoblast differentiation by 15d-PGJ2 from its general toxic effects on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024); on the other hand, it exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and by inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      The novelty of the study is compromised as the activation of PGD and 15d-PGJ2, as well as the regulation of HRas and cell proliferation, have been previously reported. 

      The literature does not support this statement, and it is important to clarify this misconception for the field as a whole.

      Let us clarify:

      Covalent modification of HRas by 15d-PGJ2 has been reported only twice in the literature (Luis Oliva et al., 2003; Yamamoto et al., 2011), in fibroblasts and neurons, respectively.

      Interaction between HRas and 15d-PGJ2 in skeletal muscle has not been shown before, even though both HRas and 15d-PGJ2 have been shown to be key regulators of muscle homeostasis.

      Activation of HRas by 15d-PGJ2 was first reported by Luis Oliva et al. (Luis Oliva et al., 2003). However, that study did not address the functional implications of activating HRas signaling.

      Recently, our lab contributed to a study showing the functional implication of HRas signaling activation, due to covalent modification by 15d-PGJ2, in the maintenance of the senescence phenotype (Wiley et al., 2021).

      Hunter et al. showed that 15d-PGJ2 inhibits myoblast differentiation (Hunter et al., 2001). Although that study hypothesized that this inhibition occurs via 15d-PGJ2-mediated activation of PPARγ signaling, it also showed inhibition of myoblast differentiation independent of PPARγ activity, suggesting the presence of other mechanisms.

      This is the first study to show a molecular mechanism in which activation of HRas signaling in skeletal myoblasts, due to covalent modification by 15d-PGJ2 at C184 of HRas, inhibits the differentiation of skeletal myoblasts.

      Additionally, there are major technical concerns related to the senescence models, limiting data interpretation regarding the relevance to senescent cells.

      Major concerns:

      (1) The C2C12 cell line is not an ideal model for senescence study due to its immortalized nature and lack of normal p16 expression. A more suitable myoblasts model is recommended, with a more comprehensive characterization of senescence features.

      C2C12 cells are a good model for the DNA damage-induced senescence used in this manuscript. Several reports in the literature have shown the induction of senescence in C2C12 cells: Moiseeva et al. (2023) showed induction of senescence in C2C12 cells after etoposide-mediated DNA damage, and Moustogiannis et al. (2021) showed induction of replicative senescence in C2C12 cells. In this study, we show that C2C12 cells undergo DNA damage-mediated senescence after treatment with Doxo. We measured the induction of senescence in C2C12 cells upon DNA damage using several physiological markers (nuclear size, cell size, and SA-β-gal) and molecular markers (mRNA levels of p21 and the SASP factors IL6 and TGFβ; protein levels of p21) (see Fig. 1 of the updated manuscript). The results and the figures in the manuscript have been updated accordingly.

      (2) The source of increased PGD or its metabolites in the conditioned medium is unclear. Including other senescence models, such as replicative or oncogene-induced senescence, would strengthen the study.

      Fig. 1E shows a time-dependent increase in the expression of PGD2 biosynthetic enzymes in senescent C2C12 cells, and Fig. 1F shows an increase in the levels of 15d-PGJ2 secreted by senescent C2C12 cells into the conditioned medium. These data show that senescent C2C12 cells are the source of PGD and its metabolites in the conditioned medium.

      Again, C2C12 is not suitable for replicative senescence due to its immortalized status.

      We and others have shown that C2C12 cells undergo senescence, and this manuscript used only DNA damage-induced senescence.

      (3) In the in vivo part, it is unclear whether the increased expression of PTGS1, PTGS2, and PTGDS is due to senescence or other side effects of DOXO.

      We agree that this is a limitation of this study; future work will address the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (4) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of a conditioned medium.

      Figure 2A tests the effect of prostaglandin PGD2 and its metabolites secreted by the senescent cells on the differentiation of myoblasts. Therefore, we inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment, whereas differentiation of C2C12 cells without any treatment serves as a positive control.

      There is no explanation of how differentiation was quantified or how the fusion index was calculated.

      The fusion index was calculated using a published myotube analyzer software (Noë et al., 2022). Appropriate information has been added to the materials and methods section of the manuscript.
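      For readers unfamiliar with the metric, the fusion index is conventionally the percentage of all nuclei that lie within multinucleated myotubes. The minimal Python sketch below illustrates only that conventional definition; it is not the implementation used by the myotube analyzer software (Noë et al., 2022), and the function name, the two-nucleus threshold, and the counts are assumptions for illustration.

```python
def fusion_index(nuclei_per_myotube, total_nuclei, min_nuclei=2):
    """Percentage of all nuclei that lie in myotubes with >= min_nuclei nuclei.

    nuclei_per_myotube: nucleus count for each segmented myosin-positive object.
    total_nuclei: all nuclei (e.g. DAPI-stained) in the same field of view.
    min_nuclei: threshold for calling an object a myotube (an assumption here).
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    fused = sum(n for n in nuclei_per_myotube if n >= min_nuclei)
    return 100.0 * fused / total_nuclei


if __name__ == "__main__":
    # Hypothetical field: objects with 5, 3, 1 and 8 nuclei out of 40 nuclei total;
    # the single-nucleus object is not counted as fused.
    print(fusion_index([5, 3, 1, 8], total_nuclei=40))  # 40.0
```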

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 3: Expand SA in "SA β-gal".

      The manuscript has been updated accordingly (See line 3).

      Line 68: HRas is highly regulated by lipid modifications.

      The manuscript has been updated accordingly (See line 67).

      Figures

      Figure S1A seemed incomplete (maybe some processing issue).

      The Figure has been updated in the revised manuscript (See Fig. S1A).

      Figure S1B-H are mislabeled.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      Figures S1E-H are not mentioned in the manuscript.

      The manuscript has been updated accordingly (See line 120).

      Many supplementary figures are not cited in the article.

      The manuscript has been updated accordingly (see lines 85, 120, 123, 166, 225, 356, 364, 412, and 413).

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify the injection method for Doxorubicin in B6J mice on line 83 (IP or IM).

      Mice were injected intraperitoneally with Doxorubicin (as mentioned in the materials and methods; see lines 83 and 794).

      (2) Address missing information in figures or figure legends.

      There is a missing piece in Sup Fig 1A.

      The figure has been updated in the revised manuscript (See Fig. S1A).

      Correct labels in Sup Fig 1C and 1D.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      How would the authors explain the dramatic differences in the morphology of C2C12 cells treated with DOXO between the bright-field and SA-beta-gal staining images in Sup Fig 1B and 1C?

      The SA β-gal image after treatment with Doxo does show a flattened cell morphology. To show the difference in cell morphology more clearly, we have added another field of view from the same experiment to the revised manuscript (see Fig. 1H).

      Provide explanations for Sup Fig 1E-1G, including the meaning of the y-axis and the blue dots and red lines.

      We have provided an explanation of the multiple reaction monitoring mass spectrometry used to measure the concentration of 15d-PGJ2 in the conditioned medium in the revised manuscript (see lines 119-130 and the legends of Fig. S1C, D, and E).

      (3) Please review the calculation of the qPCR data in Figure 1C for correctness, ensuring that the reference samples have an average expression level of 1.

      The data in Fig. 1C were plotted using 2^(−ΔCT) instead of 2^(−ΔΔCT) to show the variability in the expression of mRNAs isolated from animals treated with saline.
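      To make the distinction concrete, here is a minimal sketch with made-up Ct values; the function names, values, and the choice of the first two samples as the saline group are hypothetical. 2^(−ΔCT) normalizes only to the reference gene, so the spread among control animals is preserved, whereas 2^(−ΔΔCT) additionally subtracts the mean control ΔCT, so the control group's geometric mean becomes exactly 1.

```python
import numpy as np


def delta_ct(ct_target, ct_ref):
    """ΔCt = Ct(target gene) - Ct(reference gene), per sample."""
    return np.asarray(ct_target, dtype=float) - np.asarray(ct_ref, dtype=float)


def rel_expr_delta_ct(ct_target, ct_ref):
    """2^(-ΔCt): expression relative to the reference gene only."""
    return 2.0 ** -delta_ct(ct_target, ct_ref)


def rel_expr_delta_delta_ct(ct_target, ct_ref, control_mask):
    """2^(-ΔΔCt): also normalized to the mean ΔCt of the control samples,
    so the geometric mean of the control group is exactly 1."""
    d = delta_ct(ct_target, ct_ref)
    ddct = d - d[np.asarray(control_mask, dtype=bool)].mean()
    return 2.0 ** -ddct


if __name__ == "__main__":
    # Hypothetical Ct values: first two samples = saline, last two = treated.
    ct_target = [25.0, 26.0, 22.0, 21.0]
    ct_ref = [18.0, 18.0, 18.0, 18.0]
    saline = [True, True, False, False]
    print(rel_expr_delta_ct(ct_target, ct_ref))
    print(rel_expr_delta_delta_ct(ct_target, ct_ref, saline))
```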

      (4) Please explain the calculation of 15d-PGJ2/cell concentration in Figure 1F and provide raw data for review, considering the substantial changes and small error bars. The method or result section lacks an explanation of how this calculation was performed. Additionally, there is no mention of the cell number count.

      All the raw values (the concentration of 15d-PGJ2 measured by mass spectrometry and the cell numbers counted at the time of collection of the conditioned medium) are provided in Supplementary Table 1. The standard curve used to calculate the concentration of 15d-PGJ2 in the conditioned medium is shown in Fig. S1F. Cells were counted with a hemocytometer after trypsinization on the day of collection of the conditioned medium.
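      A minimal sketch of this kind of per-cell normalization, assuming a linear standard curve and assuming that the total amount in the collected medium is divided by the number of cells counted at collection. The function names, units, medium volume, and numbers are hypothetical and are not values from Supplementary Table 1.

```python
import numpy as np


def fit_standard_curve(std_conc, std_response):
    """Least-squares line response = slope * conc + intercept from calibration standards."""
    slope, intercept = np.polyfit(std_conc, std_response, 1)
    return slope, intercept


def amount_per_cell(sample_response, slope, intercept, medium_volume_ml, cell_count):
    """Invert the standard curve, convert concentration to total amount in the
    collected medium, and divide by the number of cells counted at collection."""
    conc = (np.asarray(sample_response, dtype=float) - intercept) / slope  # e.g. ng/mL
    total = conc * medium_volume_ml                                        # e.g. ng
    return total / np.asarray(cell_count, dtype=float)                     # e.g. ng/cell


if __name__ == "__main__":
    # Hypothetical calibration standards (concentration vs. instrument response).
    slope, intercept = fit_standard_curve([0, 1, 2, 4, 8], [0.5, 2.5, 4.5, 8.5, 16.5])
    print(amount_per_cell(4.5, slope, intercept, medium_volume_ml=10.0, cell_count=1e6))
```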

      (5) Please clarify how cell number normalization and doubling time calculation were done in Fig 2B. Consider replacing the figure with a growth curve showing confluence on the y-axis for easier interpretation.

      Cells were counted every 24 hours, and the counts were normalized to the number of cells counted on day 0 of the treatment (to account for attachment efficiency and other cell culture parameters). Doubling time was calculated as the reciprocal of the slope of log2(normalized cell number) plotted against time.
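      The doubling-time calculation described above can be sketched as follows; the counts are hypothetical and times are in hours.

```python
import numpy as np


def doubling_time(times, counts):
    """Doubling time as the reciprocal of the slope of log2(N/N0) versus time."""
    counts = np.asarray(counts, dtype=float)
    normalized = counts / counts[0]                        # normalize to the day-0 count
    slope, _ = np.polyfit(times, np.log2(normalized), 1)  # doublings per unit time
    return 1.0 / slope


if __name__ == "__main__":
    hours = np.array([0.0, 24.0, 48.0, 72.0])
    counts = 5e4 * 2 ** (hours / 20.0)   # hypothetical cells doubling every 20 h
    print(doubling_time(hours, counts))  # approximately 20.0
```

      Because log2(N/N0) = log2 N − log2 N0, normalizing to day 0 only shifts the line vertically; the slope, and hence the doubling time, is unchanged.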

    1. Author response:

      Please find below our provisional author response, outlining the revisions we plan to undertake to address the Recommendations received:

      Reviewer #1 (Recommendations For The Authors):

      (1) A set of recent advances has shown that embeddings of unsupervised/self-supervised speech models align with auditory responses to speech in the temporal cortex (e.g. Wav2Vec2: Millet et al NeurIPS 2022; HuBERT: Li et al. Nat Neurosci 2023; Whisper: Goldstein et al. bioRxiv 2023). These models are known to preserve a variety of speech information (phonetics, linguistic information, emotions, speaker identity, etc.) and perform well in a variety of downstream tasks. These other models should be evaluated or at least discussed in the study.

      We plan to evaluate two of these other models, Wav2Vec2 and HuBERT, in the brain encoding and RSA parts.

      (2) The test statistics of the results in Fig 1c-e need to be revised. Given that logistic regression is a convex optimization problem typically converging to a global optimum, these multiple initializations of the classifier were likely not entirely independent. Consequently, the reported degrees of freedom and the effect size estimates might not accurately reflect the true variability and independence of the classifier outcomes. A more careful evaluation of these aspects is necessary to ensure the statistical robustness of the results.

      We plan to address this point to ensure the statistical robustness of our results.

      (3) In Line 198, the authors discuss the number of dimensions used in their models. To provide a comprehensive comparison, it would be informative to include direct decoding results from the original spectrograms alongside those from the VLS and LIN models. Given the vast diversity in vocal speech characteristics, it is plausible that the speaker identities might correlate with specific speech-related features also represented in both the auditory cortex and the VLS. Therefore, a clearer understanding of the original distribution of voice identities in the untransformed auditory space would be beneficial. This addition would help ascertain the extent to which transformations applied by the VLS or LIN models might be capturing or obscuring relevant auditory information.

      We plan to include direct decoding results from the original spectrograms in addition to those from the VLS and LIN models.

      Reviewer #2 (Recommendations For The Authors):

      We plan to address the following points raised by Reviewer #2:

      (1) English mistakes, rewordings:

      a. L31: 'in voice' > consider rewording (from a voice?).

      b. L33: consider splitting sentence (after interactions).

      c. L39: 'brain' after parentheses.

      d. L45-: certainly DNNs 'as a powerful tool' extend to audio (not just image and video) beyond their use in brain models.

      e. L52: listened to / heard.

      f. L63: use second/s consistently.

      g. L64: the reference to Figure 5D is maybe a bit confusing here in the introduction.

      h. L79-88: this section is formulated in a way that is too detailed for the introduction text (confusing to read). Consider a more general introduction to the VLS concept here and the details of this study later.

      i. L99-: again, I think the experimental details are best saved for later. It's good to provide a feel for the analysis pipeline here, but some of the details provided (number of averages, denoising, preprocessing), are anyway too unspecific to allow the reader to fully follow the analysis.

      We will correct the mistakes, apply the suggested rewordings, and clarify the points raised.

      (2) Clarification.

      • L159: what was the motivation for classifying age as a 2-class classification problem? Rather than more classes or continuous prediction? How did you choose the age split?

      • L263: Is the test of RDM correlation>0 corrected for multiple comparisons across ROIs, subjects, and models?

      • L379: 'these stimuli' - weren't the experimental stimuli different from those used to train the V/AE?

      • L443: what are the 'technical issues' that prevented subject 3 from participating in 48 runs?

      • L444: participants were instructed to 'stay in the scanner'!? Do you mean 'stay still', or something?

      • L463: Hearing thresholds of 15 dB: do you mean that all had thresholds lower than 15 dB at all frequencies and at all repeated audiogram measurements?

      • L472: were the 4 category levels balanced across the dataset (in number of occurrences of each category combination)?

      • L482: the test stimuli were selected as having high energy by the amplitude envelope. It is unclear what this means (how is the envelope extracted, what feature of it is used to measure 'high energy'?)

      • L500 was the audio filtered to account for the transfer function of the Sensimetrics headphones?

      • L500: what does 'comfortable level' correspond to and was it set per session (i.e. did it vary across sessions)?

      • L526- does the normalization imply that the reconstructed spectrograms are normalized? Were the reconstructions then scaled to undo the normalization before inversion?

      • L606: does the identity GLM model the denoised betas from the first GLM or simply the BOLD data? The text indicates the latter, but I suspect the former.

      • L704: could you unpack this a bit more? It is not easy to see why you specify the summing in the objective. Shouldn't this just be the ridge objective for a given voxel/ROI? Then you could just state it in matrix notation.

      • L716: you used robust scaling for the classifications in latent space but haven't mentioned scaling here. Are we to assume that the same applies?

      • L720: Pearson correlation as a performance metric and its variance will depend on the choice of test/train split sizes. Can you show that the results generalize beyond your specific choices? Maybe report explained variance as well to get a better idea of performance.

      • Could you specify (somewhere) the stimulus timing in a run? ISI and stimulus duration are mentioned in different places, but it would be nice to have a summary of the temporal structure of runs.

      We will clarify the points raised.

      Reviewer #3 (Recommendations For The Authors):

      We plan to address the following points raised by Reviewer #3:

      Comments:

      • Code and data are not currently available.

      • In the supplementary material, it would be beneficial to present the different analyses as boxplots, as in the main text, but with the ROIs in the left and right hemispheres separated, to better show potential hemispheric effect. Although this information is available in the Supplementary Tables, it is currently quite tedious to access it.

      • In Figure 3a, it might be beneficial to order the identities by age for each gender in order to more clearly illustrate the structure of the RDMs,

      • In Figure 3b, the variance for the correlations for the aTVA is higher than in other regions, why?

      • Please make sure that all acronyms are defined, and that they are redefined in the figure legends.

      • Gender and age are primarily encoded by different brain regions (Figure 5, pTVA vs aTVA). How does this finding compare with existing literature?

      We will upload the code and the preprocessed data, improve the supplementary material figures, fix Figure 3 according to the Reviewer’s suggestion, and clarify the points raised.

    1. Author response:

      We thank the reviewers for their comments and will revise the manuscript to provide more comprehensive clarifications to aid readers’ understanding of behaviorMate. Additionally, we intend to take several steps that could provide further insights and improve the ease of use for new behaviorMate users: (1) to release an expanded and annotated library of existing settings and VR scene files, (2) to improve the online documentation of context lists and decorators, which allow behaviorMate to run custom experimental paradigms without writing code, and (3) to release online API details of the JSON messaging protocol used between behaviorMate, the Arduinos, and the VRMate program, which could be especially helpful to developers interested in expanding or modifying the system. Here we provide a few brief points of clarification on some of the concerns raised by the reviewers.

      Firstly, we clarify the system’s focus on modularity and flexibility. behaviorMate leverages the “Intranet of Things” framework to provide a low-cost platform that relies on asynchronous message passing between independent networked devices. While our current VR implementation typically involves a PC, 2 Arduinos, and an Android device per VR display, the behaviorMate GUI can be configured, without editing any source code, to listen on additional ports for UDP messages, which will be automatically timestamped and logged. Since the current implementation of the behaviorMate GUI can be configured through the settings file to send and receive JSON-formatted messages on arbitrary ports, third-party devices could also be configured to listen and respond to these messages without editing the UI source code. More specialized responsibilities or tasks that require higher temporal precision (such as position tracking) are handled by dedicated circuits so as not to overload the general-purpose one. This provides a level of encapsulation and separation of concerns, since each component can be optimized to perform a single task, a feature that is especially desirable given the resource limitations of the most common commercially available microcontrollers.
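      As a hedged illustration of the message-passing pattern described above (not behaviorMate's actual protocol, whose API details are still to be released), the Python sketch below sends JSON-encoded UDP datagrams to a listening socket that timestamps and logs each message on arrival. The helper names, the OS-assigned loopback port, and the message keys are all hypothetical.

```python
import json
import socket
import time


def send_json(message, host, port):
    """Serialize a dict as JSON and send it as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(json.dumps(message).encode("utf-8"), (host, port))


def receive_and_log(sock, n_messages, timeout=1.0):
    """Receive n JSON datagrams and timestamp each one on arrival."""
    sock.settimeout(timeout)
    log = []
    for _ in range(n_messages):
        data, _addr = sock.recvfrom(65535)
        log.append({"t": time.monotonic(), "msg": json.loads(data.decode("utf-8"))})
    return log


def demo():
    # Bind a "logger" socket to an OS-assigned port on the loopback interface.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))
        port = listener.getsockname()[1]
        # Hypothetical messages from a third-party device; keys are invented.
        send_json({"device": "lick_sensor", "event": "lick"}, "127.0.0.1", port)
        send_json({"device": "valve", "action": "open", "duration_ms": 50}, "127.0.0.1", port)
        return receive_and_log(listener, 2)


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

      Because the logger only needs to parse JSON and attach a timestamp, any device that can emit a UDP datagram can be added without changes to the logger itself, which is the separation of concerns the paragraph above describes.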

      A number of methods exist for synchronizing recording devices such as microscopes or electrophysiology recordings with behaviorMate’s time-stamped logs of actuators and sensors. For example, the GPIO circuit can be configured to send sync triggers or to receive timing signals as input. Alternatively, a dedicated circuit could record frame-start signals and relay them to the PC to be logged independently of the GPIO (enabling high-resolution post-hoc alignment of the timestamps). The optimal method varies with the needs of the experiment. For example, if very high temporal precision is needed, as in electrophysiology experiments, a high-speed data acquisition (DAQ) circuit capturing a fixed-interval readout might be beneficial. behaviorMate could still be set up as normal to provide closed- and open-loop task control at behaviorally relevant timescales alongside a DAQ circuit recording events at a consistent temporal resolution. While this would increase the relative cost of the recording setup, identical rigs for training animals could still be configured without the DAQ circuit, avoiding the additional cost and complexity.
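      One common way to perform the post-hoc alignment mentioned above, assuming the same sync pulses are timestamped on both the device clock and the PC log, is to fit a linear map from device time to PC time; the slope absorbs constant clock drift and the intercept absorbs a fixed offset. The sketch below uses synthetic data and hypothetical names and is not a behaviorMate feature.

```python
import numpy as np


def fit_clock_map(device_times, pc_times):
    """Fit pc_time = slope * device_time + offset for matched sync events.
    slope != 1 captures constant clock drift; offset captures the fixed lag."""
    slope, offset = np.polyfit(np.asarray(device_times, float), np.asarray(pc_times, float), 1)
    return slope, offset


def to_pc_time(device_times, slope, offset):
    """Map device timestamps (e.g. microscope frame starts) onto the PC log clock."""
    return slope * np.asarray(device_times, dtype=float) + offset


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_times = np.arange(1000) / 30.0  # hypothetical 30 Hz frame clock (s)
    # Synthetic PC arrival times: 12.5 s offset, 100 ppm drift, ~1 ms jitter.
    logged = 1.0001 * frame_times + 12.5 + rng.normal(0.0, 1e-3, frame_times.size)
    slope, offset = fit_clock_map(frame_times, logged)
    print(slope, offset)
```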

      VRMate provides the interface between Unity and behaviorMate; therefore, using the two systems together means that no Unity or C# programming is necessary. VRMate provides a prespecified set of visual cues that can be scaled in 3 dimensions and have textures applied to them, permitting a wide variety of scenes to be displayed. All VRMate scene details are additionally logged by behaviorMate to allow consistency checks across experiments. The VRMate project also includes “editor scripts” that provide a drag-and-drop utility in the Unity Editor for developing new scenes. Since the details pertaining to specific scenes and view angles are loaded at runtime via JSON-formatted UDP messages, it is not necessary to recompile VRMate to use this feature. Since we send individual position updates to VRMate from the PC, any issues with clock drift would be limited to the refresh rate of the Unity program, which is fast enough to be perceived as instantaneous, and we have thoroughly tested the timing differences between displays using high-speed cameras and found them to be negligible. While we find using 5 separate Android computers to render scenes as described to be an optimal solution that maximizes flexibility, it would also be possible to render all scenes on a single PC to further mitigate this concern, depending on experimental demands. Finally, our treadmill implementations of behaviorMate use no monitor displays; however, due to the modular design of behaviorMate, virtual cues could be seamlessly added to any such setup by adding a VR context to the settings files.

      One last point to mention is that our project is not affected by the recent changes in the pricing structure of Unity, since the compiled software does not need to be regenerated to update VR scenes or to implement new task logic; this is handled by the behaviorMate GUI. This means the current state of the VRMate program is robust to any future pricing changes or other restructuring of Unity and does not rely on its continued support. Additionally, although the solution presented in VRMate has many benefits, a developer could easily adapt any open-source VR maze project to receive the UDP-based position updates from behaviorMate or develop their own novel VR solutions. We intend to update the VR section of the manuscript to make all of this information clearer, as well as to provide additional online documentation in the materials linked in the supplemental information.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The present paper introduces Oscillation Component Analysis (OCA), in analogy to ICA, where source separation is underpinned by a biophysically inspired generative model. It puts the emphasis on oscillations, which are a prominent characteristic of neurophysiological data.

      Strengths:

      Overall, I find the idea of disambiguating data-driven decompositions by adding biophysical constraints useful, interesting and worth pursuing. The model incorporates both a component modelling oscillatory responses that is agnostic about the frequency content (e.g., it doesn’t need bandpass filtering or predefinition of bands) and a component to map between sensor and latent space. I feel these elements can be useful in practice.

      Thank you for the positive evaluation!

      Weaknesses:

      Lack of empirical support: I am missing empirical justification of the advantages that are theoretically claimed in the paper. I feel the method needs to be compared to existing alternatives.

      Thank you for bringing up this important issue. We agree that a direct performance comparison would be important to demonstrate. We performed additional analyses to compare OCA with ICA and a simple frequency-domain exploratory technique in both simulated and real human data (see Section How does OCA compare to conventional approaches? and Supporting Text: Comparison of OCA to traditional approaches in experimental EEG data). The results of the simulated data are shown in the revised Figure 3. Although the slow and alpha oscillations in this simulation are statistically independent under the generative model, ICA identifies components that mix these independent signals, as one would expect based on the above discussion (i.e., all components are Gaussian). Meanwhile, OCA is able to recover distinct slow and alpha components. We repeated this analysis in real human EEG during propofol-induced unconsciousness and found a similar result: ICA produced components that mixed slow and alpha band signals, whereas OCA identified distinct oscillatory components (see Figure S4.1).

      Reviewer #1 (Recommendations For The Authors):

      Major

      Theoretical justification. About the limitation of ICA in M/EEG, lines 24-28 seem to suggest that, almost by necessity (if Gaussianity approximately holds as argued), ICA doesn’t work on these modalities. But a body of work indicates that it does work to a reasonable extent and that it is useful in practice; see https://www.pnas.org/doi/pdf/10.1073/pnas.1112685108?download=true. How, then, can this theoretical claim be reconciled with the empirical evidence suggesting otherwise? I am putting this as a major comment because the limitations of ICA are one of the main motivations for this work, so it needs to be well-justified.

      Thanks for bringing forward this important point and for suggesting the reference Brookes et al. Their work actually supports our claim. In the fifth paragraph of the discussion section, Brookes et al. state: “ICA has been used previously and extensively for artifact rejection in MEG; however, its use in identification of oscillatory signals has remained limited. This limitation is likely due to its susceptibility to interference and the fact that amplitude-modulated oscillatory signals exhibit a largely Gaussian statistical distribution (and ICA relies on non-Gaussianity in recovered sources).” For this reason, they use the Hilbert envelope as the input to the ICA procedure rather than the original time series. These Hilbert envelopes represent the instantaneous amplitude of neural oscillatory activity, i.e., they follow the amplitude modulation of the oscillatory activity. The method does not extract any oscillatory activity or disambiguate different oscillatory sources; it only assesses the connectivity pattern within predefined bands, i.e., how different areas of the brain are harmonized through modulation of the oscillations, or vice versa, inside those predefined bands. The paper did not show the extracted independent time signals (tICs), focusing instead on the spatial patterns that these tICs activated. In that way, their use of ICA was fully justified. Overall, our assessment of the limitations of ICA is very well aligned with Brookes et al. We have added a discussion of this apparent counterpoint to our claim in the introduction (see page 3 line 23) and revised the discussion section to refer to this paper (see page 21 lines 426-432).
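      To make concrete what Brookes et al. feed into ICA, the sketch below computes a Hilbert envelope with NumPy using the standard FFT construction of the analytic signal (the same construction as scipy.signal.hilbert). The test signal, a 10 Hz oscillation whose amplitude is modulated at 1 Hz, is an illustrative assumption; the envelope recovers the slow amplitude modulation rather than the oscillation itself.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via FFT: keep DC (and Nyquist), double positive
    frequencies, zero negative frequencies, then inverse FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def hilbert_envelope(x):
    """Instantaneous amplitude |x + i H[x]|."""
    return np.abs(analytic_signal(x))


if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(2000) / fs
    # 10 Hz carrier whose amplitude is modulated at 1 Hz (illustrative).
    x = (1 + 0.5 * np.cos(2 * np.pi * 1.0 * t)) * np.cos(2 * np.pi * 10.0 * t)
    env = hilbert_envelope(x)
    print(np.max(np.abs(env - (1 + 0.5 * np.cos(2 * np.pi * t)))))
```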

      Empirical justification. The synthetic example is good, but I’m not quite sure what to make of the real data examples. One can see reasonable spectra in the different bands and not-so-easy-to-interpret spatial topologies. But the main question is how OCA compares to more standard, easier approaches. Could the authors show explicitly how the benefits that were spelled out in the introduction/discussion manifest in practice, when compared to other methods?

      Thank you for bringing up this important issue. We agree that a direct performance comparison would be important to demonstrate. We performed additional analyses to compare OCA with ICA and a simple frequency-domain exploratory technique in both simulated and real human data (see Section How does OCA compare to conventional approaches? and Supporting Text: Comparison of OCA to traditional approaches in experimental EEG data). The results of the simulated data are shown in the revised Figure 3 on page 12. Although the slow and alpha oscillations in this simulation are statistically independent under the generative model, ICA identifies components that mix these independent signals, as one would expect based on the above discussion (i.e., all components are Gaussian). Meanwhile, OCA is able to recover distinct slow and alpha components. We repeated this analysis in real human EEG during propofol-induced unconsciousness and found a similar result: ICA produced components that mixed slow and alpha band signals, whereas OCA identified distinct oscillatory components (see Figure S4.1 in Supporting Text: Comparison of OCA to traditional approaches in experimental EEG data).

      Minor

      "a recently-described class of state-space models" -> of the three references, one is from the sixties, another from the eighties, and the last one is 21 years old. Is this really a recent idea?

      Maybe rephrase "recently-described", or else think of more recent references that bring something new?

      We have amended the wording as suggested. (See page 4, line 53)

      Lines 72-74. It might be useful to unwrap in *intuitive* terms why the elements of this vector are closely related to the real and imaginary parts of the analytic signal.

      Thanks for the helpful comment. The sentence now reads:

      “The elements of this state vector trace out two time series that maintain an approximate π/2 radian phase difference and are therefore closely related to the real and imaginary parts of an analytic signal…” (see page 5, lines 72-75).
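      A minimal sketch of this intuition, assuming the common oscillator form x_t = a R(ω) x_{t−1} + noise, where R(ω) is a 2 × 2 rotation matrix with per-sample angle ω = 2πf/f_s and a < 1 is a damping factor; the parameter values below are illustrative, not fitted values. Without damping or noise and starting from (1, 0), the two state elements are exactly cos(ωt) and sin(ωt), i.e., the real and imaginary parts of e^{iωt}, with the second lagging the first by π/2. Damping and noise make this relation approximate.

```python
import numpy as np


def rotation(omega):
    """2x2 rotation by omega radians (counter-clockwise)."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, -s], [s, c]])


def simulate_oscillator(f, fs, n, a=0.99, sigma=1.0, x0=(1.0, 0.0), seed=0):
    """Simulate x_t = a * R(2*pi*f/fs) @ x_{t-1} + sigma * w_t, w_t ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    transition = a * rotation(2 * np.pi * f / fs)
    x = np.empty((n, 2))
    x[0] = x0
    for t in range(1, n):
        x[t] = transition @ x[t - 1] + sigma * rng.standard_normal(2)
    return x


if __name__ == "__main__":
    # Noise-free, undamped case: the state traces (cos wt, sin wt).
    clean = simulate_oscillator(f=10, fs=1000, n=200, a=1.0, sigma=0.0)
    # Damped, noisy case: the quadrature relation holds only approximately.
    noisy = simulate_oscillator(f=10, fs=1000, n=5000)
    print(clean[:3])
    print(np.corrcoef(noisy[:, 0], noisy[:, 1])[0, 1])
```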

      Also, relatedly, I don’t seem to have access to the SI, which is supposed to explain this. It doesn’t show up in the bioRxiv preprint either.

      We are sorry to hear that. bioRxiv merges all the supporting information and posts it under the Supplementary Material.

      In Eq. (1), should it be R(f) instead of R(2πf/f_s)?

      Thank you for catching this typo.

      As I understand from lines 182-195, the input for the method is not channels but PCA components. Since R is learned, presumably the variance of the lower-order PCs (i.e., the last elements of the diagonal of R) will be estimated to be small. This, in turn, would make the likelihood heavily weighted on these components (because one basically divides their contribution by their variance). Could this bias the estimation towards these lower-order PCs, at the expense of higher-order PCs? In a different context, this is shown here: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008580 Maybe it would be worth commenting on this?

      We agree with the reviewer’s initial observations but disagree with the assessment. Our log-likelihood calculation reweights the components appropriately to counter the weighting that arises from spatial whitening, thus negating the above-mentioned bias. The main purpose of the spatial whitening and PCA is to make the learning numerically stable, i.e., to avoid underflow or overflow in the iterative steps. We also note that the spatial whitening and the PCA are reverted at the end to obtain the spatial components and the estimated noise covariance. So, as long as we use all the components with strictly positive variances, we will not bias the log-likelihood one way or the other.
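      A minimal numerical check of the change-of-variables argument, under a simplifying assumption: a zero-mean Gaussian observation model with covariance Σ rather than the full OCA likelihood. For an invertible whitening matrix W, z = W y has covariance W Σ Wᵀ, and the log-likelihood of n samples shifts by the constant −n log|det W|, which does not depend on the model parameters and therefore cannot favor some components over others. The argument requires W to be invertible, i.e., all components with strictly positive variance are retained. The helper names and simulated data are hypothetical.

```python
import numpy as np


def gaussian_loglik(Y, cov):
    """Summed log-likelihood of rows of Y (n x d) under N(0, cov)."""
    n, d = Y.shape
    _sign, logdet = np.linalg.slogdet(cov)
    # Sum over samples of y^T cov^{-1} y.
    quad = np.einsum("ij,ij->", Y, np.linalg.solve(cov, Y.T).T)
    return -0.5 * (n * (d * np.log(2 * np.pi) + logdet) + quad)


def whitening_matrix(Y):
    """PCA whitening: W = diag(1/sqrt(eigvals)) @ eigvecs.T of the sample covariance."""
    evals, evecs = np.linalg.eigh(np.cov(Y, rowvar=False))
    return np.diag(1.0 / np.sqrt(evals)) @ evecs.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 500, 4
    mixing = rng.standard_normal((d, d))
    Y = rng.standard_normal((n, d)) @ mixing.T
    cov = mixing @ mixing.T                 # model covariance in sensor space
    W = whitening_matrix(Y)
    Z = Y @ W.T                             # whitened data
    ll_y = gaussian_loglik(Y, cov)
    ll_z = gaussian_loglik(Z, W @ cov @ W.T)
    print(ll_z - ll_y, -n * np.linalg.slogdet(W)[1])  # equal up to rounding
```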

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The study answers the important question of whether the conformational dynamics of proteins are slaved by the motion of solvent water or are intrinsic to the polypeptide. The results from neutron scattering experiments, involving isotopic labelling, carried out on a set of four structurally different proteins are convincing, showing that protein motions are not coupled to the solvent. A strength of this work is the study of a set of proteins using spectroscopy covering a range of resolutions, however, it suffers from some scholarly shortcomings and limited discussion of results. The work is of broad interest to researchers in the fields of protein biophysics and biochemistry.

      Reply 1: We thank the editors and reviewers for the positive and encouraging comments.

      Reviewer #1 (Public Review):

      Summary:

      Zheng et al. study the 'glass' transitions that occur in proteins at ca. 200 K using neutron diffraction and differential isotopic labeling (hydrogen/deuterium) of the protein and solvent. To overcome limitations in previous studies, this work is conducted in parallel with 4 proteins (myoglobin, cytochrome P450, lysozyme, and green fluorescent protein), and experiments were performed at a range of instrument time resolutions (1 ns - 10 ps). The authors' data look compelling and suggest that transitions in the protein and solvent behavior are not coupled and that, contrary to some previous reports, the apparent water transition temperature is a 'resolution effect', i.e., the instrument response is limited. This is likely to be important in the field, as solvent 'slaving' and the role of the hydration shell in protein dynamics should be reassessed in light of these findings.

      Strengths:

      The use of multiple proteins and instruments with a range of energy resolutions/timescales.

      Reply 2: We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The paper could be organised to better allow the comparison of the complete dataset collected. The extent of hydration clearly influences the protein transition temperature. The authors suggest that "water can be considered here as lubricant or plasticizer which facilitates the motion of the biomolecule." This may be the case, but the extent of hydration may also alter the protein structure.

      Reply 3: Following the reviewer’s suggestion, we studied the secondary structure content and tertiary structure of CYP protein at different hydration levels (h = 0.2 and 0.4) through molecular dynamics simulation. As shown in Table S2 and Figure S6, the extent of hydration does not alter the protein secondary structure content and overall packing. Thus, this result also suggests that water molecules have more influence on protein dynamics than on protein structure.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "Decoupling of the Onset of Anharmonicity between a Protein and Its Surface Water around 200 K" by Zheng et al. presents a neutron scattering study trying to elucidate if at the dynamical transition temperature water and protein motions are coupled. The origin of the dynamical transition temperature has been highly debated for decades, specifically its relation to hydration.

      Strengths:

      The study is rather well conducted, with a lot of effort to acquire the perdeuterated proteins, and some results are interesting.

      Reply 4: We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The present work could certainly contribute some arguments, but I have the feeling that not all known facts are properly discussed.

      The points the authors should carefully discuss are the following:

      (1) Daniel et al. (10.1016/S0006-3495(98)77694-5) have shown that enzymes can be functional below the dynamical transition temperature which is at odds with some of the claims of the authors.

      Reply 5: Following the reviewer’s suggestion, we added the following paragraph to the Introduction of the revised main text.

      “Although exceptions have been reported (Biophys. J. 1998, 75, 2504.), the dynamical transition has been linked to the thermal onset of function in a number of proteins, e.g., myoglobin (Biochemistry, 1975, 14, 5355-5373), ribonuclease (Nature, 1992, 357, 423-424.), elastase (Biochemistry, 1994, 33, 9285-9293.) and bacteriorhodopsin (PNAS, 1993, 90, 9668-9672.), all of which become inactive below the dynamical transition temperature.”

      (2) It is not as easy to say that protonated proteins in D2O reflect protein dynamics while perdeuterated proteins in H2O reflect water dynamics. A recent study by Nidriche et al. (PRX LIFE 2, 013005 (2024)) reveals that H <-> D exchange is much faster than usually assumed and has important consequences for such studies.

      Reply 6: For the sample preparation, all the H-proteins were dissolved in D2O to allow full deuterium exchange of all exchangeable hydrogen atoms and then lyophilized for 12 hours to obtain the dry sample. The lyophilized H-protein was then put into a desiccator with D2O, placed in a glove box purged with nitrogen gas, to absorb D2O until the desired hydration level, h (gram water/gram protein), was reached. The deuterated proteins were prepared in the opposite way: the D-proteins were dissolved in H2O to allow full hydrogen exchange of all exchangeable deuterium atoms and then lyophilized for 12 hours to obtain the dry sample. The lyophilized D-protein was then put into a desiccator with H2O to absorb H2O until the desired h was reached. This procedure avoids H-D exchange during the experiments. We added these methods to the revised SI.
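      Because h is defined gravimetrically (grams of water per gram of dry protein), the end point of the desiccator step can be expressed as a target sample mass. The sketch below assumes that water uptake is followed by weighing the sample against its dry mass; the function names are hypothetical and are not taken from the authors' protocol.

```python
def target_hydrated_mass(dry_mass_mg, h):
    """Sample mass (mg) at which a dry protein powder reaches h = g water / g protein."""
    if dry_mass_mg <= 0 or h < 0:
        raise ValueError("dry mass must be positive and h non-negative")
    # total mass = protein + absorbed water = m_dry + h * m_dry
    return dry_mass_mg * (1.0 + h)


def hydration_level(current_mass_mg, dry_mass_mg):
    """Hydration level h computed from the current (hydrated) and dry sample masses."""
    if dry_mass_mg <= 0:
        raise ValueError("dry mass must be positive")
    return (current_mass_mg - dry_mass_mg) / dry_mass_mg


# Hypothetical example: 100 mg of lyophilized protein hydrated to h = 0.4
print(target_hydrated_mass(100.0, 0.4))
print(hydration_level(120.0, 100.0))
```

Note that h is a mass ratio, so the same h corresponds to slightly fewer water molecules per protein for D2O than for H2O.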

      (3) A publication by Jasnin et al. (10.1039/b923878f) on heparan sulfate shows a resolution effect.

      Reply 7: Based on the data from Jasnin et al. (10.1039/b923878f), we found that the dynamical transition of heparan sulfate did not exhibit a strong resolution effect. However, the onset of the mean square displacement (MSD) for nanosecond motions is difficult to estimate across all heparan sulfate samples because no nanosecond-motion data are available for dry HS (HS-dry).

      (4) The authors should discuss the impact of the chosen q-range on their findings (see Phys. Chem. Chem. Phys., 2012, 14, 4927-4934, where the authors see a huge effect!).

      Reply 8: Following the reviewer's suggestion, we calculated Ton of H-protein in D2O over the q-ranges 0.45-0.9 Å⁻¹ and 1.1-1.75 Å⁻¹. The results are summarized in Tables S2 and S3, which show that the choice of q-range does not alter the Ton of the proteins. We added these results to the revised SI.

      (5) The authors underline that the dynamical transition is intrinsic to the protein. However, Cupane et al. (ref 12) have shown that it can also be found in a mixture of amino acids without any protein backbone.

      Reply 9: Following the reviewer’s suggestion, we added the following discussion into the revised main text.

      “Unfreezing of the protein structural relaxation might facilitate these conformational jumps, turning on its functionality. However, as revealed by Ref (Journal of Biological Physics, 2010, 36, 291-297.), the denatured form of lysozyme also exhibits a dynamical transition similar to that of its folded native form. Additionally, the dynamical transition can also be found in a mixture of amino acids (Physical Review Letters, 2012, 109, 128102.). Hence, one can argue that the activation of the structural relaxation of the biomolecule above the dynamical transition temperature is a necessary but insufficient condition for the protein to function, as the latter also requires the biomolecule to assume the correctly folded 3-dimensional structure.”

      (6) The authors say that they find similar dependences from MSD. They should explain that the MSD is inversely proportional to the summed intensities squared.

      Reply 10: Following the reviewer’s suggestion, we added the estimation of mean-squared atomic displacement (MSD) in the revised SI.

      “The mean-squared atomic displacement was estimated using the Gaussian approximation, where . The q values used for the Gaussian fit range from 0.45 to 0.9 Å⁻¹ (Biophys. J. 2006, 91, 2573.).”
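      The fitting expression is not spelled out above. One common form of the Gaussian approximation is I(q) ≈ I0·exp(-q²⟨u²⟩/3), so ⟨u²⟩ follows from the slope of ln I versus q² inside the chosen q-window. The sketch below assumes that convention (some papers use a prefactor of 6 or define the displacement differently) and assumes elastic intensities already normalized per temperature; `msd_gaussian` is a hypothetical helper, not the authors' code.

```python
import math


def msd_gaussian(q_values, intensities, q_min=0.45, q_max=0.9):
    """Estimate <u^2> (Å^2) from elastic intensities via the Gaussian approximation.

    Assumes I(q) = I0 * exp(-q^2 <u^2> / 3). Only points with q_min <= q <= q_max
    (Å^-1) are used; I0 drops out because only the slope of ln I vs q^2 matters.
    """
    xs, ys = [], []
    for q, i in zip(q_values, intensities):
        if q_min <= q <= q_max and i > 0:
            xs.append(q * q)        # q^2 in Å^-2
            ys.append(math.log(i))  # ln I
    if len(xs) < 2:
        raise ValueError("need at least two q points inside the fitting range")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx               # least-squares slope = -<u^2> / 3
    return -3.0 * slope


# Hypothetical, noise-free check with <u^2> = 0.5 Å^2; points outside the window are ignored
q_grid = [0.3, 0.45, 0.6, 0.75, 0.9, 1.5]
intensities = [math.exp(-q * q * 0.5 / 3.0) for q in q_grid]
print(round(msd_gaussian(q_grid, intensities), 6))
```

Repeating the fit at each temperature gives the MSD curve from which an onset temperature can be read.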

      (7) A decoupling between water dynamics and membrane dynamics has already been discussed by K. Wood, G. Zaccai et al.

      Reply 11: Following the reviewer’s suggestion, we added the following discussion to the revised main text. “The results from the neutron scattering experiments suggest that the dynamical transition in proteins is an intrinsic property of the biomolecule and strongly depends on the amount of water surrounding it. Such an intrinsic transition can result either from a critical phase transition, e.g., water to ice (PNAS 2007, 104, 18049-18054.; JPCB, 1999, 103, 8036-8050), or from freezing of the structural relaxation of the system beyond the equilibrium time (~100-1000 s) of the experiment, in analogy to the glass transition in polymers from the rubbery state to the glass form (Philosophical Magazine, 2004, 84, 1341-1353.; Science, 1995, 267, 1939-1945.; Colloid and Polymer Science, 1995, 273, 413-420.).”

      (8) The fact that transition temperature in lipid membranes is higher when the membrane is dry is also well known (A.V. Popova, D.K. Hincha, BMC Biophys. 4, 11 (2011)).

      Reply 12: We agree with the reviewer that it is well known that the transition temperature of lipid membranes is higher when the membrane is dry. We now cite this work as a reference.

      (9) The authors should mention the slope (K/min) they used for DSC and discuss the impact of it on the results.

      Reply 13: Following the reviewer’s suggestion, we added a description of the DSC measurements to the revised SI. “DSC measurements were performed using a METTLER DSC3+ instrument. The sample was sealed in an aluminum pan, and an empty pan was used as a reference. All experiments were carried out from 150 to 300 K at a heating rate of 1 K/min, the same heating rate as used in the neutron experiments.”

      (10) In the introduction, the authors should present the different explanations forwarded for the dynamical transition.

      Reply 14: Following the reviewer’s suggestion, we added the different explanations proposed for the dynamical transition to the Introduction of the revised main text.

      “The dynamical transition of proteins represents a significant change in their internal mobility, which has garnered various explanations. One theory suggests it is due to the behavior of water in the hydration shell, which transitions from rigid to fluid at certain temperatures, thus influencing protein flexibility. Another theory considers the transition an inherent property of the protein, where thermal energy allows the protein to access a wider range of conformations.”

      Reviewer #1 (Recommendations For The Authors):

      A major strength of the work is the parallel experiments performed on each of the 4 proteins. To allow better comparison of these it would be helpful to present these combined data in relevant figures to make a side-by-side comparison easier. A summary table of Ton (and potentially TDSC) values would also be helpful.

      Reply 15: Following the reviewer’s suggestion, we summarized the Ton of proteins in Table S5 and Table S6.

      The effect of hydration on protein structure should be considered. Alterations in protein secondary and tertiary structure would be expected to alter dynamics and thus could be seen as a change in Ton.

      Reply 16: The detailed analysis and discussion are presented in Reply 3.

      No uncertainty (error) in Ton values is presented. Could these be estimated from e.g. a comparison of protein Ton values measured under identical sample conditions with different spectrometers?

      Reply 17: It would be hard to compare Ton of proteins measured with different spectrometers because different spectrometers have different energy resolutions. For example, the energy resolutions of HFBS, DNA and OSIRIS are 1 μeV, 13 μeV, 25.4 μeV and 100 μeV, respectively.
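      To make the resolution argument concrete, a common rule of thumb converts an instrument's energy resolution ΔE into the longest timescale it can observe, τ ≈ ħ/ΔE. The sketch below applies it to the resolution values quoted above; the exact prefactor depends on whether ΔE is a FWHM or HWHM and on the convention adopted, so the numbers are order-of-magnitude estimates only.

```python
# Reduced Planck constant expressed in µeV·ns (CODATA: 6.582119569e-16 eV·s)
HBAR_UEV_NS = 0.6582119569


def resolution_timescale_ps(delta_e_uev):
    """Rough observable timescale (ps) for an energy resolution in µeV, tau ~ hbar / dE."""
    if delta_e_uev <= 0:
        raise ValueError("energy resolution must be positive")
    return HBAR_UEV_NS / delta_e_uev * 1000.0  # ns -> ps


for de in (1.0, 13.0, 25.4, 100.0):
    print(f"{de:6.1f} µeV -> ~{resolution_timescale_ps(de):7.1f} ps")
```

Finer resolution (smaller ΔE) sees slower motions, which is why an onset temperature read from one spectrometer need not match that from another.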

      More detail is needed to correctly describe/define the proteins used for the study - e.g. P450 is a family of enzymes, so which one was used?

      Reply 18: We used P450 from Pseudomonas putida for the study. The PDB ID is 2ZAX. We added this information in the revised SI.

      P450 and myoglobin also have heme cofactors. Were these deuterated as part of the protein preparation?

      Reply 19: The heme cofactors were deuterated as part of the protein preparation. For the D-proteins, the entire E. coli cell culture was deuterated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study identifies new types of interactions between Drosophila gustatory receptor neurons (GRNs) and shows that these interactions influence sensory responses and behavior. The authors find that HCN, a hyperpolarization-activated cation channel, suppresses the activity of GRNs in which it is expressed, preventing those GRNs from depleting the sensillum potential, and thereby promoting the activity of neighboring GRNs in the same sensilla. HCN is expressed in sugar GRNs, so HCN dampens the excitation of sugar GRNs and promotes the excitation of bitter GRNs. Impairing HCN expression in sugar GRNs depletes the sensillum potential and decreases bitter responses, especially when flies are fed on a sugar-rich diet, and this leads to decreased bitter aversion in a feeding assay. The authors' conclusions are supported by genetic manipulations, electrophysiological recordings, and behavioral assays.

      Strengths:

      (1) Non-synaptic interactions between neurons that share an extracellular environment (sometimes called "ephaptic" interactions) have not been well-studied, and certainly not in the insect taste system. A major strength of this study is the new insight it provides into how these interactions can impact sensory coding and behavior.

      We appreciate the reviewer’s view that our findings may allow researchers to better understand sensory coding and behavior. However, we respectfully disagree that the SP homeostasis in Drosophila gustation we describe here constitutes an ephaptic interaction. Although SP reduction was proposed as the basis of post-ephaptic hyperpolarization in Drosophila olfaction, we find that SP changes are too slow to mediate the fast ephaptic inhibition in gustation reported in ref#17. We observed a slow, sweet-dependent SP depletion (Fig. 5B, revised) that takes more than one hour. The real-time change in SP was also slow even upon contact with 200-mM sucrose; this result has been set aside for another manuscript in preparation. Therefore, we believe the main findings of this paper concern the homeostatic preservation of SP for the maintenance of gustatory function, not ephaptic interaction.

      (2) The authors use many different types of genetic manipulations to dissect the role of HCN in GRN function, including mutants, RNAi, overexpression, ectopic expression, and neuronal silencing. Their results convincingly show that HCN impacts the sensillum potential and has both cell-autonomous and nonautonomous effects that go in opposite directions. There are a couple of conflicting or counterintuitive results, but the authors discuss potential explanations.

      (3) Experiments comparing flies raised on different food sources suggest an explanation for why the system may have evolved the way that it did: when flies live in a sugar-rich environment, their bitter sensitivity decreases, and HCN expression in sugar GRNs helps to counteract this decrease.

      Weaknesses/Limitations:

      (1) The genetic manipulations were constitutive (e.g. Ih mutations, RNAi, or misexpression), and depleting Ih from birth could lead to compensatory effects that change the function of the neurons or sensillum. Using tools to temporally control Ih expression could help to confirm the results of this study.

      We attempted to address this point using the tub-Gal80ts system. The result is now included as Fig. 1-figure supplement 2. At 29°C, a non-permissive temperature for GAL80ts that allows GAL4-dependent expression of Ih-RNAi, we observed that bGRN responses were decreased and sGRN responses were increased compared to the control maintained at 18°C, which parallels the result in Fig. 1C,D. For this experiment, we inserted “To exclude the possibility that Ih is required for normal gustatory development, we temporally controlled Ih RNAi knockdown to occur only in adulthood, which produced similar results (Fig. 1-figure supplement 2).” (~line 113).

      (2) The behavioral experiment shows a striking loss of bitter sensitivity, but it was only conducted for one bitter compound at one concentration. It is not clear how general this effect is. The same is true for some of the bitter GRN electrophysiological experiments that only tested one compound and concentration.

      We conducted additional behavioral experiments with other bitters such as lobeline and theophylline (Fig. 5-figure supplement 1), which showed sensitivity losses in Ih mutants similar to caffeine. For these results, the following is inserted at ~line 274: “These results were recapitulated with other bitters, lobeline and theophylline (Fig. 5-figure supplement 1).”

      We also added single sensillum recording data with bitters, berberine, lobeline, theophylline and umbelliferone, which yielded results similar to those obtained with caffeine (Fig. 1-figure supplement 1). This is described with the sentence at ~line 105 “Other bitter chemical compounds, berberine, lobeline, theophylline, and umbelliferone, also required Ih for normal bGRN responses (Fig. 1-figure supplement 1).”

      (3) Several experiments using the Gal4/UAS system only show the Gal4/+ control and not the UAS/+ control (or occasionally neither control). Since some of the measurements in control flies seem to vary (e.g., spiking rate), it is important to compare the experimental flies to both controls to ensure that any observed effects are in fact due to the transgene expression.

      We thank the reviewers for raising this point. Indeed, there was a small logical flaw with the controls. As the reviewers suggested, we have now included all the necessary controls for Fig. 1C-F, Fig. 2I,J, Fig. 4E, and Fig. 5D. The effects remained statistically significant after including the new control groups.

      (4) I was surprised that manipulations of sugar GRNs (e.g. Ih knockdown, Gr64a-f deletion, or Kir silencing) can impact the sensillum potential and bitter GRN responses even in experiments where no sugar was presented.

      We are afraid there is a misunderstanding about the early part of the paper. We suspected that the manipulations impacted bGRNs and SP because of the sweetness of the regular cornmeal food, as stated in lines 214-220: “Typically, we performed extracellular recordings on flies 4-5 days after eclosion, during which they were kept in a vial with fresh regular cornmeal food containing ~400 mM D-glucose. The presence of sweetness in the food would impose long-term stimulation of sGRNs, potentially requiring the delimitation of sGRN excitability for the homeostatic maintenance of gustatory functions. To investigate this possibility, we fed WT and Ihf03355 flies overnight with either non-sweet sorbitol alone (200 mM) or a sweet mixture of sorbitol (200 mM) + sucrose (100 mM).”

      I believe the authors are suggesting that the effects of sugar GRN activity (e.g., from consuming sugar in the fly food prior to the experiment) can have long-lasting effects, but it wasn't entirely clear if this is their primary explanation or on what timescale those long-lasting effects would occur. How much / how long of a sugar exposure do the flies need for these effects to be triggered, and how long do those effects last once sugar is removed?

      We attempted to address this point with additional experiments (Fig. 5A,B). A reduction in SP of similar magnitude was observed in WT and HCN-deficient mutants 1 hr after the flies were transferred from vials containing non-sweet sorbitol to vials containing sweet sucrose. Moreover, the mutants, but not WT, showed further depression of SP when the sweetness persisted in the media for 4 hrs and overnight. Such exposure to sweetness for longer than 1 hr may simulate feeding on regular sweet cornmeal food. SP recovery was also tested by removing flies from the sweet media after overnight exposure and placing them on sorbitol food. SPs of WT and the mutants recovered to similar levels 1 hr after the animals were separated from sweetness, although the HCN-lacking mutants showed much lower SP immediately after overnight sweet exposure. The unimpaired recovery of the mutants suggests that HCN is not required for generating the transepithelial potential itself. Therefore, regardless of HCN, SP changes are slow even in the presence of strong sweetness, and SP is much better protected when sGRNs express HCN in a sweet environment.

      We inserted the following at ~line 260 to describe the newly added recovery experiment: “Following overnight sweet exposure, SPs of WT and Ihf03355 recovered to similar levels after 1-hr incubation with sorbitol-only food. However, only after 4 hrs on the sorbitol food did the two lines exhibit SP levels similar to those achieved by overnight incubation with sorbitol-only food (Fig. 5B). These results indicate that SP depletion by sweetness is a slow process, and that the dysregulated reduction and recovery of SPs in Ihf03355 manifest only after long-term conditioning with and without sweetness, respectively.”

      (5) The authors mention that HCN may impact the resting potential in addition to changing the excitability of the cell through various mechanisms. It would be informative to record the resting potential and other neuronal properties, but this is very difficult for GRNs, so the current study is not able to determine exactly how HCN affects GRN activity.

      On this point, we must rely on previous biophysical and electrophysiological characterizations of mammalian HCN channels and on a heterologous expression study that revealed a robust hyperpolarization-activated cation current from Drosophila HCN channels (PMID: 15804582).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors start by showing that an HCN loss-of-function mutation causes a decrease in spiking in bitter GRNs (bGRNs) while leaving the sweet GRN (sGRN) response in the same sensillum intact. They show that a perturbation of HCN channels in sweet-sensing neurons causes a similar decrease while increasing the response of sugar neurons. They were also able to rescue the response by exogenous expression. Ectopic expression of HCN in bitter neurons had no effect. Next, they measure the sensillum potential and find that it is also affected by HCN channel perturbation. These findings lead them to speculate that HCN in sGRN increases sGRN spiking, which in turn affects bGRNs. To test this idea, they carried out multiple perturbations aimed at decreasing sGRN activity. They found that decreasing sGRN activity, either with a receptor mutant or by expressing Kir (a K+ channel) in sGRNs, increased bGRN responses. These manipulations also increased the sensillum potential. Finally, they show that these changes are behaviorally relevant, as conditions that increase sGRN activity decrease avoidance of bitter substances.

      Strengths:

      There is solid evidence that perturbation of sweet GRNs affects bitter GRNs in the same sensillum. The measurement of the transepithelial potential and how it changes is also interesting and supports the authors' conclusion.

      Weaknesses:

      The ionic basis of how perturbation in GRN affects the transepithelial potential which in turn affects the second neuron is not clear.

      We speculate that HCN-dependent membrane potential regulation, rather than a change in ionic composition, is responsible for the observed SP preservation, as discussed further in our response in the “Recommendations for the authors” section. The transepithelial potential (TEP) can be dissipated by increased conductance through receptor-linked ion channels following gustatory receptor activation in GRNs. The volume of the sensillum lymph is very small according to electron micrographs of horizontally sliced bristles (PMID: 11456419). Therefore, robust excitation of a gustatory neuron may easily deplete the extracellular potential built up as polarized ion concentrations across the tight junction. When this consumption is too strong and prolonged, the neighboring neuron, which shares the TEP with the activated GRN, can be negatively affected. We propose that HCN suppresses overexcitation of sGRNs by stabilizing their membrane potential. This stabilization prevents sGRNs from excessively reducing the TEP, thereby protecting the activity of neighboring bGRNs.

      Reviewer #3 (Public Review):

      Ephaptic inhibition between neurons housed in the same sensilla has been long discovered in flies, but the molecular basis underlying this inhibition is underexplored. Specifically, it remains poorly understood which receptors or channels are important for maintaining the transepithelial potential between the sensillum lymph and the hemolymph (known as the sensillum potential), and how this affects the excitability of neurons housed in the same sensilla.

      Although a reduction of sensillum potential was proposed to underlie membrane hyperpolarization of post-ephaptic olfactory neurons in Drosophila, our preliminary data (not shown due to a manuscript in preparation) and the results included in the paper (Fig. 5B) strongly suggest that SP reduction is not a requisite for ephaptic inhibition at least in GRNs. Ephaptic inhibition is expected to be instantaneous, whereas we find that SP reduction in gustation is very slow. Therefore, we would like to indicate that the findings we report in this manuscript are not directly related to ephaptic inhibition.

      Lee et al. used single-sensillum recordings (SSR) of the labellar taste sensilla to demonstrate that the HCN channel, Ih, is critical for maintaining sensillum potential in flies. Ih is expressed in sugar-sensing GRNs (sGRNs) but affects the excitability of both the sGRNs and the bitter-sensing GRNs (bGRNs) in the same sensilla. Ih mutant flies have decreased sensillum potential, and bGRNs of Ih mutant flies have a decreased response to the bitter compound caffeine. Interestingly, ectopic expression of Ih in bGRNs also increases sGRN response to sucrose, suggesting that Ih-dependent increase in sensillum potential is not specific to Ih expressed in sGRNs. The authors further demonstrated, using both SSR and behavior assays, that exposure to sugars in the food substrate is important for the Ih-dependent sensitization of bGRNs. The experiments conducted in this paper are of interest to the chemosensory field. The observation that Ih is important for the activity in bGRNs albeit expressed in sGRNs is especially fascinating and highlights the importance of non-synaptic interactions in the taste system.

      Despite the interesting results, this paper is not written in a clear and easily understandable manner. It uses poorly defined terms without much elaboration, contains sentences that are borderline unreadable even for those in the narrower chemosensory field, and many figures can clearly benefit from more labeling and explanation. It certainly needs a bit of work.

      We would like to revise the language aspect of the manuscript after finalizing the scientific revision.

      Below are the major points:

      (1) Throughout the paper, it is assumed that Ih channels are expressed in sugar-sensing GRNs but not bitter-sensing GRNs. However, both this paper and citation #17, another paper from the same lab, contain only circumstantial evidence for the expression of Ih channels in sGRNs. A simple co-expression analysis, using the Ih-T2A-GAL4 line and Gr5a-LexA/Gr66a-LexA line, all of which are available, could easily demonstrate the co-expression. Including such a figure would significantly strengthen the conclusion of this paper.

      We did conduct confocal imaging with Ih-T2A-Gal4 in combination with GRN Gal4s (ref#17 version2). The expression is very broad, including both neurons and non-neuronal cells. We observed much stronger expression in sGRNs than in bGRNs, but the promiscuous expression of the reporter in many cells prevented us from clearly demonstrating its absence from bGRNs. However, the functional and physiological examination of Ih-T2A-Gal4 with neuronal modifiers such as TRPA1 and Kir2.1 in ref#17 indicates strong Ih expression in sGRNs and little in bGRNs. Furthermore, the RNAi knockdown results provide another line of evidence that HCN expressed in sGRNs regulates SP and bGRN activity (Fig. 1C,D, Fig. 1-figure supplement 2). Ih-RNAi expression in bGRNs did not result in any statistically significant changes in the activities of sGRNs and bGRNs compared to controls (Fig. 1C,D, revised), supporting our claim that Ih acts in sGRNs for the functional homeostasis of SP and GRNs.

      (2) Throughout this paper, it is often unclear which class of labellar taste sensilla is being recorded. S-a, S-b, I-a, and I-b sensilla all have different sensitivities to bitters and sugars. Each figure should clearly indicate which sensilla is being recorded. Justification should be provided if recordings from different classes of sensilla are being pooled together for statistics.

      We mainly performed SSR (single sensillum recording) on i-type bristles, as they have the simplest GRN composition compared to s- and L-type bristles. Because s-type bristles also contain both an sGRN and a bGRN, we also measured SP in s-types (Figs. 2, 3F and 4D). In Fig. 3-figure supplement 1, L-types were tested for the relationship between water cell activity and SP. All panels are now labelled with the tested bristle types.

      (3) In many figures, there is a lack of critical control experiments. Examples include Figures 1C-F (lacking UAS control), Figure 2I-J (lacking UAS control), Figure 4E (lacking the UAS and GAL4 control, and it is also strange to compare Gr64f > RNAi with Gr66a > RNAi, instead of with parental GAL4 and UAS controls.), and Figure 5D (lacking UAS control). Without these critical control experiments, it is difficult to evaluate the quality of the work.

      Thank you for pointing this out. We appreciate the feedback and have addressed these concerns by including all the requested controls in the figures. Specifically, we have added the UAS controls for Figs 1C-F and 2I-J, as well as the UAS and GAL4 controls for Fig. 4E. We have also included the UAS control for Fig. 5D.

      (4) Figure 2A could benefit from more clarification about what exactly is being recorded here. The text is confusing: a considerable amount of text is spent on explaining the technical details of how SP is recorded, but very little text about what SP represents, which is critical for the readers. The authors should clarify in the text that SP is measuring the potential between the sensillar lymph, where the dendrites of GRNs are immersed, and the hemolymph. Adding a schematic figure to show that SP represents the potential between the sensillar lymph and hemolymph would be beneficial.

      SP was defined at lines 55-56 in the first paragraph of introduction, which also contains the background information for SP as a transepithelial potential. As reviewer suggested, we now also included a sentence describing SP (“SP is known as a transepithelial potential between the sensillum lymph and the hemolymph, generated by active ion transport through support cells”, line 126) and a drawing to illustrate the concept of SP (Fig. 2A), and revised the legend.

      (5) The sGRN spiking rate in Figure 4B deviates significantly from previous literature (Wang, Carlson, eLife 2022; Jiao, Montell PNAS 2007, as examples), and the response to sucrose in the control flies is not dosage-dependent, which raises questions about the quality of the data. Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      Our recordings show spiking frequencies different from those in others' work because they were calculated over 5-s bins rather than only the first 0.5 s. This lowers the frequencies, as spikes are more frequent at the beginning of the recording (Fig. 4-figure supplement 1).
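      The effect of the averaging window on an adapting response can be illustrated with a toy spike train. The sketch below uses a hypothetical train whose interspike interval grows by a fixed factor after stimulus contact (not recorded data) and compares the mean rate over the first 0.5 s with the mean over the first 5 s.

```python
def rate_in_window(spike_times, t_start, t_end):
    """Mean firing rate (spikes/s) for spikes with t_start <= t < t_end (s after contact)."""
    if t_end <= t_start:
        raise ValueError("window must have positive length")
    n = sum(1 for t in spike_times if t_start <= t < t_end)
    return n / (t_end - t_start)


def adapting_train(isi0=0.01, growth=1.05, t_max=5.0):
    """Hypothetical adapting spike train: each interval is `growth` times the previous one."""
    times, t, isi = [], 0.0, isi0
    while True:
        t += isi
        if t >= t_max:
            break
        times.append(t)
        isi *= growth
    return times


train = adapting_train()
print("first 0.5 s:", rate_in_window(train, 0.0, 0.5), "spikes/s")
print("first 5 s:  ", rate_in_window(train, 0.0, 5.0), "spikes/s")
```

Because the early, dense spikes are diluted by the later, sparse ones, the 5-s mean is necessarily lower than the 0.5-s mean for any adapting response.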

      Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      We were also puzzled by the flat dose dependence for sucrose. This result may suggest the existence of another mechanism moderating the sucrose responses of sGRNs. This flat curve reappeared with other genotypes over the same concentration range (5-50 mM) in Fig. 4E. However, 1-mM sucrose produced much lower spiking frequencies (Fig. 4E), suggesting that sGRN responses are saturated at 5 mM sucrose under our recording/analysis conditions.

      (6) In Figure 4C, instead of showing the average spike rate of the first five seconds and the next 5 seconds, why not show a peristimulus time histogram? It would help the readers tremendously, and it would also show how quickly the spike rate adapts to overexpression and control flies. Also, since taste responses adapt rather quickly, a 500 ms or 1 s bin would be more appropriate than a 5-second bin.

      Taste single sensillum recording begins at the moment of contact with the stimulant, which prevents us from recording pre-stimulus GRN responses. Therefore, as the reviewer suggested, we show post-stimulus graphs with 1-s bins (Fig. 4-figure supplement 1).
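      Since recording starts at stimulus contact (t = 0), a post-stimulus time histogram is simply the spike count in consecutive bins after contact divided by the bin width. The sketch below uses 1-s bins over 5 s, matching the bin width described above; the spike times are hypothetical, and in practice one would average such histograms across sensilla.

```python
def post_stimulus_histogram(spike_times, bin_width=1.0, duration=5.0):
    """Firing rate (spikes/s) in consecutive bins starting at stimulus contact (t = 0)."""
    if bin_width <= 0 or duration <= 0:
        raise ValueError("bin width and duration must be positive")
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        if 0.0 <= t < duration:
            # clamp guards against floating-point edge cases at the last bin
            idx = min(int(t // bin_width), n_bins - 1)
            counts[idx] += 1
    return [c / bin_width for c in counts]


# Hypothetical spike times (s after contact) from an adapting GRN
spikes = [0.05, 0.1, 0.15, 0.2, 0.3, 0.45, 0.6, 0.8, 1.1, 1.5, 2.0, 2.7, 3.6, 4.6]
print(post_stimulus_histogram(spikes))
```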

      (7) Lines 215 - 220. The authors state that the presence of sugars in the culture media would expose the GRNs to sugar constantly, without providing much evidence. What is the evidence that the GRNs are being activated constantly in flies raised with culture media containing sugars? The sensilla are not always in contact with the food.

      We agree with reviewer. We replaced “long-term stimulation of sGRNs” with “strong and frequent stimulation of sGRNs for extended period”. The word long-term may be interpreted to be constant.

      (8) Line 223. To show that bGRN spike rates in Ih mutant flies "decreased even more than WT", you need to compare the difference in spike rates between the sorbitol group and the sorbitol + sucrose group, which is not what is currently shown.

      The data were examined by ANOVA and a multiple comparison test (Dunn’s) between all the groups regardless of genotypes and conditions in the panel (all the groups sharing the y axis). Therefore, the differences were statistically examined. However, the cited expression we used read like it was about the slope or extent of the decrease. We intended to indicate the difference in the absolute values of spiking frequencies after overnight sweet exposure between the genotypes, while bGRN activities were statistically indifferent between WT and Ih mutants when they were kept only on sorbitol food. We revised it to “decreased to the level significantly lower than WT”. We also changed the graph style to effectively present the trend of changes in bGRN sensitivity with comparison between genotypes. Again, the groups were statistically examined together regardless of the genotypes and conditions.

      (9) To help readers better understand the proposed mechanisms here, including a schematic figure would be helpful. This should show where Ih is expressed, how Ih in sGRNs impacts the sensillum potential, how elevated sensillum potential increases the electrical driving force for the receptor current, and affects the excitability of the bGRNs in the same sensilla, and how exposure to sugar is proposed to affect ion homeostasis in the sensillum lymph.

      As the reviewer suggested, we included two panels showing a working model for gustatory homeostasis via SP maintenance by HCN (Fig. 5E,F).

      Reviewer #1 (Recommendations For The Authors):

      (1) The relationship between this paper and the authors' bioRxiv preprint posted last year is not clear. In the introduction they made it seem like this paper is a follow-up that builds on the preprint, but most or all of the experiments in this paper were already performed in the preprint. I guess the authors are planning to divide the original paper into two papers. I would suggest updating the preprint to avoid confusion.

      Thank you for the comment. We updated the preprint to remove part of Fig. 6 and all of Fig. 7, along with the associated text. As the reviewer pointed out, our eLife paper was spun off from part of the preprint, because we felt that the two stories could confuse readers when presented together.

      (2) Have the authors considered testing responses of water GRNs? They reside in the same sensilla as sugar neurons, so are they also affected by Ih mutation or RNAi in sugar neurons? This would strengthen the evidence that the indirect (non-cell autonomous) effects of Ih are due to the sensillum potential and not some specific interaction between sweet and bitter cells.

      As the reviewer proposed, we assessed water GRN activity in the L-type bristles of WT, Ihf03355 and a genomic rescue line for Ihf03355. Spiking responses in water GRNs were evoked by a hypo-osmolar electrolyte (0.1 mM tricholine citrate, TCC). Interestingly, the Ih mutant showed lower 0.1 mM TCC-evoked spiking frequencies than WT. This impairment was rescued by the genomic fragment containing an intact Ih locus (Figure 3-figure supplement 1A).

      Additionally, SPs in L-type bristles were reduced by Ih deficiencies but increased in Gr64af, suggesting that HCN regulates sGRNs in L-type bristles as well (Figure 3-figure supplement 1B). Again, the bristles of animals with both mutations together exhibited SPs similar to those of WT.

      Furthermore, in cDNA rescue experiments in L bristles, introducing Ih-RF cDNA into sGRNs restored SPs, whereas expressing it in bGRNs did not, unlike the results from the i- and s-bristles (Fig. 2K,L), likely because L-bristles lack bGRNs. These cDNA rescue and genetic interaction experiments were conducted using flies fed on fresh, strongly sweet cornmeal food, suggesting that the sweetness of the medium is likely the key factor producing the genetic interaction and necessitating HCN, consistent with other results in the manuscript. Therefore, SP regulation by HCN is also observed in the L-type bristles.

      Minor comments:

      Line 52: typo, "Many of"

      Thank you. Corrected

      Line 95: typo, "sensilla do an sGRN"

      Corrected

      Line 98: typo, "we observed reduced the spiking responses"

      Corrected

      Line 206: typo, "a relatively low sucrose concentrations"

      Corrected

      Line 260: "inverse relationship between the two GRNs in excitability" - I am not exactly sure what data you are referring to.

      Although the Ih alleles did not show increased sGRN activity, knockdown of Ih decreased bGRN activity but increased sGRN activity (Fig. 1C,D, Fig. 1-figure supplement 2B), while suppression of sGRNs increased bGRN activity (Fig. 3). To clarify this point, we revised the phrase to “the inverse relationship between the two GRNs in excitability observed in Fig. 1C,D, Fig. 1-figure supplement 2B, and Fig. 3”.

      Methods: typo, "twenty of 3-5 days with 10 males and 10 females"

      Corrected to “Twenty flies, aged 3-5 days and consisting of 10 males and 10 females,”

      Methods: typo, "Kim's wipes" should be "Kimwipes"

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      (1) More clarification is necessary on Transepithelial potential (TEP). TEP is typically created by having pumps and tight junctions between the sensillar lymph and the hemolymph.

      The manuscript introduces TEP, or SP, in the context of sensory function (lines 40-57) with relevant references. The involvement of pumps and tight junctions is mentioned in the same paragraph: “Glia-like support cells exhibit close physical association with sensory receptor neurons, and conduct active transcellular ion transport, which is important for the operation of sensory systems” (line 40) and “Tight junctions between support cells separate the externally facing sensillar lymph from the internal body fluid known as hemolymph” (line 53).

      It is not clear how HCN channels in one of the neurons might change the composition of the sensillum lymph. An explanation of their model of how TEP depends on HCN is necessary.

      Although the ionic composition of the sensillum lymph is a contributing factor to the sensillum potential, it is more conceptually relevant to describe our findings with the perspective of membrane potential regulation given the role of HCN in membrane potential stabilization as discussed in our manuscript.

      We speculate that HCN controls the membrane potential at rest and/or during activity to modulate sGRN activity so as to conserve SP despite the sweetness of the feeding niche. We positioned our results in relation to SP in the Discussion: “Our results provide multiple lines of evidence that HCN suppresses HCN-expressing GRNs, thereby sustaining the activity of neighboring GRNs within the same sensilla. We propose that this modulation occurs by restricting SP consumption through HCN-dependent neuronal suppression rather than via chemical and electrical synaptic transmission.” (lines 252-255). Moreover, it is unclear whether HCN is localized to the dendrite bathed in the sensillum lymph, where it could influence the ionic composition of the lymph. It would be very interesting to study in the future whether ionic flow through HCN channels is itself critical for the function of HCN in this context, and whether HCN is present exclusively in the dendrite, as this hypothesis would require. However, we note that Kir2.1 and HCN channels in sGRNs showed similar effects on SP and bGRNs, although they differ in Na+ conductance.

      In the initially submitted manuscript (lines 325-343), we discussed the potential mechanism by which Kir2.1 and HCN channels commonly increase SP in terms of how the membrane potential regulation in the soma can control the SP consumption in the dendrite of sGRNs.

      Another point about the TEP that needs some explanation is that these sensilla are open to the environment as tastants must flow in and are different from mechanical sensilla in that sense.

      This is a very important question regarding the general physiology of the taste sensilla, as the sensillum lymph is in contact with the external environment through the pore of the sensillum. It is indeed interesting to consider how the composition and potential of the lymph are maintained despite the relatively vast volume of food the sensilla encounter during gustation and the continuous evaporation to air between episodes of gustation. However, we believe that this question, while important, is distinct from the primary focus of our manuscript.

      Are the TEP measurements in Figure 2 under control conditions where there are no tastants?

      There is no tastant in the SP-measuring glass electrode other than the electrolyte. We apologize for not specifying the recording electrode conditions. We inserted a clause in the Methods: “For SP recordings, the recording electrode contained 2 mM TCC as the electrolyte, and…”

      Does the TEP change dynamically as sGRN is activated?

      SP does shift in response to sweets. Please see Fig. 5B. Also, we showed SP changes by mechanical stimuli, which depended on the mechanoreceptor, NompC (Fig. 2D-F). Mechanoreceptor neurons share the sensillum lymph with GRNs.

      (2) More clarification on the potential transduction mechanism and how TEP affects one neuron differentially. Essentially, sGRN perturbation affects sGRN activity and it affects the TEP. More explanation is needed for the potential ionic mechanism of each.

      Our results strongly suggest that HCN lowers the activity of HCN-expressing GRNs, mitigating SP consumption. This modulation is crucial because the SP serves as a driving force for neuronal activation within the sensillum. HCN is particularly necessary in sGRNs because of the flies’ sweet feeding niche, which is expected to result in frequent and strong activation of sGRNs. The SP saved by HCN-dependent restraint of sGRNs can then be used to raise the responsiveness of bGRNs.

      (3) The authors refer to their own unreviewed paper (Reference 17). This paper is on a similar topic and there seems to be some overlap. Clarification on this point would be important.

      We revised the bioRxiv preprint so that version 2 does not contain the parts overlapping with this eLife paper. This eLife paper was originally part of the preprint, but it was separated to clarify the messages of the two stories. As we explained in the Discussion (lines 276-297), HCN provides resistance to both hyperpolarization and depolarization of the membrane potential. Simply put, one paper focuses on the role of HCN in resisting hyperpolarization, while the other (this paper in eLife) focuses on resisting depolarization.

      (4) Methods are sparse. Many details on the method are necessary. For example, Sensilla recordings are being done by the tip-dip method (I assume). What does "number of experiments" mean in Figure 1? Is it the number of animals or the number of sensilla? How many trials/sensilla?

      We now state that the extracellular recordings were performed by the tip-dip method: “In vivo extracellular recordings were performed by the tip-dip method as detailed previously”. We also added a statement on the number of experiments: “The number of experiments indicated in the figures is the number of naïve bristles tested. The naïve bristles were from at least three different animals.”

      (5) Figure 1: I understand the author's interpretation. But if one compares WT in Figure 1A to Gr64a-IhRNAi in 1C, we can come to the conclusion that there is no change. In other words, the control in Figure 1C (grey) has a much higher response than WT. Similar conclusions can be made for other experiments. Is the WT response stable enough to make the conclusions made here?

      The genetic background of each genotype may influence GRN activity to some extent. RNAi knockdown is well known to be hypomorphic, and its effects should be evaluated against the parental controls, such as the Gal4 and UAS lines. As all reviewers pointed out, we added the results from the UAS control. This confirms that Gr89a>Ih RNAi is not statistically different from the UAS control or the Gr64f-Gal4 control in bGRN spiking evoked by 2 mM caffeine, whereas Gr64f>Ih RNAi showed reduced bGRN responses to 2 mM caffeine compared with all the controls.

      (6) Figure 3: Why is bGRN spiking not plotted against sensillum potential to observe the dependence more directly?

      This is a very interesting suggestion. We are not, however, equipped to measure spiking and sensillum potential simultaneously. Therefore, they are independent experiments, and we treated them accordingly.

      (7) Figure 4: Why bGRN response is only affected at high caffeine concentrations is not clear.

      We were also surprised by the differences in the dose dependence results of b- and sGRNs, genetically manipulated to mis-express and over-express HCN in Fig. 4A and 4E, respectively. Each gustatory neuron likely has distinct sets of players and parameters that set its own membrane potential and excitability.

      One possibility is that there is a range of membrane potentials within which HCN does not engage. In bGRNs, the resting membrane potential may lie low within this range, so that modest membrane depolarization by low caffeine concentrations does not significantly close HCN channels, and thus their hyperpolarizing effects are not altered. In contrast, the membrane potential of sGRNs may lie high within this range, so that HCN exerts suppressive effects at all tested sucrose concentrations. However, we consider this explanation too speculative to include in the main text; instead, we stated in the original manuscript, “implying a complex cell-specific regulation of GRN excitability.” (line 210).

      (8) Minor:

      L98 - there is a small typo

      Corrected

      L274: "funny" !?

      “Funny” currents, denoted If, were initially observed by electrophysiologists and later attributed to HCN channels, now indicated by Ih (thus the gene name Ih in Drosophila). These currents were termed "funny" due to their unusual properties compared to other currents. For more detailed information, please refer to the cited references.

      L257: Neuropeptide seemed to be abrupt

      We attempted to discuss possible mechanisms that mediate excitability changes across GRNs beyond the mechanism by SP shifts. Neuropeptides, which are chemical neurotransmitters along with small neurotransmitters, were mentioned following the discussion on synaptic transmission to suggest alternative pathways for excitability regulation. This inclusion is meant to provide a comprehensive overview of potential mechanisms influencing GRN activity.

      Reviewer #3 (Recommendations For The Authors):

      Congratulations on your fascinating research! The results are certainly of interest to the chemosensory field. However, I suggest using academic editing services to enhance the clarity of your text and ensure that the terminology and jargon align with standard usage in the field. The current choice of words may not be consistent with commonly used terms. As it is now, the writing might not fully showcase the compelling story and the effort behind your study, and is underselling your interesting results. Proper refinement could make sure your valuable findings are appropriately recognized.

      We appreciate your comments and apologize for any difficulties reviewers faced during the review process. We are currently prioritizing the review of scientific content and plan to address language issues in a subsequent revision. It would be very helpful for future revisions if the problematic sentences or expressions could be indicated in detail after this revision. This will allow us to ensure that our terminology and expression align with standard usage in the field, and that our findings are clearly and effectively communicated.

      Minor points:

      (1) Line 110: what is Ih-RF?

      We apologize for relying solely on a reference to describe the cDNA. The following clause was inserted with an additional reference and the FlyBase ID: “(Flybase id: FBtr0290109), which previously rescued Ih deficiency in other contexts17,26,”

      (2) Line 158: Gr64af mutant flies still have Gr5a and a residual response to fructose and sucrose (Slone, Amrein 2007).

      We revised the line to “is severely impaired in sucrose and glucose sensing”, since there is a substantial loss of sucrose and glucose sensing in both Gr64af from Kim et al 2018 and DGr64 from Slone et al 2007, when they were examined by the proboscis extension reflex assay. This was also confirmed in the study by Jiao et al 2009. We also deleted “sugar-ageusic” and instead describe the mutant “impaired in sucrose and glucose sensing” in Fig. 3 legend.

      (3) Lines 264-273 seem unnecessary. This paper is not about the function of HCN in mammals, and these discussions seem largely irrelevant.

      We feel that it is important to position our results within a broader context by discussing the potential implications of our findings for sensory systems of other animals. As we stated, HCN channels have been localized in mammalian sensory systems, but their roles are often not well understood. By including this discussion, we aim to highlight the relevance of our findings beyond the model organism used in our study and suggest possible areas for future research in mammalian systems.

    1. Author response:

      We would like to (1) respond to one comment from the public review, which is also related to the eLife assessment, and (2) give provisional author responses.

      (1) Regarding the definition of the colonization-extinction rate, the first reviewer may have misunderstood it: “However, there does not need to be a temporal trend! Any warm-adapted species that colonizes a site has a positive net effect on CTI; similarly, any cold-adapted species that goes extinct contributes to thermophilization.” Here we clarify the definition:

      In a single iteration of our MSOM (multi-species occupancy model), the occupancy rate of species[n] in transect[i] from year[t-1] to year[t] is related to the colonization rate and extinction rate, and is defined as (also shown in line 411 of our manuscript):

      muz[n,i,t] = z[n,i,t-1]*(1-eps[n,i,t-1]) + (1-z[n,i,t-1])*gam[n,i,t-1]

      If the colonization rate (gam) and extinction rate (eps) remain constant, the occupancy rate (muz) takes a constant value determined by the true occupancy state in the previous year (z, 0 or 1). The occupancy rate will only increase if the colonization rate increases (or the extinction rate decreases). That is why we consider the temporal trend in the colonization and extinction rates.
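      To make the single update step concrete, the minimal Python sketch below evaluates muz for one species-transect pair from last year's occupancy state z and that year's gam and eps. It mirrors the formula above with scalar inputs; the function name and the example rates are hypothetical, and this is not the authors' model code.

```python
def occupancy_prob(z_prev, gam_prev, eps_prev):
    """muz[n,i,t] for one species n and transect i (hypothetical helper).

    z_prev   : latent occupancy state in year t-1 (0 or 1)
    gam_prev : colonization rate gam[n,i,t-1]
    eps_prev : extinction rate eps[n,i,t-1]
    """
    if z_prev not in (0, 1):
        raise ValueError("z_prev must be the latent occupancy state, 0 or 1")
    # Persistence term: occupied last year and did not go extinct.
    persist = z_prev * (1.0 - eps_prev)
    # Colonization term: unoccupied last year and then colonized.
    colonize = (1 - z_prev) * gam_prev
    return persist + colonize


# With constant rates, muz takes one of two fixed values, set by z_prev.
print(occupancy_prob(1, gam_prev=0.2, eps_prev=0.3))  # occupied site: 1 - eps
print(occupancy_prob(0, gam_prev=0.2, eps_prev=0.3))  # unoccupied site: gam
```

      For a given previous state, muz changes between years only if gam or eps changes, which is the dependence the response above relies on.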

      (2) Provisional author responses:

      We will revise and improve the manuscript according to the public reviews and mainly focus on:

      (1) clarify the general definition of habitat fragmentation in the Introduction.

      (2) provide a wider perspective about how our results can be applied to conservation biology in the Discussion.

      (3) discuss the diversity of isolation metrics for future research and provide more evidence about the link between larger areas and higher habitat diversity or heterogeneity.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors isolated and cultured pulmonary artery smooth muscle cells (PASMC) and pulmonary artery adventitial fibroblasts (PAAF) from lung samples of patients with idiopathic pulmonary arterial hypertension (PAH) and of healthy volunteers. They performed RNA-seq and proteomics analyses to detail the cellular communication between PASMC and PAAF, which are the main target cells of pulmonary vascular remodeling during the pathogenesis of PAH. The authors revealed that PASMC and PAAF retained their original cellular identity and acquired different states associated with the pathogenesis of PAH, respectively.

      Strengths:

      Although previous studies have shown that PASMC and PAAF cells each have an important role in the pathogenesis of PAH, there have been few reports focusing on the interactions between PASMC and PAAF. These findings may provide valuable information for elucidating the pathogenesis of pulmonary arterial hypertension.

      We appreciate the reviewer’s positive view of our study.

      Weaknesses:

      The results of proteome analysis using primary culture cells in this paper seem a bit insufficient to draw conclusions. In particular, the authors described "We elucidated the involvement of cellular crosstalk in regulating cell state dynamics and identified pentraxin-3 and hepatocyte growth factor as modulators of PASMC phenotypic transition orchestrated by PAAF." However, the presented data are considered limited and insufficient.

      We thank the reviewer for drawing our attention to this point and we will modify our statements and conclusions accordingly, in order to avoid making too general and broad claims.

      Reviewer #2 (Public Review):

      Summary:

      Utilizing a combination of transcriptomic and proteomic profiling as well as cellular phenotyping from source-matched PASMC and PAAFs in IPAH, this study sought to explore a molecular comparison of these cells in order to track distinct cell fate trajectories and acquisition of their IPAH-associated cellular states. The authors also aimed to identify cell-cell communication axes in order to infer mechanisms by which these two cells interact and depend upon external cues. This study will be of interest to the scientific and clinical communities of those interested in pulmonary vascular biology and disease. It also will appeal to those interested in lung and vascular development as well as multi-omic analytic procedures.

      We thank the reviewer for the very positive assessment of our study.

      Strengths:

      (1) This is one of the first studies using orthogonal sequencing and phenotyping for the characterization of source-matched neighboring mesenchymal PASMC and PAAF cells in healthy and diseased IPAH patients. This is a major strength that allows for direct comparison of neighboring cell types and the ability to address an unanswered question regarding the nature of these mesenchymal "mural" cells at a precise molecular level.

      We value the reviewer’s kind and objective summary of our study.

      (2) Unlike a number of multi-omic sequencing papers that read more as an atlas of findings without structure, the inherent comparative organization of the study and presentation of the data were valuable in aiding the reader in understanding how to discern the distinct IPAH-associated cell states. As a result, the reader not only gleans greater insight into these two interacting cell types in disease but also now can leverage these datasets more easily for future research questions in this space.

      We thank the reviewer for this highly positive comment.

      (3) There are interesting and surprising findings in the cellular characterizations, including the low proliferative state of IPAH-PASMCs as compared to the hyperproliferative state in IPAH-PAAFs. Furthermore, the cell-cell communication axes involving ECM components and soluble ligands provided by PAAFs that direct cell state dynamics of PASMCs offer some of the first and foundational descriptions of what are likely complex cellular interactions that await discovery.

      We agree with the reviewer’s assessment that some of the novel data in our study help to formulate testable hypotheses that can be followed up in future research.

      (4) Technical rigor is quite high in the -omics methodology and in vitro phenotyping tools used.

      We are grateful for the reviewer’s recognition and positive assessment of our work.

      Weaknesses:

      There are some weaknesses in the methodology that should temper the conclusions:

      (1) The number of donors sampled for PAAF/PASMCs was small for both healthy controls and IPAH patients. Thus, while the level of detail of -omics profiling was quite deep, the generalizability of their findings to all IPAH patients or Group 1 PAH patients is limited.

      We share the reviewer’s concerns regarding the generalizability of the findings. Indeed, the initial number of samples used for the omics study (n=4 in each group) was limited due to the unique setup of using source-matched cells from the same pulmonary artery. While we included additional samples in our phenotypic assays (n=6), which further confirmed our findings, we will acknowledge the small number of samples in the revised manuscript as a limiting factor in drawing definite conclusions for all PAH patients.

      (2) While the study utilized early passage cells, these cells nonetheless were still cultured outside the in vivo milieu prior to analysis. Thus, while there is an assumption that these cells do not change fundamental behavior outside the body, that is not entirely proven for all transcriptional and proteomic signatures. As such, the major alterations that are noted would be more compelling if validated from tissue or cells derived directly from in vivo sources. Without such validation, the major limitation of the impact and conclusions of the paper is that the full extent of the relevance of these findings to human disease is not known.

      We thank the reviewer for this constructive and excellent suggestion. Changes induced by ex vivo culturing are a common challenge when working with primary human cells. We agree with the reviewer that the proposed comparison with the publicly available sequencing datasets utilizing fresh samples will provide the readers with sufficient information to more objectively put the findings of our study into perspective.

      (3) While the presentation of most of the manuscript was quite clear and convincing, the terminology and conclusions regarding "cell fate trajectories" throughout the manuscript did not seem to be fully justified. That is, all of the analyses were derived from cells originating from end-stage IPAH, and otherwise, the authors were not lineage tracing across disease initiation or development (which would be impossible currently in humans). So, while the description of distinct "IPAH-associated states" makes sense, any true cell fate trajectory was not clearly defined.

      In accordance with the reviewer’s comment, we will choose our wording more carefully to better reflect our findings.

    1. Author response:

      Reviewer #1 (Public Review):

      Weaknesses:

      With the exception of the PCR analysis and the reporter assays, the manuscript does not contain any experiments or attempts to analyze current expression from any of the identified proviruses. No long-read RNASeq or other RNA analysis on cytoplasmic RNA was performed, nor any experiments to show that proteins are indeed expressed.

      We agree that an investigation of RNA and protein expression from these proviruses would be very interesting, and we hope to do such work in the future to test whether this clade is still actively infecting any primate species. However, we believe that such an investigation is outside the scope of this manuscript, which is focused on the past evolutionary history of these viruses. Nevertheless, we do show evidence for proviral expression at the RNA level in Fig. 6 supplement 1, which shows the alignment of publicly available rhesus macaque iPSC RNA-seq data to the SERV-K1 provirus, including both spliced and full-length viral RNA. Interestingly, there appear to be reads derived from multiple proviruses, as some reads originate from proviruses with large internal deletions, while others derive from full-length proviruses.

      The findings of a potential CTE are interesting, but the sequences that were appended to the reporter construct are much longer than previously identified CTEs. No data were presented to indicate whether this sequence shows similarity to previously identified CTEs, and no experiments were presented to show whether this sequence functionally interacts with Nxf1, the protein shown to interact with previously identified bona fide CTEs. Also, since nucleo-cytoplasmic export was not directly analyzed, it remains possible that the sequences that were inserted into the reporter contained splice sites that would allow the RNA to be spliced "downstream" of the GFP gene, allowing the export of a "spliced" GFP mRNA.

      While it is true that the HML8-derived sequences we have tested are much longer than the canonical MPMV CTE and many other known CTEs, there are other reports of elements with CTE-like activity that are much longer and more complex than the MPMV CTE, including one, the MLV PTE, which is ~1400 nt long, even longer than the HML8-derived sequence we have identified. We have compared the MER11 sequence to known CTEs from MPMV, IAP, MusD, MLV, and RSV, as well as the woodchuck hepatitis virus WPRE, which is not a canonical CTE but has been shown to promote nuclear export of RNA; none of these sequences showed any clear sequence similarity to our sequences of interest. We have added a section discussing these questions in some detail (l. 535-547).

      Although the question of which pathway or pathways these elements co-opt is obviously of great interest, we believe it is outside the scope of this manuscript. It is worth noting that a number of cis-acting RNA transport elements do not bind NXF1 directly, instead recruiting NXF1 indirectly (IAP RTE), using CRM1 (MLV, WPRE, foamy viruses), or acting through an unknown mechanism (MusD). We agree that the reporter system used has potential pitfalls, and have therefore added experiments to directly test the CTE activity of these elements, detailed above.

    1. Author response:

      Reviewer #1 (Public Review):

      This manuscript by Negi et al. investigates the effects of different ubiquitin and ubiquitin-like modifications on the stability of substrate proteins, seeking to provide mechanistic insights into known effects of these modifications on cellular protein abundance. The authors focus on comparative studies of two modifications, ubiquitin and FAT10 (a protein with two ubiquitin-like domains), on a panel of substrate proteins; prior work had established that FAT10-conjugated proteins had lower stability to proteosomal degradation than Ub-modified counterparts.

      Strengths of the work include its integration of data across diverse approaches, including molecular dynamics simulations, solution NMR spectroscopy, and in vitro and cellular stability assays. From these, the authors provide provocative mechanistic insight into the lower stability of FAT10 on its own, and in FAT10-mediated destabilization of substrate proteins in computational and experimental findings. Notably, such destabilization impacts both the tag and tagged proteins, raising some provocative questions about mechanism. The data here are generally compelling, albeit with minor concerns on presentation in parts. Conclusions from this work will be interesting to scientists in several fields, particularly those interested in cellular proteostasis and in vitro protein design / long-range communication.

      The most substantial weakness of this work from my perspective is the specificity of these destabilization effects. In particular, technical challenges of producing bona fide Ub- or FAT10-conjugated substrates with native linkages limit the ability to conduct in vitro studies on exactly the same molecules as those studied in cellular environments. Given some discussion in the manuscript about the importance of linkage location on the specificity of certain tag/substrate interactions, this raises an understandable but unfortunate caveat that needs to be considered more fully, both in general and in light of data from other fields (e.g. single-molecule pulling) showing site-dependence of comparable effects. I note that these concerns do not impact the caliber of the conclusions themselves, but perhaps suggest areas for caution as to their potential impact at this time.

      We thank the reviewer for the positive assessment. The reviewer has pointed out the caveats regarding producing Ub- and Fat10-conjugated substrates, which we now mention in the Discussion on page 35, line 15.

      Reviewer #2 (Public Review):

      "Plasticity of the proteasome-targeting signal Fat10 enhances substrate degradation" is a nice study where the authors have shown the differences between two protein degradation tags namely, FAT10 and ubiquitin. Even though these tags are closely related in terms of folds, they have differential efficiency in degrading the substrates covalently attached to them. The authors have utilised extensive MD simulations combined with biophysics and cell biology to show the structural dynamics these tags provide for proteasomal degradation.

      We thank the reviewer for the positive assessment and for the suggestions to improve the quality of the manuscript.

    1. Author response:

      Reviewer #2 (Public Review):

      I have two significant concerns that I believe can be resolved on the timescale of review.

      1) The work identifies substantial thinning in one leaflet. Lipids expand as they thin. Given this, are there too few lipids in this leaflet (which would also indicate thinning)? I would expect their deformations to depend strongly on the number balance of lipids in each leaflet. The authors should check whether the thinning, and the boundary, are sensitive to inter-leaflet lipid imbalance.

      We thank Reviewer #2 for this insight, as it led us to evaluate the leaflet tensions in our restrained 2L0J simulation. We found there was an imbalance in the leaflet packing, which we addressed with an extensive set of new simulations and new analysis aimed at generating balanced leaflets.

      See pages 6-8, Appendix Section 1, and Appendix - figures 1 and 2. We discuss these findings in the new Results section "Protein footprint asymmetry can lead to differential leaflet stresses" and the accompanying appendix. Many of the bilayer features in the repacked simulations are consistent with our original submission, but not all. For instance, while we continue to see large tilt immediately around the amphipathic helices in the lower leaflet and little in the upper leaflet, tilts in both leaflets decay to similar values at the box edge (Appendix - figure 2). The degree of membrane pinch along the membrane-protein contact boundaries is less sensitive to leaflet packing, as demonstrated by the surface heights (Appendix - figure 1).

      Determining the proper change in leaflet count is quite difficult. We are actively extending our continuum model to address questions of differential leaflet strain and coupled lipid tilt, which may allow us to estimate changes in leaflet-count, but this is a significant undertaking beyond the scope of this resubmission.

      2) By constraining the pore to have 2-fold symmetry, the authors remove a large entropic penalty disfavoring such a conformation, and thus presumably disfavoring the negative Gaussian curvature (NGC) it induces. For example, if the free energy surface for the fluctuations were rather flat, and only 1% of the conformations were consistent with 2-fold symmetry, the coupling to NGC may be reduced by -kT log(0.01), or about 4.6 kT, neglecting enhancement by coupling to NGC. Therefore, I predict that the coupling to NGC would be reduced further were the constraint removed.

      We agree with the reviewer that if the 2-fold states are highly disfavored for entropic or enthalpic reasons, this would directly reduce the coupling to NGC. However, we do not know the free energy difference between these states; it is hard to calculate from all-atom simulations and beyond our current scope. While our unrestrained simulations are not converged, they demonstrate that a wide range of orientations of the amphipathic helices is energetically accessible (see Figure 2, Appendix Section 1, and Appendix - figure 4). Still, the DEER data from the Howard lab (Kim et al., 2015) would be better described by further symmetry-broken states with greater inter-AH distances, suggesting that such conformations are not well represented in our equilibrium ensemble.
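      To put the reviewer's back-of-the-envelope estimate into familiar units, the short sketch below evaluates the confinement penalty -kT ln(p) for a fraction p of accessible states. It is illustrative only: the 1% fraction is the reviewer's hypothetical, the 310 K temperature is an assumption rather than a value stated in the text, and the helper name `restraint_penalty` is hypothetical.

```python
import math

# Boltzmann constant in kcal/(mol*K)
K_B_KCAL = 0.0019872


def restraint_penalty(fraction_accessible, temperature_k=310.0):
    """Free-energy cost of confining a system to a subset of states that holds
    `fraction_accessible` of the equilibrium probability: dG = -kT ln(p).
    Returns (penalty in units of kT, penalty in kcal/mol).
    The default temperature of 310 K is an assumption for illustration."""
    if not 0.0 < fraction_accessible <= 1.0:
        raise ValueError("fraction_accessible must be in (0, 1]")
    penalty_kt = -math.log(fraction_accessible)
    penalty_kcal = penalty_kt * K_B_KCAL * temperature_k
    return penalty_kt, penalty_kcal


# Reviewer's hypothetical: only 1% of conformations are 2-fold symmetric.
pen_kt, pen_kcal = restraint_penalty(0.01)
print(f"{pen_kt:.2f} kT, {pen_kcal:.2f} kcal/mol")
```

      With these assumptions, the penalty is about 4.6 kT, or roughly 2.8 kcal/mol; the true value would depend on the unknown free energy surface discussed above.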

      Reviewer #3 (Public Review):

      Helsell et al. use atomistic molecular dynamics simulations to characterize the structural dynamics of the M2 protein, together with continuum elastic models to evaluate the energetic cost of the protein-induced bilayer deformations. Using unbiased simulations (without constraints on the protein), they show that the M2 structure is dynamic and that the AH helices are mobile (though they tend to retain their secondary structure), in agreement with experimental observations. Then, using simulations in which the peptide backbone was restrained to the starting structure, they were able to quantitatively characterize the protein-induced bilayer deformations as well as the acyl chain dynamics.

      Both the atomistic simulations and the continuum-based determinations of the bilayer deformation energies are of high quality. The authors are careful to note that their unbiased simulations do not reach equilibrium, and the authors' conclusions are well supported by their results, though some issues need to be clarified.

      1) P. 7: Choice of lipid composition: POPC:POPG:Cholesterol 0.56:0.14:0.3. This lipid composition (or POPC:POPG 0.8:0.2) has been used in a number of experimental studies that the authors use as reference. It differs, however, substantially from the lipid composition of the influenza membrane (Gerl et al., J Cell Biol, 2012; Ivanova et al., ACS Infect Dis, 2015), which is enriched in cholesterol, has a 2:1 ratio of phosphatidylethanolamine to phosphatidylcholine, and contains almost no PG. The choice of lipid composition is unlikely to impact the authors' major conclusions, but it should be discussed briefly. As noted by Ivanova et al., the lipids of the influenza membrane are enriched in fusogenic lipids. How will that impact the authors' results?

      As noted by the Reviewer, the lipid composition we explored was based on DEER studies from the lab of Kathleen Howard. While there is a lot of cholesterol in our simulations, it is lower than the lipidomics papers suggest for the viral membrane (Gerl et al., 2012; Ivanova et al., 2015). We hypothesize that further increasing cholesterol would stiffen the membrane even more and cause the energy differences we report here to become even larger, accentuating our finding. We employ 14% POPG, and the Simons lab finds about 14% PS. Chemically these headgroups are similar, but the difference in size and spontaneous curvature could be a concern. The same concern applies to the different intrinsic curvatures of PE versus PC. However, we have not considered spontaneous curvature in our continuum calculations, so we cannot predict how this will influence our results.

      See Appendix - figure 6. We added a new panel to this figure with continuum parameters intended to mimic the high (50%) cholesterol content reported for viral envelopes, and we show that the curvature sensing of symmetry-broken states increases as the cholesterol content increases.

      See Page 25. We added text in the Discussion concerning the difference in lipids found in the virus versus those compositions employed in experiment and here.

      2) The definition of the lipid tilt needs to be revisited. On P. 13 (in the Pdf received for review, the authors do not provide page numbers), the tilt is defined/approximated as "the angle between the presumed membrane normal (aligned with the Z axis of the box) and the vector pointing from each phospholipid's phosphate to the midpoint between the last carbon atoms of the lipid tails." This (equating the normal to the interface with the Z axis of the simulation box) may be an acceptable approximation for the lower leaflet, which is approximately flat, but probably not for the upper leaflet where the interface is curved in the vicinity of the protein. The authors should, at least, discuss the implications of their approximation in terms of their conclusion that there is little lipid tilt in the upper leaflet.

      We agree that our lipid tilt calculations are approximate since we assume the membrane normal points along the z direction. We have now restated this assumption in the Results when we start to discuss tilt. Different models define lipid tilt in different ways, but the work of Deserno defines it with respect to the bilayer mid-plane which is a shared surface for the upper and lower leaflets. Thus, tilt would be moderately impacted in both leaflets. Examining the snapshots at the top of Figure 7, we surmise that the calculated tilts in both leaflets adjacent to the protein would be slightly reduced, leaving the values at the boundary unaffected. Thus, the upper leaflet likely experiences even less tilt than calculated.

      See Page 16. We have added the discussion above to the section on lipid tilt. Also, we have added page numbers to the resubmission.

      3) P. 14, last paragraph, Figures 5 and 6: The snapshots in Figure 5 are too small to see what the authors refer to when they write "tilt their lipid tails to wrap around the helices." The authors should consider citing the work of H. W. Huang, e.g., Huang et al. (PRL, 2004), who introduced the notion of curvature stress induced by antimicrobial peptides, a concept similar to what the present authors propose.

      See Page 17. We have now drawn the connection between what our simulations are showing and the earlier work by Huey Huang on antimicrobial peptides.

      See Figure 7. To make the lipid deformations easier to see, we are attaching the full-size versions of each snapshot to the figure as supplemental data.

      4) P. 17-18, Figure 7: The authors introduce the bilayer midplane, which becomes important for the determination of the deformation energy in the (unnumbered) equation on P. 17, but do not specify how it is determined. This is a non-trivial undertaking, but critical for the evaluation of the deformation energy; please add the necessary details.

      See Pages 15 and 20. In the continuum model, we define CM (the compression surface) following the work of May and colleagues (and other groups) as the areal compression weighted mean of the upper and lower surface. In the MD simulation results in Figure 6, we define leaflet thickness as the absolute difference between the interpolated leaflet hydrophobic surface (calculated using the first carbon atoms of each POPC and POPG lipid tail) and the interpolated bilayer midplane surface (calculated as the average of the upper and lower leaflet tail surfaces, each interpolated based on the last carbon atoms of each POPC and POPG lipid tail for each leaflet, respectively). These two leaflet-based definitions are different, and a more sophisticated continuum model of the upper and lower leaflet coupling would require the incorporation of lipid tilt, which we do not currently have.
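      The MD-based leaflet thickness described above (distance between a leaflet's hydrophobic surface and a midplane defined as the mean of the two tail-end surfaces) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: simple averaging of atom heights within square (x, y) bins stands in for the surface interpolation used in the paper, and the function names and bin size are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]
Atom = Tuple[float, float, float]


def grid_surface(atoms: Iterable[Atom], bin_size: float) -> Dict[Cell, float]:
    """Mean z of the atoms falling in each (x, y) bin.
    A crude stand-in (assumption) for interpolating a smooth surface."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for x, y, z in atoms:
        cell = (int(x // bin_size), int(y // bin_size))
        sums[cell] += z
        counts[cell] += 1
    return {c: sums[c] / counts[c] for c in sums}


def leaflet_thickness(hydrophobic: Dict[Cell, float],
                      upper_tail_ends: Dict[Cell, float],
                      lower_tail_ends: Dict[Cell, float]) -> Dict[Cell, float]:
    """|leaflet hydrophobic surface - bilayer midplane| per bin, where the
    midplane is the average of the upper and lower tail-end surfaces.
    Bins missing from any of the three surfaces are skipped."""
    out: Dict[Cell, float] = {}
    for cell, h in hydrophobic.items():
        if cell in upper_tail_ends and cell in lower_tail_ends:
            mid = 0.5 * (upper_tail_ends[cell] + lower_tail_ends[cell])
            out[cell] = abs(h - mid)
    return out


# Toy upper leaflet: first tail carbons at z = 15, tail ends at z = +1 / -1
upper_c1 = grid_surface([(1, 1, 15), (2, 2, 15)], bin_size=10.0)
upper_end = grid_surface([(1, 1, 1), (2, 2, 1)], bin_size=10.0)
lower_end = grid_surface([(1, 1, -1), (2, 2, -1)], bin_size=10.0)
print(leaflet_thickness(upper_c1, upper_end, lower_end))
```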

      5) P. 18-19, Figure 8: The comparison of the MD and continuum membrane deformations is very informative, but the authors should discuss the implications of the increased symmetry further in terms of the estimated deformation energies. (I do not believe the authors really mean that they predicted the energies, they estimated/approximated them.)

      The Reviewer is correct, we are not predicting the energies of the actual MD generated bilayers, but rather we are estimating the energies of these shapes using a continuum-based approximation. The good agreement between the MD generated surfaces and the continuum predicted surfaces suggested that the model is capturing the underlying physics. We argued that the increased symmetry of the continuum surfaces compared to the MD surfaces was due to incomplete sampling in the MD. We were right about that. Please see revised Figure 10 with new data and some longer simulations, where the symmetry in the MD is now apparent and the match between continuum and MD is even better. Frankly, we are very pleased with these new results.

      See Page 18 and Figure 10. We have changed language throughout moving away from “predicting” to “estimating”. The new MD generated data shows much greater symmetry reflected in the starting structures, and better agreement with model predictions.

      References

      Argudo, D., Bethel, N. P., Marcoline, F. V., Wolgemuth, C. W., & Grabe, M. (2017). New Continuum Approaches for Determining Protein-Induced Membrane Deformations. Biophys J, 112(10), 2159-2172. https://doi.org/10.1016/j.bpj.2017.03.040

      Bethel, N. P., & Grabe, M. (2016). Atomistic insight into lipid translocation by a TMEM16 scramblase. Proc Natl Acad Sci U S A, 113(49), 14049-14054. https://doi.org/10.1073/pnas.1607574113

      Drabik, D., Chodaczek, G., Kraszewski, S., & Langner, M. (2020). Mechanical Properties Determination of DMPC, DPPC, DSPC, and HSPC Solid-Ordered Bilayers. Langmuir, 36(14), 3826-3835. https://doi.org/10.1021/acs.langmuir.0c00475

      Ferreira, T. M., Coreta-Gomes, F., Ollila, O. H., Moreno, M. J., Vaz, W. L., & Topgaard, D. (2013). Cholesterol and POPC segmental order parameters in lipid membranes: solid state 1H-13C NMR and MD simulation studies. Phys Chem Chem Phys, 15(6), 1976-1989. https://doi.org/10.1039/c2cp42738a

      Gerl, M. J., Sampaio, J. L., Urban, S., Kalvodova, L., Verbavatz, J. M., Binnington, B., Lindemann, D., Lingwood, C. A., Shevchenko, A., Schroeder, C., & Simons, K. (2012). Quantitative analysis of the lipidomes of the influenza virus envelope and MDCK cell apical membrane. J Cell Biol, 196(2), 213-221. https://doi.org/10.1083/jcb.201108175

      Henriksen, J., Rowat, A. C., Brief, E., Hsueh, Y. W., Thewalt, J. L., Zuckermann, M. J., & Ipsen, J. H. (2006). Universal behavior of membranes with sterols. Biophys J, 90(5), 1639-1649. https://doi.org/10.1529/biophysj.105.067652

      Hossein, A., & Sodt, A. J. (2023). MembraneAnalysis.jl: A Julia package for analyzing molecular dynamics simulations of lipid membranes. Journal of Open Source Software, 8(87), 5380.

      Hu, M., Briguglio, J. J., & Deserno, M. (2012). Determining the Gaussian curvature modulus of lipid membranes in simulations. Biophys J, 102(6), 1403-1410. https://doi.org/10.1016/j.bpj.2012.02.013

      Ivanova, P. T., Myers, D. S., Milne, S. B., McClaren, J. L., Thomas, P. G., & Brown, H. A. (2015). Lipid composition of viral envelope of three strains of influenza virus - not all viruses are created equal. ACS Infect Dis, 1(9), 399-452. https://doi.org/10.1021/acsinfecdis.5b00040

      Kim, S. S., Upshur, M. A., Saotome, K., Sahu, I. D., McCarrick, R. M., Feix, J. B., Lorigan, G. A., & Howard, K. P. (2015). Cholesterol-Dependent Conformational Exchange of the C- Terminal Domain of the Influenza A M2 Protein. Biochemistry, 54(49), 7157-7167. https://doi.org/10.1021/acs.biochem.5b01065

      Kučerka, N., Tristram-Nagle, S., & Nagle, J. F. (2006). Structure of fully hydrated fluid phase lipid bilayers with monounsaturated chains. J Membr Biol, 208(3), 193-202.

      Latorraca, N. R., Callenberg, K. M., Boyle, J. P., & Grabe, M. (2014). Continuum approaches to understanding ion and peptide interactions with the membrane. J Membr Biol, 247(5), 395-408. https://doi.org/10.1007/s00232-014-9646-z

      Liu, J., Kaksonen, M., Drubin, D. G., & Oster, G. (2006). Endocytic vesicle scission by lipid phase boundary forces. Proc Natl Acad Sci U S A, 103(27), 10277-10282. https://doi.org/10.1073/pnas.0601045103

      Pan, J., Tristram-Nagle, S., & Nagle, J. F. (2009). Effect of cholesterol on structural and mechanical properties of membranes depends on lipid chain saturation. Phys Rev E Stat Nonlin Soft Matter Phys, 80(2 Pt 1), 021931. https://doi.org/10.1103/PhysRevE.80.021931

      Rawicz, W., Olbrich, K. C., McIntosh, T., Needham, D., & Evans, E. (2000). Effect of chain length and unsaturation on elasticity of lipid bilayers. Biophys J, 79(1), 328-339. https://doi.org/10.1016/S0006-3495(00)76295-3

      Sun, D., Peyear, T. A., Bennett, W. F. D., Andersen, O. S., Lightstone, F. C., & Ingolfsson, H. I. (2019). Molecular Mechanism for Gramicidin Dimerization and Dissociation in Bilayers of Different Thickness. Biophys J, 117(10), 1831-1844. https://doi.org/10.1016/j.bpj.2019.09.044

      Tzlil, S., Deserno, M., Gelbart, W. M., & Ben-Shaul, A. (2004). A statistical-thermodynamic model of viral budding. Biophys J, 86(4), 2037-2048. https://doi.org/10.1016/S0006-3495(04)74265-4

      Ursell, T. S., Klug, W. S., & Phillips, R. (2009). Morphology and interaction between lipid domains. Proc Natl Acad Sci U S A, 106(32), 13301-13306. https://doi.org/10.1073/pnas.0903825106

      Veatch, S. L., & Keller, S. L. (2003). Separation of liquid phases in giant vesicles of ternary mixtures of phospholipids and cholesterol. Biophys J, 85(5), 3074-3083. https://doi.org/10.1016/S0006-3495(03)74726-2

      Venable, R. M., Brown, F. L. H., & Pastor, R. W. (2015). Mechanical properties of lipid bilayers from molecular dynamics simulation. Chem Phys Lipids, 192, 60-74. https://doi.org/10.1016/j.chemphyslip.2015.07.014

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The author has addressed all the concerns I have raised.

      I have only one minor suggestion. 

      We would argue both a gray screen and a grating are visual stimuli. ... We concur, our data only address one of many possible transitions, but it is a switch between distinct visual stimuli that is sped up by ACh. 

      Thank you for clarifying this. 

      Following my comment in the previous review, the authors have revised the abstract as follows: (Before) "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, enabling faster switching between internal representations during locomotion." 

      (After) "Based on this we speculate that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, possibly enabling faster switching between internal representations during locomotion." 

      My previous comment concerned specifically the latter part, "enabling faster switching between internal representations during locomotion", and, in fact, their data fully support the first part, "acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network". Thus, I suggest the following sentence: 

      "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, possibly enabling faster switching between internal representations during locomotion." 

      Thank you for clarifying. We have changed as suggested.

      Reviewer #2 (Recommendations For The Authors): 

      I thank the authors for the clarification regarding the distribution of running speeds in the study. I do agree that 30 cm/s is indeed fast for head-fixed locomotion. My concern is that while all mice contribute to the low locomotion velocity bin, the high locomotion velocity bin is dominated by a subset of animals, since not all mice reached high locomotion speeds. Therefore, the comparison between low, intermediate and high locomotion velocities includes data from different cohorts of animals and variability across animals may confound the analysis of cholinergic axon activity. However, the manuscript is carefully worded to emphasize lack of evidence (e.g. "we found no evidence of an increase in calcium activity between low and high locomotion velocities") and I have revised my summary in the public review to reflect this. 

      I thank the authors for including the scatterplots of single-neuron responses to locomotion and optogenetic stimulation, which illustrate their heterogeneity. I am surprised that the axes are limited to 20% deltaF/F, as visual responses recorded using GCaMP6f often exceed 100% deltaF/F. 

      There are definitely neurons with responses larger than 20% dF/F0, but they are a small fraction. There are two considerations relevant to assessing dF/F amplitudes. First, in our hands, trial-averaged dF/F0 responses tend to be below 30% even for the most responsive neurons (trial averaging convolves response amplitude and response reliability). The reviewer is probably thinking of single-trial responses, often shown as raw data, that can reach several hundred percent. Second, different published variants for calculating dF/F0 can result in a spectrum of values that varies by up to a factor of 10. This is largely a consequence of the choice of F0 and of preprocessing related to correcting slow drifts in signal strength (originally motivated by photobleaching). Attempting to compare dF/F0 across labs is unfortunately futile in the absence of a standardized way of calculating it. 
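      As a small illustration of the point about F0, the sketch below computes the peak dF/F0 of one hypothetical fluorescence trace using three different baseline definitions (median, mean, and minimum of the trace). The trace values and the three choices are assumptions made for illustration; published pipelines typically use other variants as well (e.g. running-percentile baselines), and slow-drift correction is not modeled here.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def dff(trace: Sequence[float], f0: float) -> List[float]:
    """dF/F0 for every sample of a fluorescence trace, given a baseline F0."""
    return [(f - f0) / f0 for f in trace]


# Three different (illustrative, not exhaustive) choices of baseline F0
F0_CHOICES: Dict[str, Callable[[Sequence[float]], float]] = {
    "median": statistics.median,
    "mean": statistics.mean,
    "minimum": min,
}


def peak_dff_by_baseline(trace: Sequence[float]) -> Dict[str, float]:
    """Peak dF/F0 of the same trace under each baseline definition."""
    return {name: max(dff(trace, f0_fn(trace)))
            for name, f0_fn in F0_CHOICES.items()}


# Hypothetical trace (arbitrary units): noisy baseline near 100, then a response
trace = [95, 105, 100, 98, 102, 100, 97, 103, 150, 150]
for name, peak in peak_dff_by_baseline(trace).items():
    print(f"F0 = {name:>7}: peak dF/F0 = {peak:.2f}")
```

      Even on this toy trace, the reported peak changes noticeably with the baseline definition alone, which is why absolute dF/F0 values are hard to compare across studies.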

      Allow me to clarify how evaluating the effects of optogenetic stimulation and locomotion without analyzing them at the level of individual neurons could result in misleading conclusions. I will use the effects of cholinergic responses on grating responses as an example but this concern applies equally to the other analyses. The manuscript reports that "in layer 2/3, optogenetic activation of cholinergic axons did not result in a detectable increase in grating onset responses (Figure 4C), while the responses of layer 5 neurons to the same stimulus increased with concurrent optogenetic activation of cholinergic axons." As the Figure R2C-D illustrates, only a minority of L2/3 neurons are excited by the grating in baseline conditions, while the vast majority are either suppressed or non-responsive. This is expected, as it is well established that visual responses in layer 2/3 are sparse. If responses of the small subset of L2/3 neurons that are activated by the grating were enhanced, it may not be apparent in the population average presented in the manuscript. In contrast, since a larger fraction of L5 neurons is excited by the grating, enhancement of grating responses may be easier to detect. In other words, the effects of optogenetic stimulation may be to boost the responses of those neurons that are activated by the grating and the difference between L2/3 and L5 lies simply in the proportion of activated neurons. I do not mean to argue in favour of this specific scenario but simply present it so as to illustrate the way in which considering population averages alone may be misleading. 

      While the authors state in their response that "all relevant and clear conclusions are already captured by the mean differences shown in Figure 4", the evidence supporting this statement is not presented in the manuscript. Most importantly, it is essential to determine whether the neurons that show significant activation in response to gratings (Figure 4C-D), mismatch (Figure 4E-F) or locomotion (Figure 4G-H), are affected by optogenetic stimulation in the same way as the population average. 

      We have added the analysis suggested as Figure S6. Consistent with the population averages, even within the subset of layer 2/3 neurons most responsive to specific inputs, we found no detectable increase in responsiveness upon optogenetic stimulation of cholinergic axons.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Overall, the manuscript is very well written, the approaches used are clever, and the data were thoroughly analyzed. The study conveyed important information for understanding the circuit mechanism that shapes grid cell activity. It is important not only for the field of MEC and grid cells, but also for broader fields of continuous attractor networks and neural circuits.

      We appreciate the positive comments.

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. However, it is unclear what criteria/thresholds were used to determine the level of activity asynchronization, and under these criteria, what percentage of cells actually showed synchronized or less asynchronized activity. A notable percentage of synchronized or less asynchronized SCs could complicate the results, i.e., PV+ INs with correlated activity could receive inputs from different SCs (different inputs), which had synchronized activity. More detailed information/statistics about the asynchronization of SC activity is necessary for interpreting the results.

      The short answer here is that spiking responses from the pairs of SCs that we sampled appear asynchronous. We now show this in the form of cross-correlograms for all recorded pairs of SCs (Figure 2, Figure Supplement 1). The correlograms lack peaks that would indicate synchronous activation. Thus, while our dataset is not large enough to rule out occasional direct synchronisation of SCs, this appears unlikely to account for synchronised input to PV+INs.

      This conclusion is consistent with consideration of mechanisms that could in principle synchronise SCs:

      First, if responses to ramping light inputs were fully deterministic, then this could lead to fixed relative timing of spikes fired by different SCs. This is unlikely given the influence of stochastic channel gating on SC spiking (Dudman and Nolan 2009) and is inconsistent with trial-to-trial variability in spike timing (Figure 2, Figure Supplement 2).

      Second, as SCs are glutamatergic they could excite one another. However, excitatory connections between stellate cells are rare (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016) and when detected they have low amplitude (mean < 0.25 mV; Winterer et al. 2017). Our finding that spiking by pairs of SCs is not correlated is consistent with this.

      Third, strong interaction between stellate cells mediated by local inhibitory pathways (Pastoll et al. 2013; Couey et al. 2013) could coordinate their activity. The lack of correlation between spiking of pairs of SCs suggests that such coordination is rarely recruited by our ramping protocols. Nevertheless, recruitment of inhibition may happen to some extent as experiments in Figure 4 show that correlated input from SCs to more distant, but not nearby PV+INs, is reduced by blocking inhibitory synapses. Given that we don't find evidence for synchronised spiking of SCs, this additional common input to widely separated PV+INs is instead best explained by recruitment of interneurons that act directly on the target SCs. We have modified Figure 8 to make this clear.

      Thus, for experiments with ramping light stimuli, synchronous activation of SCs is unlikely to explain common input to PV+INs. Input from the same SC best explains correlated responses of nearby PV+IN inhibitory populations, while recruitment of an additional inhibitory pathway may contribute to correlated responses of more distant PV+INs.

      For experiments using focal stimulation, substantial trial-to-trial variation in SC spike timing argues strongly against deterministic coordination. Indirect coordination of presynaptic neurons is also extremely unlikely given that focal activation is sparse and brief, while inputs from many presynaptic SCs are required to drive a postsynaptic interneuron to spike (e.g. Pastoll et al. 2013; Couey et al. 2013). Results from these experiments thus corroborate results from experiments using ramping light stimulation.

      In revising the manuscript we have tried to ensure these arguments are clear (e.g. p 5, para 3; p 6, para 2; p 10, para 1).
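      For readers less familiar with the cross-correlograms referred to above, the sketch below shows the basic computation: a histogram of spike-time differences between two neurons within a window of lags. A central peak would indicate synchronous firing; a flat correlogram is consistent with asynchronous spiking. This is a minimal illustration, not the authors' analysis: the function name, window, and bin width are hypothetical, times are taken in milliseconds, and a full analysis would normally also compare against a shuffle or shift predictor.

```python
from typing import List, Sequence


def cross_correlogram(spikes_a: Sequence[float], spikes_b: Sequence[float],
                      max_lag: float, bin_width: float) -> List[int]:
    """Counts of spike-time differences (t_b - t_a) in [-max_lag, max_lag),
    binned with width `bin_width`. Bin 0 starts at -max_lag, so zero lag
    falls at the start of the central bin."""
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for ta in spikes_a:
        for tb in spikes_b:
            lag = tb - ta
            if -max_lag <= lag < max_lag:
                idx = int((lag + max_lag) // bin_width)
                counts[min(idx, n_bins - 1)] += 1
    return counts


# Toy example (ms): neuron B fires 2 ms after each spike of neuron A,
# so all counts pile up in one bin just after zero lag.
a = [100, 500, 900]
b = [102, 502, 902]
print(cross_correlogram(a, b, max_lag=10, bin_width=5))
```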

      (2) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. However, evidence supporting this "direct interaction" between these two cell types is missing. Is it possible that pyramidal cells are also involved in this interaction? Some evidence or discussion is necessary to further support the "direct interaction".

      Indirect connections between stellate cells mediated via fast spiking inhibitory interneurons are well established by previous studies (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), and so were not addressed here. Previous work also establishes that connections from stellate cells to pyramidal cells are extremely rare (Winterer et al. 2017). Because the Sim1:Cre mouse line is specific to stellate cells and does not drive transgene expression in pyramidal cells (Sürmeli et al. 2015), it is therefore unlikely that pyramidal cells play a role.

      To make these points clearer we have modified the text in the discussion (p 5, para 3; p 10, paras 1 & 2). We have also modified Figure 8 to highlight that the indirect interaction may be best accounted for by inhibitory pathways onto PV+INs rather than via SCs (which our new cross-correlation analyses indicate is unlikely).

      Reviewer #2 (Public Review):

      In this study, Huang et al. employed optogenetic stimulation alongside paired whole-cell recordings in genetically defined neuron populations of the medial entorhinal cortex to examine the spatial distribution of synaptic inputs and the functional-anatomical structure of the MEC. They specifically studied the spatial distribution of synaptic inputs from parvalbumin-expressing interneurons to pairs of excitatory stellate cells. Additionally, they explored the spatial distribution of synaptic inputs to pairs of PV INs. Their results indicate that both pairs of SCs and pairs of PV INs generally receive common input when their somata are within 200-300 µm of each other. The research is intriguing, with controlled and systematic methodologies. There are interesting takeaways based on the implications of this work for grid cell network organization in MEC.

      We appreciate the positive comments.

      (1) Results indicate that in brain slices, nearby cells typically share a higher degree of common input. However, some proximate cells lack this shared input. The authors interpret these findings as: "Many cells in close proximity don't seem to share common input, as illustrated in Figures 3, 5, and 7. This implies that these cells might belong to separate networks or exist in distinct regions of the connectivity space within the same network.". Every slice orientation could have potentially shared inputs from an orthogonal direction that are unavoidably eliminated. For instance, in a horizontal section, shared inputs to two SCs might be situated either dorsally or ventrally from the horizontal cut, and thus removed during slicing. Given the synaptic connection distributions observed within each intact orientation, and considering these distributions appear symmetrically in both horizontal and sagittal sections, the authors should be equipped to estimate the potential number of inputs absent due to sectioning in the orthogonal direction. How might this estimate influence the findings, especially those indicating that many close neurons don't have shared inputs?

      Given we find high probabilities of correlated inputs to nearby cells in both planes, our conclusion that nearby cells are likely to receive common inputs appears to be independent of the slice plane. For cells further apart, where the degree of correlated input becomes more variable, it is possible that cell pairs that have low input correlations measured in one slice plane would have high input correlations if measured in a different plane. An argument against this is that as the cell pairs are further apart, it is less likely that an orthogonal axon would intersect dendritic trees of both cells. Nevertheless, we can't rule this out given the data here. We have amended the discussion to highlight this possibility (p 10, para 1). We agree it would be interesting to address this point further with quantitative analyses but this will be difficult without detailed reconstructions of the circuit.

      (2) The study examines correlations during various light-intensity phases of the ramp stimuli. One wonders if the spatial distribution of shared (or correlated) versus independent inputs differs when juxtaposing the initial light stimulation phase, which begins to trigger spiking, against subsequent phases. This differentiation might be particularly pertinent to the PV to SC measurements. Here, the initial phase of stimulation, as depicted in Figure 7, reveals a relatively sparse temporal frequency of IPSCs. This might not represent the physiological conditions under which high-firing INs function. While the authors seem to have addressed parts of this concern in their focal stim experiments by examining correlations during both high and low light intensities, they could potentially extract this metric from data acquired in their ramp conditions. This would be especially valuable for PV to SC measurements, given the absence of corresponding focal stimulation experiments.

      We understand the question to be whether differences in correlation scores between the initial and later phases of responses to ramping light inputs can be used to infer spatial organisation. These differences are likely to reflect heterogeneity in the spiking of the input neurons, for example through differences in spike threshold, spike frequency adaptation and saturation of spiking (e.g. Figure 2, Figure Supplement 1A; see also Pastoll et al. 2020). We don't expect these differences to have any spatial organisation along the mediolateral axis, and while spike threshold follows a dorsoventral organisation, there is nevertheless substantial local variation between neurons (Pastoll et al. 2020). It's therefore unlikely that we can use differences between early and late correlations to make the inferences proposed by the reviewer.

      With respect to PV to SC measurements, similar heterogeneity is likely. We note that we were unable to carry out focal stimulation experiments for PV to SC connections as PV neurons did not spike in response to focal optogenetic stimulation.

      With respect to physiological conditions, our aim here is simply to assess connectivity in well controlled conditions, e.g. voltage-clamp, minimal spontaneous activity, known neuronal locations, etc. It's not clear that physiological activation patterns would improve on these tests and quite likely data would be noisier and harder to interpret.

      (3) Re results from Figure 2: Please fully describe the model in the methods section. Generally, I like using a modeling approach to explore the impact of convergent synaptic input to PVs from SCs, as it could effectively validate the experimental approach and enhance the interpretability of the experimental stim/recording outcomes. However, as currently detailed in the manuscript, the model description is inadequate for assessing the robustness of the simulation outcomes. If the IN model is simply integrate-and-fire with minimal biophysical attributes, then the results shown in Fig 2F might be trivial. Conversely, if the model offers a more biophysically accurate representation (e.g., with conductance-based synaptic inputs, synapses appropriately dispersed across the model IN dendritic tree, and standard PV IN voltage-gated membrane conductances), then the model's results could serve as a meaningful method to both validate and interpret the experiments.

      We appreciate the simulation descriptions were insufficient and have modified the manuscript to include additional details and clarification (p 14, paras 1-3).

      We're not sure we follow the logic here with respect to model types. The experiments were carried out in the voltage-clamp recording configuration with the goal of identifying correlated inputs independently of how they are integrated by the postsynaptic neuron. Given that the membrane potential doesn't change (so the C dVm/dt term of the membrane equation is zero), integrate-and-fire and point conductance-based models both reduce to a sum of input currents. We implement this by convolving spike times with experimentally measured synaptic current waveforms. An assumption of our approach is that we achieve a reasonable space clamp. We believe this is justified given that stellate cells and PV interneurons are reasonably electrotonically compact, and that our analysis relies on consistent correlations rather than absolute amplitudes or time constants of the postsynaptic response and so should tolerate moderate space clamp errors.
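      The voltage-clamp argument above can be illustrated with a minimal sketch. It assumes a difference-of-exponentials EPSC waveform, Bernoulli (approximately Poisson) presynaptic spiking and illustrative parameter values (time constants, amplitude, 20 Hz firing, 8 inputs per cell); these are hypothetical stand-ins, not the measured waveforms or the authors' simulation code. Because clamped currents sum linearly, a pair sharing 4 of its 8 inputs should show a zero-lag current correlation close to 4/8, whereas a pair with disjoint inputs should show a correlation close to zero.

```python
import numpy as np


def epsc_kernel(dt_ms, tau_rise_ms=0.5, tau_decay_ms=3.0, amp_pa=-20.0, length_ms=30.0):
    """Difference-of-exponentials EPSC waveform scaled so its peak equals amp_pa.
    Inward (excitatory) current is negative, as in voltage-clamp recordings."""
    t = np.arange(0.0, length_ms, dt_ms)
    shape = np.exp(-t / tau_decay_ms) - np.exp(-t / tau_rise_ms)
    return amp_pa * shape / shape.max()


def clamp_current(spike_trains, kernel):
    """Clamped current of one cell. With Vm held constant the inputs sum linearly,
    so convolving the summed spike counts with the kernel equals summing the
    individually convolved inputs. spike_trains has shape (n_inputs, n_bins)."""
    summed = spike_trains.sum(axis=0)
    return np.convolve(summed, kernel)[: spike_trains.shape[1]]


rng = np.random.default_rng(0)
dt_ms = 0.1
n_bins = 50_000                      # 5 s of simulated recording
p_spike = 20.0 * dt_ms / 1000.0      # 20 Hz per presynaptic SC (hypothetical)
kernel = epsc_kernel(dt_ms)

# 16 independent presynaptic SCs, one row per neuron.
pool = (rng.random((16, n_bins)) < p_spike).astype(float)

# Common input: each cell receives 8 SCs, 4 of which (rows 4-7) are shared.
shared_a = clamp_current(pool[0:8], kernel)
shared_b = clamp_current(pool[4:12], kernel)
# No common input: disjoint sets of 8 SCs.
indep_a = clamp_current(pool[0:8], kernel)
indep_b = clamp_current(pool[8:16], kernel)

r_shared = np.corrcoef(shared_a, shared_b)[0, 1]   # expected to be close to 0.5
r_indep = np.corrcoef(indep_a, indep_b)[0, 1]      # expected to be close to 0
```

      Changing the kernel time constants alters the width of the correlation but not the expected zero-lag value, which depends only on the fraction of shared inputs when all inputs have the same waveform and rate.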

      Reviewer #3 (Public Review):

      This paper presents convincing data from technically demanding dual whole-cell patch recordings of stellate cells in medial entorhinal cortex slice preparations during optogenetic stimulation of PV+ interneurons. The authors show that the patterns of postsynaptic activation are consistent with dual recorded cells close to each other receiving shared inhibitory input and sending excitatory connections back to the same PV neurons, supporting a circuitry in which clusters of stellate cells and PV+IN interact with each other with much weaker interactions between clusters. These data are important to our understanding of the dynamics of functional cell responses in the entorhinal cortex. The experiments and analysis are quite complex and would benefit from some revisions to enhance clarity.

      These are technically demanding experiments, but the authors show quite convincing differences in the correlated response of cell pairs that are close to each other in contrast to an absence of correlation in other cell pairs at a range of relative distances. This supports their main point of demonstrating anatomical clusters of cells receiving shared inhibitory input.

      We appreciate the positive comments.

      The overall technique is complex and the presentation could be more clear about the techniques and analysis. In addition, due to this being a slice preparation they cannot directly relate the inhibitory interactions to the functional properties of grid cells which was possible in the 2-photon in vivo imaging experiment by Heys and Dombeck, 2014.

      We have modified the manuscript to try to improve the presentation (specific changes are detailed below). We agree that an important future challenge is to relate our findings to in vivo observations (p 11, para 2).

      Reviewer #1 (Recommendations For The Authors):

      Major points

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. In Figure 2 and its supplementary figures, the authors also showed examples of asynchronous activity. However, it is unclear to me what criteria/thresholds were used to determine the degree of asynchrony, and, under these criteria, what percentage of cells actually showed synchronized or only weakly asynchronous activity. A notable percentage of synchronized or weakly asynchronous SCs could complicate the results, i.e., PV+ INs with correlated activity could instead receive inputs from different SCs whose activity was synchronized. Related to this concern, it would also be important to simulate what level of asynchrony in SCs could still lead to correlated PV+ IN activity above shuffle, and, among the recorded SCs, what percentage of cells belong to this synchronized/weakly asynchronous category.

      We address this point in our response to the public review. In brief, we have added additional cross-correlograms showing that ramp activation of SC pairs does not cause detectable synchronous activation. We also clarify that the sensitivity to GABA blockers of correlations between some widely separated pairs suggests that SCs activate common inhibitory inputs to those cell pairs.
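      One generic way to check for synchronous SC firing is a spike-time cross-correlogram. The sketch below is illustrative only: the 1 ms bins, the ±2 ms central window, the firing rates and the synthetic spike trains are all assumptions, and the synchrony index (mean central-bin count divided by mean flanking-bin count) is a simple hypothetical measure rather than the analysis in the revised manuscript. Independent trains should give an index near 1; trains sharing a fraction of near-coincident spikes should give a clearly larger value.

```python
import numpy as np


def spike_ccg(spikes_a, spikes_b, bin_ms=1, max_lag_ms=50):
    """Cross-correlogram: histogram of all pairwise spike-time differences (b - a) in ms."""
    a = np.asarray(spikes_a, dtype=float)
    b = np.asarray(spikes_b, dtype=float)
    diffs = (b[None, :] - a[:, None]).ravel()
    diffs = diffs[np.abs(diffs) <= max_lag_ms]
    edges = np.arange(-max_lag_ms, max_lag_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(diffs, bins=edges)
    return edges[:-1] + bin_ms / 2, counts


def synchrony_index(spikes_a, spikes_b, window_ms=2, bin_ms=1, max_lag_ms=50):
    """Mean count in bins within +/- window_ms of zero lag divided by the mean count
    in the remaining (flanking) bins. Values near 1 indicate no excess synchrony."""
    centers, counts = spike_ccg(spikes_a, spikes_b, bin_ms, max_lag_ms)
    central = np.abs(centers) <= window_ms
    return counts[central].mean() / counts[~central].mean()


def poisson_train(rng, rate_hz, duration_ms):
    """Homogeneous Poisson spike times (ms), sorted."""
    n = rng.poisson(rate_hz * duration_ms / 1000.0)
    return np.sort(rng.uniform(0.0, duration_ms, n))


rng = np.random.default_rng(1)
duration_ms, rate_hz = 60_000.0, 20.0

# Independent pair: no shared spike timing.
a = poisson_train(rng, rate_hz, duration_ms)
b_indep = poisson_train(rng, rate_hz, duration_ms)

# Synchronised pair: b also fires a jittered copy (sd 0.5 ms) of ~30% of a's spikes.
mask = rng.random(a.size) < 0.3
copied = a[mask] + rng.normal(0.0, 0.5, mask.sum())
b_sync = np.sort(np.concatenate([poisson_train(rng, rate_hz, duration_ms), copied]))

si_indep = synchrony_index(a, b_indep)
si_sync = synchrony_index(a, b_sync)
```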

      (2) The above concern is more relevant to the focal stimulation experiments, in which the authors tried to claim that a pair of PV+ INs with correlated activity could receive inputs from the same SCs neurons. The authors also showed that the stimulation patterns leading to the activation of PV+ INs were more similar if PV+ INs had correlated activity (Figure 5D). However, if nearby SCs were more synchronized than distal SCs within this stimulation scale, even though a pair of PV+ INs showed correlated activity, they could still receive inputs from different but nearby SCs. In this case, it would be helpful to quantify the relationship between the level of activity synchronization of SCs and their distances. In Figure 5, Figure Supplement 1, the data were only provided for 8 cells. If feasible, collecting data from more cells would be needed for the proposed analysis.

      We explain in our responses to point 1 above and in the public review that direct synchronisation of SCs is unlikely. This is particularly unlikely for focal stimulation experiments, as the timing of responses of individual SCs is extremely variable between trials. Thus, even if there were strong synaptic connections between SCs, which the evidence suggests there are not (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), they would be unlikely to result in reliably timed coordinated firing.

      (3) It is unclear what the definition of "common inputs" is. Do they refer to inputs from the same group of cells? If different groups of cells provide synchronized inputs, will the inputs be considered "common inputs" or "different inputs"?

      We used "common" to be consistent with classic work by Yoshimura et al. and for succinctness. Thus, by common input we mean that a proportion of a cell pair's input comes from the same presynaptic neuron(s), as opposed to cell pairs whose inputs come from different neurons and which therefore have no common input. We have attempted to make sure this is clear in the revised manuscript (e.g. the description of simulations on p 4, para 2).

      (4) In the introduction and abstract, it was mentioned that "dense, but specific, direct excitatory-inhibitory synaptic interactions may operate at the scale of grid cell clusters". It is unclear to me how "dense" was demonstrated in the data. Can the authors clarify?

      Thanks for flagging this, we were insufficiently clear. We have revised the text to refer to cell pairs for which a proportion of their input is from the same presynaptic neurons (e.g. p 3, para 1), and separately about indirect coordination, by which we mean inputs to cell pairs that appear correlated because of coordination between upstream neurons.

      (5) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. Is there any evidence supporting this "direct interaction"?

      Direct interactions from SCs to PV+INs and from PV+INs to SCs were previously demonstrated by recordings from pairs of neurons (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016; Winterer et al. 2017). Our results in Figures 3-5, which show that exciting SCs by light activation of ChR2 leads to excitation of PV+INs, and in Figure 7, which show that light activation of PV+INs expressing ChR2 leads to inhibition of SCs, are consistent with these previous conclusions. We have modified the manuscript to make sure this is clear (p 2, para 3).

      Is it possible that pyramidal cells are also involved in this interaction? If this is unlikely, the author may provide some pieces of evidence (e.g., timing of responses after optogenetic stimulation) or some discussions.

      This is unlikely given that previous studies indicate that connections from stellate to pyramidal cells are weak or absent (Winterer et al. 2017). We now clarify this in the Discussion (p 10, para 1).

      Minor points

      (1) Page 4, last paragraph: the authors claim that CCpeakmean was reduced and CClagvar increased with cell separation. Although the trends are visible in the figures, the authors could provide appropriate statistics to support this statement, such as the correlation between cell separation and CCpeakmean or CClagvar.

      We have inserted summaries of linear model fits into the legends for Figure 3E-F, Figure 5F-H and Figure 7D.
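      As an illustration of this kind of summary, a minimal sketch of a linear fit of a correlation measure against cell separation is given below, with a permutation test on the slope. The data are synthetic, the function name is hypothetical, and the permutation test is one reasonable choice of inference rather than necessarily the one reported in the figure legends.

```python
import numpy as np


def slope_permutation_test(separation_um, cc_peak, n_perm=2000, seed=0):
    """Least-squares fit cc_peak = slope * separation + intercept, with a two-sided
    permutation p-value for the slope (cc_peak shuffled relative to separation)."""
    x = np.asarray(separation_um, dtype=float)
    y = np.asarray(cc_peak, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    rng = np.random.default_rng(seed)
    null = np.array([np.polyfit(x, rng.permutation(y), 1)[0] for _ in range(n_perm)])
    p_value = (np.sum(np.abs(null) >= abs(slope)) + 1) / (n_perm + 1)
    return slope, intercept, p_value


# Synthetic example: 71 hypothetical cell pairs whose peak correlation declines with distance.
rng = np.random.default_rng(3)
separation = rng.uniform(0.0, 300.0, size=71)          # micrometres
cc_peak = 0.4 - 0.001 * separation + rng.normal(0.0, 0.05, size=71)
slope, intercept, p = slope_permutation_test(separation, cc_peak)
```

      The permutation approach avoids assuming normally distributed residuals, which may not hold for bounded correlation measures.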

      (2)  If I understood correctly, in the second last paragraph on page 6, "pairs of SCs" should be changed to "pairs of PV+ INs".

      Thanks. Corrected.

      (3)  Page 9: the 7th line to the end: where is Figure S4?

      Corrected to 'Figure 3, Figure Supplement 2'.

      (4)  Page 27: at the end of figure caption B: two ".

      Corrected.

      (5)  Figures 3A and B: what are the red vertical rectangles?

      These are the regions shown on an expanded time base in C and D. This is now clarified in the legend.

      (6)  Page 28 Figure caption of D and E: (C) and (D) should be (D) and (E).

      Corrected.

      (7)  The first sentence of the third paragraph in INTRODUCTION: 'later' should be 'layer'.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      - Some related work has been done by Beed et al. 2013 to map the spatial distribution of inputs to neurons in MEC. Certainly, there are differences in the approaches and the key questions, but the contribution of this study would benefit from a more detailed comparison of the results from Beed vs the current study and should be included in the discussion.

      It's hard to include a detailed comparison of results, at least without losing focus, as the two studies address different questions with different approaches. We already noted that 'Local optical activation of unidentified neurons has also been used to infer connectivity principles but with a focus on responses of single postsynaptic neurons (Beed et al., 2013, 2010)'. In addition, we now note that 'Our focal optogenetic stimulation approach also offers insight into the spatial organization of presynaptic neuronal populations, with the advantage, compared to focal glutamate uncaging previously used to investigate connectivity in the MEC (Beed et al., 2013, 2010), that the identity of the presynaptic cell population is genetically defined'.

      - There are a few places where the language is ambiguous or needs a more detailed description for clarity:

      3rd paragraph under "Focal activation of SCs generates common input to nearby PV+INs". The correlation probability description in this paragraph and a similar sentence in the methods are very hard to understand. I had to look up the analysis in Yoshimura et al. 2005 to understand what was done here. It's a nice analysis, but the manuscript could benefit from a more detailed description of this measure in the methods.

      We agree, it is a somewhat complex metric and is challenging to explain. In the interests of keeping the main text succinct, we have left the bare bones explanation as it was in the Results, but have expanded the explanation in the Methods. We hope this is now clear.

      - " Alternatively, if there is no clear spatial organization of SC to PV+INs connections, then the similarity between stimulus locations for pairs of SCs should have a random distribution." This sentence is hard to understand. I think the use of the phrase "similarity of stimulus location" is a strange phrasing and is driving the confusion in this sentence.

      We have replaced this with 'correspondence between active stimulus locations'.

      - In the discussion under "Spatial extent and functional organization of L2 circuits" there is a grammatical mistake (seems to be 2x phrasing of "leads to common synaptic input").

      Corrected.

      - Citation in the introduction/discussion. Introduction: in addition to Gu et al. 2018, Heys et al 2014 also showed there are non-random correlations among putative grid cells as a function of their somatic distance. In the discussion section, in addition to Gu et al. 2018, Heys et al. 2014 showed there is anatomical clustering of grid cells in MEC. This earlier work investigating functional correlations among neurons in the superficial aspect of MEC in vivo should be cited and is particularly relevant in these two sections of the manuscript.

      Thanks, we apologise for the oversight. We're well aware of this important study and have now cited it.

      - Typo: paragraph 3 of the intro; "later" should be "layer".

      Corrected.

      - Figure 5 (D-E): there is a typo; high correlation probability is D and low correlation is E (text says C/D).

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      The paper is missing the bibliography section. This makes the review somewhat difficult as some cited papers are not immediately familiar based on the citation.

      Thanks, and our apologies for making extra work by omitting this. It is now included.

      Page 2 - "cell clusters" - they should also cite the paper by Heys and Dombeck, 2014 that shows a spatial scale of inhibitory interactions computed based on correlations of grid cells recorded using 2-photon calcium imaging.

      Added (see above).

      Page 2 - "later 2 of the MEC" - layer.

      Corrected.

      Page 2 - "synaptic interactions" - again they should mention the work by Heys and Dombeck, 2014 that indirectly measured the spatial scale of inhibition.

      Now cited in this paragraph.

      Page 4 "we simulated responses" and Figure 2E - in each simulation - did they fit the magnitude and time constant of the simulated EPSCs to individual EPSCs in the data? Or did they randomly vary these to find the best fit?

      The parameters for the simulations are given in the Methods and were chosen to correspond to the experimental values. We have rewritten this section to make the simulation methods clearer. Simulations using different time constants within a physiological range support similar conclusions.

      Page 4 - "we identified 35/71" - Are these the cells that appear in yellow as correlated in Figures 3E-F? If so, the text should indicate that these cells are shown in yellow.

      We have added this and have also updated the legends for additional clarification.

      Figure 2, Figure Supplement 1 - B,C - the following phrase is not clear: "when the 4 / 8 of each neurons inputs from SCs also project to the other neuron (B)," Should the "the" be removed? Also, by 4/8 do they mean 50%, or do they mean 4 to 8?

      Thanks, we've reworded to improve the clarity.

      E - "receiving presynaptic inputs consisted of 4 overlapping SCs" - should it say "consisting"?

      Corrected.

      Figure 3, Figure Supplement 1 part E - "the same data as (C)" - should this be the same data as (D)? I do not see how clustering the shuffled data in (C) would give two groups, but it makes sense if it is from (D).

      That's right, now corrected.

      Page 5 - "used action potentials" - this is confusing. Is the word "used" supposed to be there?

      Corrected.

      Page 5 - "widefield activation experiments" - they should cite the experiments that they are referring to here.

      Added.

      Page 5 - "effect of blocking" - "Figure 4" - I find it very odd that the agent GABAzine in Figure 4 is not explicitly mentioned in the main text (though it is mentioned in the methods). The main text should indicate that blocking was performed using GABAzine.

      Added.

      Page and page 14 and Figure 5 - "shifted" - do they mean shuffled?

      We do. The classic papers by Yoshimura et al. used shifted so we keep this here so it's clear we've used their approach. We've added additional explanation to try to make sure the meaning is clear.
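      The logic of the shift correction can be sketched as follows, assuming it follows the standard shift predictor: trial i of one cell is paired with trial i-1 of the other, so stimulus-locked structure (here a ramp identical on every trial) survives, whereas trial-specific common input does not. The signals, trial counts and noise levels are synthetic and hypothetical; the difference between the raw and shifted correlations estimates the contribution of trial-specific shared input.

```python
import numpy as np


def mean_trial_corr(trials_a, trials_b):
    """Mean zero-lag Pearson correlation across paired trials (one trial per row)."""
    return float(np.mean([np.corrcoef(x, y)[0, 1] for x, y in zip(trials_a, trials_b)]))


def shift_predictor(trials_a, trials_b, shift=1):
    """Correlation after pairing trial i of cell A with trial i - shift of cell B
    (circularly), which keeps stimulus-locked structure but breaks the pairing of
    trial-specific common input."""
    return mean_trial_corr(trials_a, np.roll(trials_b, shift, axis=0))


rng = np.random.default_rng(2)
n_trials, n_samples = 20, 2000
ramp = np.linspace(0.0, 1.0, n_samples)              # identical on every trial
common = rng.normal(size=(n_trials, n_samples))      # shared input, new on each trial
cell_a = ramp + common + rng.normal(size=(n_trials, n_samples))
cell_b = ramp + common + rng.normal(size=(n_trials, n_samples))

raw = mean_trial_corr(cell_a, cell_b)          # about 0.5 with these settings
shifted = shift_predictor(cell_a, cell_b)      # near 0: only the ramp is shared
excess = raw - shifted                         # attributable to trial-specific common input
```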

      Figure 5 A, B, D, and E would benefit from a more detailed description. They should state whether the labels "1a" and "1b" and "2a" and "2b" refer to different recorded neurons in each pair, and whether 2a and 2b are a different pair. Do the x and y axes of the images correspond to anatomical position? Does "B" indicate the location of the recordings shown in Figure 5B? The authors probably think this is all obvious, but it is not immediately obvious to the reader.

      We have added additional clarification.

      Page 8 - "Beed et al." - These papers by Beed ought to be cited in the introduction as well as they are highly relevant.

      We now cite Beed et al. 2013 in the Introduction when we discuss local inhibitory input to SCs. While the Beed et al. 2010 paper is an important contribution to understanding about pathways from deep to superficial layers, the introduction focuses on communication between identified pre- and postsynaptic populations within layer 2 and therefore we haven't found a way to cite it without losing focus. We do cite this paper multiple times elsewhere.

      Page 10 - "Excitatory-inhibitory interactions" - this summary of attractor models ought to cite the paper by Burak and Fiete as well.

      The discussion focuses on models with excitatory-inhibitory connectivity and cites an important paper from the Fiete group. The model by Burak and Fiete, while also important, is purely inhibitory and so is not well constrained by the known circuitry; it would therefore not be appropriate to cite it here.

      Page 10 - "be consistent with models…or that focus on pyramidal neurons have also been proposed" - this seems ungrammatical as if two different sentences were merged.

      Corrected.

      References

      Couey, Jonathan J, Aree Witoelar, Sheng-Jia Zhang, Kang Zheng, Jing Ye, Benjamin Dunn, Rafal Czajkowski, et al. 2013. “Recurrent Inhibitory Circuitry as a Mechanism for Grid Formation.” Nat. Neurosci. 16 (3): 318–24. https://doi.org/10.1038/nn.3310.

      Dudman, Joshua T, and Matthew F Nolan. 2009. “Stochastically Gating Ion Channels Enable Patterned Spike Firing through Activity-Dependent Modulation of Spike Probability.” Plos Comput. Biol. 5 (2): e1000290. https://doi.org/10.1371/journal.pcbi.1000290.

      Fuchs, Elke C, Angela Neitz, Roberta Pinna, Sarah Melzer, Antonio Caputi, and Hannah Monyer. 2016. “Local and Distant Input Controlling Excitation in Layer II of the Medial Entorhinal Cortex.” Neuron 89 (1): 194–208. https://doi.org/10.1016/j.neuron.2015.11.029.

      Pastoll, Hugh, Derek L Garden, Ioannis Papastathopoulos, Gülşen Sürmeli, and Matthew F Nolan. 2020. “Inter- and Intra-Animal Variation in the Integrative Properties of Stellate Cells in the Medial Entorhinal Cortex.” Elife 9 (February). https://doi.org/10.7554/eLife.52258.

      Pastoll, Hugh, Lukas Solanka, Mark C W van Rossum, and Matthew F Nolan. 2013. “Feedback Inhibition Enables Theta-Nested Gamma Oscillations and Grid Firing Fields.” Neuron 77 (1): 141–54. https://doi.org/10.1016/j.neuron.2012.11.032.

      Sürmeli, Gülşen, Daniel Cosmin Marcu, Christina McClure, Derek L F Garden, Hugh Pastoll, and Matthew F Nolan. 2015. “Molecularly Defined Circuitry Reveals Input-Output Segregation in Deep Layers of the Medial Entorhinal Cortex.” Neuron 88 (5): 1040–53. https://doi.org/10.1016/j.neuron.2015.10.041.

      Winterer, Jochen, Nikolaus Maier, Christian Wozny, Prateep Beed, Jörg Breustedt, Roberta Evangelista, Yangfan Peng, Tiziano D’Albis, Richard Kempter, and Dietmar Schmitz. 2017. “Excitatory Microcircuits within Superficial Layers of the Medial Entorhinal Cortex.” Cell Rep. 19 (6): 1110–16. https://doi.org/10.1016/j.celrep.2017.04.041.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The manuscript by Agha et al. provides a fundamental understanding regarding the participation of V2a interneurons in generating and patterning the locomotor rhythm. The authors provide convincing and solid evidence regarding the heterogeneity of V2a neurons in their intrinsic and synaptic properties and how these shape their outputs. The manuscript could be much improved by the inclusion of statistical analysis of some of the key data currently presented qualitatively. 

      We are extremely grateful for the positive and thorough comments provided by the three reviewers and have now had the opportunity to address all their concerns, as detailed below in our point-by-point response. Specifically, we have provided statistical analysis and major revisions to the text to help with rigor, clarity and interpretation, and we have also included new perturbation experiments that provide a more definitive test of one of our predictions, namely that reciprocal inhibition plays speed-specific roles in rhythm generation and pattern formation. The revisions greatly improve the manuscript and help bolster our conclusions.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      In this very interesting study, Agha and colleagues show that two types of Chx10-positive neurons (V2a neurons) have different anatomical and electrophysiological properties and receive distinct patterns of excitatory and inhibitory inputs as a function of speed during fictive swimming in the larval zebrafish. Using single-cell fills they show that one cell type has a descending axon ("descending V2as"), while the other cell type has both a descending axon and an ascending axon ("bifurcating V2as"). In the Chx10:GFP line, descending V2as display strong GFP labeling, while bifurcating V2as display weak GFP labeling. The bifurcating V2as are located more laterally in the spinal cord. These two cell types have different electrophysiological properties as revealed by patch-clamp recordings. Positive current steps indicated that descending V2as comprise tonic spiking or bursting neurons. Bifurcating V2as comprise chattering or bursting neurons. The two types of V2a neurons display different recruitment patterns as a function of speed. Descending tonic and bifurcating chattering neurons are recruited at the beginning of the swimming bout, at fast speeds (swimming frequency above 30 Hz). Descending bursting neurons were preferentially recruited at the end of swimming bouts, at low speeds (swimming frequency below 30 Hz), while bifurcating bursting neurons were recruited for a broader swimming frequency range. The two types of V2a neurons receive distinct patterns of excitatory and inhibitory inputs during fictive locomotion. In descending V2as, when speed increases: i) excitatory conductances increase in fast neurons and decrease in slow neurons; ii) inhibitory conductances increase in fast neurons and increase in slow neurons. In bifurcating V2as, when speed increases: i) excitatory conductances increase in fast neurons but do not change in slow neurons; ii) inhibitory conductances increase in fast neurons and do not change in slow neurons. 
      The timing of excitatory and inhibitory inputs was then studied. In descending V2as, fast neurons receive excitatory and inhibitory inputs that are in anti-phase with low contrast in amplitude and are both broadly distributed over the phase. The slow neurons receive two peaks of inhibition, one in anti-phase with the excitatory inputs and another just after the excitation. In bifurcating V2as, fast neurons receive two peaks of inhibition, while slow ones receive anti-phase inhibition.

      Strengths: 

      This study focuses on the diversity of V2a neurons in zebrafish, an interesting cell population playing important roles in locomotor control and beyond, from fish to mammals. The authors provide compelling evidence that two subtypes of V2as show distinct anatomical, electrophysiological, and speed-dependent spiking activity, and receive distinct synaptic inputs as a function of speed. This opens the door to future investigation of the inputs and outputs of these neurons. Finding ways to activate or inhibit specifically these cells would be very helpful in the years to come. 

      Weaknesses: 

      No major weakness was detected. The experiments were carefully done, and the data were of high quality. 

      We really appreciate the positive assessment and have addressed minor issues below.

      Reviewer #2 (Public Review): 

      Summary: 

      Animals exhibit different speeds of locomotion. In vertebrates, this is thought to be implemented by different groups of spinal interneurons and motor neurons. A fundamental assumption in the field has been that neural mechanisms that generate and sustain the rhythm at different locomotor speeds are the same. In this study, the authors challenge this view. Using rigorous in vivo electrophysiology during fictive locomotion combined with genetics, the authors provide a detailed analysis of cellular and synaptic properties of different subtypes of spinal V2a neurons that play a crucial role in rhythm generation. Importantly, they are able to show that speed-related subsets of V2a neurons have distinct cellular and synaptic properties and may utilize different mechanisms to implement different locomotor speeds. 

      Strengths: 

      The authors fully utilize the zebrafish model system and solid electrophysiological analyses to study the active and passive properties of speed-related V2a subsets. Identification of the V2a subtype is based directly on their recruitment at different locomotor speeds and not on indirect markers like soma size, D-V position etc. Throughout the article, the authors have cleverly used standard electrophysiological tests and analysis to tease out different neuronal properties and link them to natural activity. For example, in Figures 2 and 4, the authors compare V2a spiking during current steps and during fictive swims, showing that spike rates measured with current steps are physiologically relevant and observed during natural recruitment. The experiments done are rigorous and well-controlled.

      Weaknesses: 

      The authors claim that a primary result of their study is that reciprocal inhibition is important for rhythmogenesis at fast speeds while recurrent inhibition is key at slow speeds. This is shown in Figure 6, however, the authors do not show any statistical tests for this claim. The authors also do not show any conclusive evidence that reciprocal inhibition is required for rhythmogenesis at fast speeds and vice versa for slow speeds. Additional experiments or modeling studies that conclusively show the necessity of these different inhibitory sources to the generation of different rhythms would be needed to strengthen this claim. 

      We have added new loss-of-function experiments as requested to strengthen the claim that reciprocal inhibition is critical for rhythmogenesis at fast speeds, but dispensable at slow. Specifically, we use botulinum toxin selectively expressed in Dmrt3-labeled dI6 interneurons, which play a role in reciprocal inhibition at a variety of speeds (new Figure 7). These experiments demonstrate a selective impact on rhythmic burst generation and alternation during periods of swimming where the highest frequency motor activity occurs. During lower frequency activity, rhythm generation is preserved, however motor output is selectively altered, consistent with the idea that reciprocal inhibition plays an important role in patterning at slow speeds.

      The authors do a great job of teasing out cellular and synaptic properties in the different V2a subsets, however, it is not clear if or how these match the final output. For example, V2aD neurons are tonic or bursting for fast and slow speeds respectively but it is not intuitive how these cellular properties would influence phasic excitation and inhibition these neurons receive. 

      This question gets at the heart of what we are trying to illustrate in Figure 6. Specifically, in the new Figure 6E,F we have aligned the cumulative distribution of spikes recorded in cell-attached mode with phasic excitatory and inhibitory currents to reveal how well cellular properties versus patterns of synaptic drive match the final output (spikes). Our expectation was that if intrinsic cellular properties were ultimately generating phasic spiking patterns, then patterns of excitatory and inhibitory drive need not be phasic. Instead, we see that synaptic drive is phasic, with spiking occurring between peaks in excitation and troughs in inhibition. Since postsynaptic cellular properties should not affect the presynaptic excitation a neuron receives, this suggests that phasic spiking in all V2a neurons, regardless of their capacity for cellular rhythmogenesis, is a result of phasic input. In response to this concern, we have elaborated in the Discussion on what cellular properties may contribute and on their impact on output (L502-511).
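      Aligning spikes to the swim cycle, as described for Figure 6E,F, requires expressing each spike time as a phase of the locomotor cycle. A minimal sketch is shown below, assuming that cycles are delimited by consecutive motor burst onsets and that times are in milliseconds; the function names and example values are hypothetical. The cycle frequency it returns could, for example, be used to separate spikes above and below the 30 Hz boundary mentioned in the reviewer summary.

```python
import numpy as np


def spike_phases(spike_times_ms, burst_onsets_ms):
    """Phase (0-1) of each spike within its locomotor cycle, plus that cycle's
    frequency in Hz. A cycle runs from one burst onset to the next; spikes before
    the first onset or after the last onset are discarded."""
    spikes = np.asarray(spike_times_ms, dtype=float)
    onsets = np.asarray(burst_onsets_ms, dtype=float)
    idx = np.searchsorted(onsets, spikes, side="right") - 1
    keep = (idx >= 0) & (idx < onsets.size - 1)
    idx, spikes = idx[keep], spikes[keep]
    period_ms = onsets[idx + 1] - onsets[idx]
    return (spikes - onsets[idx]) / period_ms, 1000.0 / period_ms


def cumulative_phase(phases):
    """Empirical cumulative distribution of spike phases."""
    x = np.sort(np.asarray(phases, dtype=float))
    return x, np.arange(1, x.size + 1) / x.size


onsets = [0.0, 40.0, 80.0, 100.0]      # two 25 Hz cycles, then one 50 Hz cycle
spikes = [10.0, 20.0, 60.0, 90.0, 120.0]
phases, freq_hz = spike_phases(spikes, onsets)   # the spike at 120 ms is dropped
fast = phases[freq_hz > 30.0]          # phases of spikes in cycles above 30 Hz
x, cdf = cumulative_phase(phases)
```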

      It is not clear from the discussion why having different mechanisms of rhythm generation at different speeds could be an important circuit design. The authors use anguilliform and carangiform modes of swimming to denote fast and slow speeds but there are differences in these movements other than speed, like rostrocaudal coordination. The frequency and pattern of these movements are linked and warrant more discussion. 

      We appreciate the opportunity to elaborate on this point more in the Discussion. In particular, we have added more text to clarify differences in movement related to both pattern-formation and rhythm-generation (L373-398) and to also suggest potential reasons for differences in mechanisms of rhythm generation (L478-488).  

      Reviewer #3 (Public Review):

      The manuscript by Agha et al. explores mechanisms of rhythmicity in V2a neurons in larval zebrafish. Two subpopulations of V2a neurons are distinguishable by anatomy, connectivity, level of GFP, and speed-dependent recruitment properties consistent with V2a neurons involved in rhythm generation and pattern formation. The descending neurons, proposed to be consistent with rhythm-generating neurons, are active during either slow or fast locomotion, and their firing frequencies during current steps are well matched with the swim frequencies at which they fire. The bifurcating (patterning) neurons are active during a broader swim frequency range unrelated to their firing during current steps. All of the V2a neurons receive strong inhibitory input, but the phasing of this input depends on neuronal type and on the swim speed at which the neuron is active, with prominent in-phase inhibition in slow descending V2a neurons and in bifurcating V2a neurons active during fast swimming. Antiphase inhibition is observed in all V2a neurons, but it is the main source of rhythmic inhibition in fast descending V2a neurons and in bifurcating neurons active during slow swimming. The authors suggest that properties supporting rhythmic bursting are not directly related to locomotor speed but rather to functional neuronal subtypes. 

      This is a well-written paper with many strengths including the rigorous approach. Many parameters, including projection pattern, intracellular properties, inhibition received, and activity during slow/fast swimming were obtained from the same neuron. This links up very well with prior data from the lab on cell position, birth order, morphology/projections, and control of MN recruitment to provide a comprehensive overview of the functioning of V2a interneuronal populations in the larval zebrafish. The overall conclusions are well supported by the data. Weaknesses are relatively minor and were largely related to terminology for some of the secondary conclusions. 

      (1) The assumption is made that all in-phase inhibition is recurrent and all out-of-phase inhibition is reciprocal. The latter is likely true, but the definition of recurrent may be a bit loose, as in-phase inhibition could also be multisegmental feed-forward inhibition. 

      This is an excellent point, which was also raised by Reviewer 1. We have now added references that justify this assertion (L281-283). We also add a new figure with schematics (Figure 8) to make it clearer how we are defining sources of recurrent versus reciprocal inhibition, as based on the anatomical constraints of the circuit. We agree that multi-segmental inputs could contribute to inhibition, but they will likely be more broadly distributed based on rostro-caudal location and contribute to tonic sources of drive.  We now clarify this (L285-286).

      (2) In a few places, it is mentioned that the properties of the V2a-D neurons are consistent with pacemakers. This could be true of both the V2a-D and V2a-B neurons that burst in response to depolarizing steps, but the properties of the remaining (fast) V2a-D neurons, as shown, do not seem to be consistent with pacemakers. Tonic firing at a frequency related to the locomotor speed at which the neuron is active, together with strong antiphase inhibition, may instead suggest a stronger network component driving the rhythmicity. 

      We have been purposefully agnostic regarding the relative contribution of pacemaking to rhythm generation in the paper. Our measurements of bursting overlap with swim frequencies only in the V2a-D subtype. Similarly, the spike rates of V2a-D neurons alone overlap with their swim frequencies (Fig 2D,G,I). Since both bursting and tonic V2a-D neurons respond to tonic input (current injection) by spiking in a pattern that resembles their natural spiking behavior, we have treated both of these cellular properties as pacemaking. Although the bursting behavior is more consistent with what is normally considered pacemaking in rhythmic motor circuits, in the basal ganglia field, tonic firing of dopaminergic neurons in the substantia nigra is referred to as pacemaking. Since the tonic firing pattern overlaps with swimming frequency in the same way the bursting pattern does, we are reluctant to discount its possible contribution to rhythmogenesis simply because these neurons do not burst. We have made modifications to the document to make this point clearer (L409-416). Regardless, our data argue that pacemaking is unlikely to be a major contributor to phasic firing in V2a neurons, at least at midbody, so we agree with you on this last point.

      Reviewer #1 (Recommendations For The Authors): 

      I only have very minor suggestions. 

      (1) It would be useful to add a table or a figure summarizing the main results (integration of anatomy, electrophysiological properties, synaptic inputs, firing, swimming speed). 

      We agree and have added a figure panel summarizing the main results (new Figure 8).

      (2) Some statistics to possibly add (only suggestions): Do bifurcating V2as display significantly weaker GFP labeling than descending V2as? Do descending V2as have a significantly smaller soma size? Do descending V2as have a significantly lower rheobase and significantly higher resistance? Are tonic descending neurons and chattering bifurcating neurons located significantly more dorsally than the bursting descending and bifurcating neurons? Is there a way to show that bifurcating bursting neurons are recruited statistically on a broader swimming frequency range than other cell types (e.g. SD, coefficient of variation, cumulative distribution function with Kolmogorov-Smirnov test)? 

      For the first question, in all cases when we targeted more dimly labeled neurons they were bifurcating. We now clarify this in the text (L119, L129-132). However, this is difficult to quantify, since absolute levels of fluorescence will vary from preparation to preparation based on the dissection and intensity of epifluorescence illumination. In addition, we did not always take images prior to recording and levels of GFP after recording will vary depending on relative state of dialysis. So, unfortunately, we cannot provide a rigorous statistical analysis beyond the qualitative statement we provide.

      For the remainder of the questions, we now provide statistical analysis for soma size, position, rheobase, and resistance for the data in Figure 2. Please note that we have reported all our statistical analyses in the figure legends. As requested, we also provide an analysis of the density distributions of swimming frequencies for slow bursting bifurcating neurons and slow bursting descending neurons, which differ significantly (Kolmogorov-Smirnov test; L162).
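      For readers less familiar with the Kolmogorov-Smirnov (K-S) comparison mentioned above, the sketch below shows what the two-sample statistic measures: the largest vertical gap between the empirical cumulative distributions of two samples, here recruitment frequencies. It is a minimal, standard-library illustration and not the authors' analysis code (their software is not stated); the function names and the frequency values are hypothetical, and the p-value uses the large-sample Kolmogorov approximation, which is only rough for small samples.

```python
import bisect
import math


def ks_2samp_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at sample points,
    # so the supremum is attained at one of the pooled observations.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Large-sample p-value from the Kolmogorov distribution.

    Uses the effective sample size n*m/(n+m) with the small-sample correction
    popularised by Numerical Recipes; only approximate for small n and m.
    """
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # Q(lambda) is indistinguishable from 1 in this range
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical recruitment frequencies (Hz), NOT data from the manuscript.
    slow_bifurcating = [22, 25, 28, 31, 35, 40, 44, 48]
    slow_descending = [24, 26, 27, 28, 29, 30, 31, 33]
    d = ks_2samp_statistic(slow_bifurcating, slow_descending)
    p = ks_asymptotic_pvalue(d, len(slow_bifurcating), len(slow_descending))
    print(f"D = {d:.3f}, asymptotic p = {p:.3f}")
```

      With samples this small, an exact test (for example SciPy's `scipy.stats.ks_2samp` with `method="exact"`) would be preferable to the asymptotic formula.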

      (3) Some details to possibly add (only suggestions): proportion of neurons in which single cell fills were done/checked anatomically? Proportions of bursting/chattering/tonic/bursting neurons? In Figure 1, maybe define visually bifurcating vs descending neurons. In Figure 2I, the recruitment of bifurcating chattering neurons is not plotted. Is that normal? Figures 6D, E, maybe specify more clearly which neurons are the fast and slow ones. In Figure 3C, the X-axis name is missing. 

      For the first question, the proportion is 100%, since the morphology of all neurons was confirmed post recording, which we now clarify in the Methods section (L573). For the second question, the numbers of bursting/chattering/tonic/bursting neurons are now reported in the legend of Figure 2, in addition to the total number of V2a-D and V2a-B types, so it is clear what proportion of the recording population this represents. For the third question, in Figure 1 we cannot yet define V2a neurons as bifurcating or descending; this was only possible to confirm after the recording (Figure 2), and was done for every neuron (as mentioned above). For the fourth question, in Figure 2I the chattering response was too variable to be meaningfully averaged and plotted, which we now mention in the text (L169-171); the standard deviations were too large for the averages to be informative. For the fifth question, we have modified Figures 6D, E to more clearly label fast and slow V2a neurons. Finally, we have included the X-axis label in Figure 3C, thank you!

      (4) Some text to possibly modulate (only suggestions): 

      A possible role for these V2a subtypes in the rhythm generation and pattern formation layer is an interesting idea but this may not be completely solved by the present experiments. Maybe the authors could suggest future experiments in the discussion that would establish how to tackle this important question (double bursts, deletions, etc...)? 

      We appreciate the opportunity to raise future experiments that could help further tease apart their contribution to rhythm and pattern and have now added potential experiments to the Discussion (L498-501; L527-529), which include more precise molecular identification, spatial perturbation, and computational modeling.

      It would be nice to cite the references in which the rhythm/pattern CPG concept was proposed initially (lines 49-50 and elsewhere, Cf. Perret and Cabelguen 1980 Brain Res; Perret et al. 1989 Stance and Motion, Plenum Press; McCrea et al. 2006 J Physiol). 

      Apologies for our poor scholarship here, we now credit the appropriate primary research articles (L50-51).

      In the abstract, it would be useful to say clearly which cells are descending vs. bifurcating ones. Same thing in the result section, maybe it would be nice to identify the two populations long before line 127. 

      We have modified the abstract and introduction sections accordingly. We also note that the two populations are defined in the first paragraph of the results (L90).

      About the possible mechanism of rhythm generation, it is mentioned in line 54 that a single mechanism was proposed to exist, but the authors also mention in lines 122-123 that several mechanisms were proposed for rhythm generation... Maybe adjust the introduction? 

      As requested, we have clarified our meaning in the introduction (L55-58). Several mechanisms exist, but the likelihood that different mechanisms operate at different speeds has not been considered.  Either cellular properties are tuned to different speeds (i.e., bursting is faster in neurons recruited at faster speeds) or network properties can explain different speeds (i.e., different frequencies and patterns emerge from the connectivity).

      About the convention that in fish in-phase currents originate from the ipsilateral and out-of-phase currents originate from the contralateral side (lines 271-275), is there any reference for this assumption? 

      Yes, we now provide references (L281-283).

      Lines 338-345, stating that reciprocal inhibition is important for rhythm generation as predicted by the half-center model, may surprise some readers, considering that many studies showed that inhibition is not needed for rhythm generation, including lamprey hemicords stimulated electrically (Cangiano and Grillner 2003 J Neurophysiol; 2005 J Neurosci, Cangiano et al. 2012 Neuroscience), salamander hemicords or hemisegments stimulated chemically (Ryczko et al. 2010, 2015 J Neurophysiol), or rhythmic activity evoked on each side of the cord using optogenetic stimulation of glutamatergic neurons (Hägglund et al. 2013 PNAS), etc. To demonstrate the importance of inhibition in rhythmogenesis, one would need to activate and/or deactivate the ipsilateral versus contralateral inhibitory neurons. It would be nice to add citations to such studies if available in the zebrafish literature. Overall, I would simply suggest modulating this section to be a bit more balanced conceptually. 

      We have included the above referenced studies for lampreys and added ones for tadpoles (L464-468), to stick with undulatory swimmers. We had focused on experiments with the most selective perturbations in the interests of space, but appreciate the opportunity to present both arguments. We also include new loss-of-function experiments that impact one spinal population linked to reciprocal inhibition (Dmrt3-labeled dI6 interneurons), which demonstrate a speed-specific impact on rhythmogenesis (L323-371; new Figure 7) and compare our findings to a recent study in the zebrafish literature examining the impact of spinal Dmrt3-ablations on axial rhythmogenesis (L426-433).

      Line 676 "episodies". 

      Thanks, corrected.

      Reviewer #2 (Recommendations For The Authors): 

      The authors make a claim that recurrent and reciprocal inhibition play key roles in rhythmogenesis at different speeds. This is not conclusively shown. Rayleigh's z-test can be used to test the significance of the directionality of circular data. Including more data from experiments or computational models to show the necessity of reciprocal or recurrent inhibition for timed spiking of V2a neurons would address this. 

      We have now modified Figure 6 so we can directly compare differences in reciprocal and recurrent inhibition between V2a types. We now report statistical analysis in the figure legends, using Watson’s two-sample test for homogeneity to test differences in the circular data. As mentioned above, we have also added new loss-of-function experiments as requested to strengthen the claim that reciprocal inhibition is critical for rhythmogenesis at fast speeds, but dispensable at slow. Specifically, we use botulinum toxin selectively expressed in Dmrt3-labeled dI6 interneurons, which play a role in reciprocal inhibition at a variety of speeds (new Figure 7). These experiments demonstrate a selective impact on rhythmic burst generation and alternation during periods of swimming where the highest frequency motor activity occurs. During lower frequency activity, rhythm generation is preserved; however, motor output is selectively altered, consistent with the idea that reciprocal inhibition plays an important role in patterning at slow speeds.
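      The Rayleigh z-test suggested by the reviewer asks a different question from a two-group homogeneity test: whether a single set of phases (for example, spike or current-peak times expressed as a fraction of the locomotor cycle) has a preferred direction at all. The sketch below is a minimal, standard-library illustration, not the authors' analysis; the function name and the phase values are hypothetical, and the p-value uses Zar's approximation from *Biostatistical Analysis*.

```python
import math


def rayleigh_test(phases):
    """Rayleigh z-test for non-uniformity of circular data.

    phases: angles in radians (e.g. spike phase within the locomotor cycle).
    Returns (mean_angle, mean_resultant_length, z, p).
    """
    n = len(phases)
    c = sum(math.cos(t) for t in phases)
    s = sum(math.sin(t) for t in phases)
    r = math.hypot(c, s)   # resultant length R
    r_bar = r / n          # mean resultant length: 0 = no preferred phase, 1 = perfectly locked
    z = r * r / n          # Rayleigh's Z = n * r_bar**2
    # Approximate p-value for Z (Zar, Biostatistical Analysis).
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - r * r)) - (1 + 2 * n))
    return math.atan2(s, c), r_bar, z, min(p, 1.0)


if __name__ == "__main__":
    # Hypothetical spike phases (cycles, 0-1) relative to the motor burst, for illustration only.
    cycles = [0.02, 0.05, 0.91, 0.97, 0.08, 0.11, 0.95, 0.03]
    mean, r_bar, z, p = rayleigh_test([2 * math.pi * x for x in cycles])
    print(f"mean phase = {mean / (2 * math.pi):+.3f} cycles, R = {r_bar:.2f}, Z = {z:.2f}, p = {p:.2g}")
```

      A significant Rayleigh result only shows that firing is phase-locked; showing that two V2a types differ in their preferred phase still requires a two-sample circular test.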

      In Figure 4D, the authors show that V2a neurons, both subtypes, spike in advance of the center of the motor burst. Recent studies (Jay et al., 2023) have shown differences in the timing of V2aD and V2aB neurons. Are there differences in the methods or selection of cells that would reflect differences in results? 

      This is a great point, and we appreciate the opportunity to reconcile our observations here with those in Jay et al., 2023. In the Jay et al. paper, we used drifting visual stimuli to evoke fictive swimming. These experiments allow one to uncouple rhythm generation (forward propulsion) and pattern formation (lateral direction). Notably, fictive swim frequencies during so-called optomotor responses are below 35 Hz, meaning that we are sampling exclusively from V2a neurons recruited during the carangiform swim mode. In these experiments, slow V2a-D neurons fire well in advance of slow V2a-B neurons, whereas here their firing is relatively synchronous. Critically, however, the phase-advanced firing pattern revealed in the Jay et al. paper for V2a-D neurons aligns with the phase-advanced excitatory input reported here. In addition, the recruitment probabilities of slow V2a-D neurons are higher in the Jay et al. paper than what we report here. Collectively, these observations suggest either more effective excitation during optomotor responses (Jay et al.) or more potent inhibition during escape responses (Agha et al.). Ultimately, differences in the relative synchrony of firing among slow V2a-D and slow V2a-B neurons appear to depend on the nature of the stimulus and the range of swim frequencies: in one case, frequency and amplitude modulation are coupled over a broad range of frequencies (somatosensory stimuli delivered here), while in the other, frequency and amplitude modulation are uncoupled over a narrow range of frequencies (visual stimuli in Jay et al.). We now elaborate on this point in the Discussion (L485-498).

      Given the conserved nature of spinal circuits across vertebrates, it is also important to discuss these findings in the context of limbed animals. In tetrapods, changes in locomotor speed also involve pattern/gait changes, however, it is not known if or how these changes in frequency and pattern are linked. This study, by suggesting that different speeds are implemented not only by different neurons but possibly by different neuronal mechanisms, provides important cues for the missing link and would strengthen the discussion. 

      We agree and have made substantial edits to the beginning of the Discussion to provide better context for the impact of our work (L373-398).

      Minor points: 

      Line 122: “of” needs to be replaced by “or”. 

      Corrected, thanks!

      Figure 3B Top panel: What is the grey bar? 

      This has been removed for clarity.

      Figure 3B bottom panel is not referenced in the main text at all. 

      Now referenced (L187, L189).

      Line 260: the second “inhibition” needs to be replaced with “excitation”. 

      Done, thanks!

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments: 

      - Figure 2 panel ordering is visually appealing but tough to follow. 

      We apologize and tried reconfigurations, but they just looked too kludgy.  Hoping for a pass on this one.

      - Lines 164-166 and 319-327 (related to comment 2 above): For the fast/tonic V2a-Ds, it is not clear that this is intrinsic and it is not consistent with pacemaker properties. This could also be (and likely is) synaptically/network-driven rhythmicity, although the firing frequencies match up well with the swim frequencies. 

      Fast/tonic V2a-Ds were tested with somatic current injection as with all other neurons, which we assume primarily reflects intrinsic cellular properties. The spike rates we observe in fast/tonic V2a-Ds overlap with spike rates observed during fictive swimming, so they are positioned as well as bursting neurons to contribute to pacemaking. We also elaborate on this point in response to Major Comment #2.

      - Lines 189-192: The patterning neurons receive excitatory drive before rhythm-generating neurons. The time constant explanation makes sense for why two neurons with a common drive would fire at different times but this does not support the proposed hierarchical arrangement or being consistent with V2a-Bs being downstream as mentioned in lines 49-56 and 218-219. 

      In response to this point, we have modified Figure 6 so we can directly compare the timing of presynaptic excitatory inputs between the types. Here it can be seen clearly that phasic excitatory inputs to both fast and slow V2a-Ds are phase-advanced relative to those to fast and slow V2a-Bs (Figure 6B,C). As the reviewer mentions, it is likely a combination of time constants and the relative balance of excitation and inhibition that ultimately leads to synchronous spiking despite differences in the timing of inputs.

      - Lines 338-339: It is not shown that the rhythm relies on inhibition during slow. 

      This line has been removed in the revision process.

      - Consistent with the importance of reciprocal (contralateral) inhibition in fast locomotion here, rodent fictive locomotion is slower in the hemisected cord than in the intact cord. However, the Rybak and O'Donovan groups suggest that this is due to loss of drive to ipsilateral inhibitory neurons from contralaterally projecting excitatory neurons, rather than to loss of contralateral inhibitory interneurons (see Falgairolle and O'Donovan 2019, 2021, and Shevtsova et al. 2022). 

      This is an interesting point that highlights how we are defining reciprocal versus recurrent inhibition. In this example, although ipsilaterally-projecting interneurons are responsible for inhibition, since they are excited by commissurally-projecting excitatory interneurons, we would classify this as feedforward (reciprocal), not feedback (recurrent), inhibition. So reciprocal (feedforward) inhibition is still important for generating higher frequency rhythms; it is simply disynaptic in this case. We have added a new figure (Figure 8) to clarify what we mean by reciprocal (feedforward) and recurrent (feedback) inhibition based on the ipsilateral projection patterns of V2a neurons, and we point out in the Discussion that the definitions would be flipped for excitatory interneurons (L452-455).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      While there are many models for sequence retrieval, it has been difficult to find models that vary the speed of sequence retrieval dynamically via simple external inputs. While recent works [1,2] have proposed some mechanisms, the authors here propose a different one based on heterogeneous plasticity rules. Temporally symmetric plasticity kernels (that do not distinguish between the order of pre and post spikes, but only their time difference) are expected to give rise to attractor states, asymmetric ones to sequence transitions. The authors incorporate a rate-based, discrete-time analog of these spike-based plasticity rules to learn the connections between neurons (leading to connections similar to Hopfield networks for attractors and sequences). They use either a parametric combination of symmetric and asymmetric learning rules for connections into each neuron, or separate subpopulations having only symmetric or asymmetric learning rules on incoming connections. They find that the latter is conducive to enabling external inputs to control the speed of sequence retrieval.
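      The contrast between the two plasticity rules can be illustrated with the Hopfield-style connectivity the reviewer alludes to: a temporally symmetric rule pairs each pattern with itself, $J_{ij} \propto \sum_\mu \xi_i^\mu \xi_j^\mu$, which makes stored patterns fixed points, while a temporally asymmetric rule pairs each pattern with its successor, $J_{ij} \propto \sum_\mu \xi_i^{\mu+1} \xi_j^\mu$, which pushes the network from one pattern to the next. The sketch below is a minimal structural analogy only, assuming ±1 binary patterns and a synchronous sign update; the manuscript's model uses rate units and learning rules derived from plasticity kernels, and the function names here are hypothetical.

```python
import random


def hebbian_connectivity(patterns, symmetric):
    """Hopfield-style Hebbian weights from +/-1 patterns (no self-connections).

    symmetric=True:  J_ij = (1/N) sum_mu xi_i^mu     xi_j^mu  (fixed-point attractors)
    symmetric=False: J_ij = (1/N) sum_mu xi_i^{mu+1} xi_j^mu  (transitions mu -> mu+1)
    """
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    pairs = zip(patterns, patterns) if symmetric else zip(patterns[1:], patterns[:-1])
    for post, pre in pairs:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += post[i] * pre[j] / n
    return J


def step(J, x):
    """One synchronous update of binary units: x_i <- sign(sum_j J_ij x_j)."""
    return [1 if sum(w * xj for w, xj in zip(row, x)) >= 0 else -1 for row in J]


def overlap(x, y):
    """Normalised overlap between two +/-1 patterns (1 = identical)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    random.seed(0)
    pats = [[random.choice((-1, 1)) for _ in range(200)] for _ in range(5)]
    J_sym = hebbian_connectivity(pats, symmetric=True)
    J_asym = hebbian_connectivity(pats, symmetric=False)
    print("symmetric:  overlap of update with pattern 0:", overlap(step(J_sym, pats[0]), pats[0]))
    print("asymmetric: overlap of update with pattern 1:", overlap(step(J_asym, pats[0]), pats[1]))
```

      Starting from pattern 0, the symmetric weights return (close to) pattern 0, whereas the asymmetric weights map it onto pattern 1; combining both kinds of connectivity in one network is what leaves room for external inputs to shift the balance between holding a pattern and advancing the sequence.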

      Strengths:

      The authors have expertly characterised the system dynamics using both simulations and theory. How the speed and quality of retrieval vary across phase space has been well studied. The authors are also able to vary the external inputs to reproduce a preparatory phase followed by an execution phase of sequence retrieval, as seen experimentally in motor control. They also propose a simple reinforcement learning scheme for learning to map the two external inputs to the desired retrieval speed.

      Weaknesses:

      (1) The authors translate spike-based synaptic plasticity rules into a way to learn/set connections for rate units operating in discrete time, similar to their earlier work in [5]. The bio-plausibility issues of learning in [5] carry over here; e.g., the authors ignore any input due to the recurrent connectivity during learning and effectively fix the pre and post rates to the desired ones. While the learning itself is not fully bio-plausible, it does lend itself to writing the final connectivity matrix in a manner that is easier to analyze theoretically.

      We agree with the reviewer that learning is not ‘fully bio-plausible’. However, we believe that extending the results to a model in which synaptic plasticity depends on recurrent inputs is beyond the scope of this work. We have added a mention of this issue in the Discussion in the revised manuscript.

      (2) While the authors learn to map the set of two external input strengths to speed of retrieval, they still hand-wire one external input to the subpopulation of neurons with temporally symmetric plasticity and the other external input to the other subpopulation with temporally asymmetric plasticity. The authors suggest that these subpopulations might arise due to differences in the parameters of Ca dynamics as in their earlier work [29]. How these two external inputs would connect to neurons differentially based on the plasticity kernel / Ca dynamics parameters of the recurrent connections is still an open question which the authors have not touched upon.

      The issue of how external inputs could self-organize to drive the network to retrieve sequences at appropriate speeds is addressed in the Results section, paragraph ‘Reward-driven learning’. These inputs are not ‘hand-wired’: they are initially random and then acquire the necessary strengths to allow the network to retrieve the sequences at different speeds thanks to a simple reinforcement learning scheme. We have rewritten this section to clarify this issue.

      (3) The authors require that temporally symmetric and asymmetric learning rules be present in the recurrent connections between subpopulations of neurons in the same brain region, i.e. some neurons in the same brain region should have temporally symmetric kernels, while others should have temporally asymmetric ones. The evidence for this seems thin. Though, in the discussion, the authors clarify 'While this heterogeneity has been found so far across structures or across different regions in the same structure, this heterogeneity could also be present within local networks, as current experimental methods for probing plasticity only have access to a single delay between pre and post-synaptic spikes in each recorded neuron, and would therefore miss this heterogeneity'.

      We agree with the reviewer that this is currently an open question. We describe this issue in more detail in the Discussion of the revised manuscript.

      (4) An aspect which the authors have not connected to is one of the author's earlier work:

      Brunel, N. (2016). Is cortical connectivity optimized for storing information? Nature Neuroscience, 19(5), 749-755. https://doi.org/10.1038/nn.4286, which argues that the experimentally observed over-representation of symmetric synapses indicates that cortical networks are optimized for attractors rather than sequences.

      We thank the reviewer for this suggestion. We have added a paragraph in the discussion that discusses work on statistics of synaptic connectivity in optimal networks. We expect that in networks that contain two subpopulations of neurons, the degree of symmetry should be intermediate between a network storing fixed point attractors exclusively, and a network storing sequences exclusively.

      Despite the above weaknesses, the work is a solid advance in proposing an alternate model for modulating speed of sequence retrieval and extends the use of well-established theoretical tools. This work is expected to spawn further works like extending to a spiking neural network with Dale's law, more realistic learning taking into account recurrent connections during learning, and experimental follow-ups. Thus, I expect this to be an important contribution to the field.

      We thank the reviewer for the insightful comments.

      Reviewer #2 (Public Review):

      Sequences of neural activity underlie most of our behavior. And as experience suggests we are (in most cases) able to flexibly change the speed for our learned behavior which essentially means that brains are able to change the speed at which the sequence is retrieved from the memory. The authors here propose a mechanism by which networks in the brain can learn a sequence of spike patterns and retrieve them at variable speed. At a conceptual level I think the authors have a very nice idea: use of symmetric and asymmetric learning rules to learn the sequences and then use different inputs to neurons with symmetric or asymmetric plasticity to control the retrieval speed. The authors have demonstrated the feasibility of the idea in a rather idealized network model. I think it is important that the idea is demonstrated in more biologically plausible settings (e.g. spiking neurons, a network with exc. and inh. neurons with ongoing activity).

      Summary

      In this manuscript, the authors have addressed the problem of learning and retrieving sequential activity in neuronal networks. In particular, they have focused on the question of how sequence retrieval speed can be controlled.

      They have considered a model with excitatory rate-based neurons. Authors show that when sequences are learned with both temporally symmetric and asymmetric Hebbian plasticity, by modulating the external inputs to the network the sequence retrieval speed can be modulated. With the two types of Hebbian plasticity in the network, sequence learning essentially means that the network has both feedforward and recurrent connections related to the sequence. By giving different amounts of input to the feed-forward and recurrent components of the sequence, authors are able to adjust the speed.

      Strengths

      - Authors solve the problem of sequence retrieval speed control by learning the sequence in both feedforward and recurrent connectivity within a network. It is a very interesting idea for two main reasons: 1. It does not rely on delays or short-term dynamics in neurons/synapses 2. It does not require that the animal is presented with the same sequences multiple times at different speeds. Different inputs to the feedforward and recurrent populations are sufficient to alter the speed. However, the work leaves several issues unaddressed as explained below.

      Weaknesses

      - The main weakness of the paper is that it is mostly driven by a motivation to find a computational solution to the problem of sequence retrieval speed. In most cases they have not provided any arguments about the biological plausibility of the solution they have proposed e.g.:

      - Is there any experimental evidence that some neurons in the network have temporally symmetric Hebbian plasticity and some temporally asymmetric? The authors have cited some references to support this. But usually the switch between temporally symmetric and asymmetric rules depends on the spike patterns used for pairing (e.g. bursts vs single spikes). In the context of this manuscript, it would mean that in the same pattern, some neurons burst and some don't, and that this is the same for all the patterns in the sequence. As far as I can see, the authors have assumed a binary pattern of activity which is the same for all neurons that participate in the pattern.

      There is currently only weak evidence for heterogeneity of synaptic plasticity rules within a single network, though there is plenty of evidence for such heterogeneity across networks or across locations within a particular structure (see references in our Discussion). The reviewer suggests another interesting possibility: that the temporal asymmetry could depend on the firing pattern of the post-synaptic neuron. An example of such behavior can be found in a 2006 paper by Wittenberg and Wang, who showed that pairing single pre- and post-synaptic spikes leads to LTD at all time differences in a symmetric fashion, while pairing a pre-synaptic spike with a burst of post-synaptic spikes leads to temporally asymmetric plasticity, with an LTP window at short positive time differences. We now mention this possibility in the Discussion, but we believe that fully exploring this scenario is beyond the scope of the paper.

      - How would external inputs know whether they are impinging on a symmetric or an asymmetric neuron? The authors have proposed a mechanism to learn these inputs. But that makes sequence learning a two-stage problem: first an animal has to learn the sequence, and then it has to learn to modulate the speed of retrieval. Is there experimental evidence to support this?

      Our model does not assume that the two processes necessarily occur one after the other. Importantly, once the correct external inputs that can modulate sequence retrieval are learned, sequence retrieval modulation will automatically generalize to arbitrary new sequences that are learned by the network.

      - Authors have only considered homogeneous DC input for sequence retrieval. This kind of input is highly unnatural. It would be more plausible if the authors considered fluctuating input which is different from each neuron.

      We have modified Figure 1e and Figure 2c to show the effects of fluctuating inputs on pattern correlations and single unit activity. We find that these inputs do not qualitatively affect our results.

      - All the work is demonstrated using a firing rate based model of only excitatory neurons. I think it is important that some of the key results are demonstrated in a network of both excitatory and inhibitory spiking neurons. As the authors very well know it is not always trivial to extend rate-based models to spiking neurons.

      I think at a conceptual level authors have a very nice idea but it needs to be demonstrated in a more biologically plausible setting (and by that I do not mean biophysical neurons etc.).

      We have included a new section in the discussion with an associated figure (Figure 7) demonstrating that flexible speed control can be achieved in an excitatory-inhibitory (E-I) spiking network containing two excitatory populations with distinct plasticity mechanisms.

      Reviewer #1 (Recommendations For The Authors):

      In the introduction, the authors state: 'symmetric kernels, in which coincident activity leads to strengthening regardless of the order of pre and post-synaptic spikes, have also been observed in multiple contexts with high frequency plasticity induction protocols in cortex [21]'. To my understanding, the final model of [21] (model 3) ignores LTD if the post-synaptic spike also participates in LTP, and only considers nearest-neighbour interactions. Thus, the kernel would not be symmetric. Can the authors clarify what they mean and how their conclusion follows, given that [21] does not show any kernels either?

      In this statement, we were not referring to the model in [21], but rather the experimentally observed plasticity kernels at different frequencies. In particular, we were referring to the symmetric kernel that appears in the bottom panel of Figure 7c in that paper.

      The authors should also address the weaknesses mentioned above. They don't need to solve the issues but expand (and maybe indicate resolutions) on these issues in the Discussion.

      For ease of reproducibility, the authors should make their code available as well.

      We intend to publish the code required to reproduce all figures on GitHub.

      Reviewer #2 (Recommendations For The Authors):

      -  Show the ground state of the network before and after learning.

      We have decided not to include such a figure, as we have not analyzed the learning process, but instead a network with a fixed connectivity matrix which is assumed to be the end result of a learning process.

      -  The authors have only considered a network of excitatory neurons. This does not make sense. I think they should demonstrate a network of both excitatory and inhibitory (spiking) neurons exhibiting ongoing activity.

      See our comment to Reviewer #2 in the previous section.

      -  Show how the sequence dynamics unfolds when we assume a non-zero ongoing activity.

      We are not sure what the reviewer means by ‘non-zero ongoing activity’. We now show the dynamics of the network in the presence of noisy inputs, which can represent ongoing activity from other structures (see Fig. 1e and 2c).

      -  From the correlation (==quality) alone it is difficult to judge how well the sequence has been recovered. Authors should consider showing some examples so that the reader can get a visual estimate of what 0.6 quality may mean. High speed is not really associated with high quality (Fig 2b). So it is important to show how the sequence retrieval quality is for non-linear and heterogeneous learning rules.

      We believe that some insight into the relationship between speed and quality for the case of non-linear and heterogeneous learning rules is provided by the correlation plots for chosen input configurations (see Fig. 3a and 5b). We leave a full characterization for future work.

      -  Authors should show how the retrieval and quality of sequences change when they are recovered with positive input, or positive input to one population and negative to another. In the current version sequence retrieval is shown only with negative inputs. This is a somewhat non-biological setting. The inhibitory gating argument (L367-389) is really weak.

      We would like to clarify that with the parameters chosen in this paper, the transfer function has half its maximal rate at zero input. This is because we chose the threshold to be zero, using the fact that any threshold can be absorbed into the external inputs. Thus, negative inputs really mean sub-threshold inputs, and they are consistent with sub-threshold external excitatory inputs. We have clarified this issue in the revised manuscript.
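      To make the threshold argument concrete, here is a minimal sketch assuming a sigmoidal transfer function φ(h) = r_max / (1 + exp(−β(h − θ))); the functional form and the values of r_max, β, θ and the input used below are illustrative assumptions, not the manuscript's parameters. It checks that with θ = 0 the rate at zero input is r_max/2, and that a neuron with threshold θ receiving a positive, sub-threshold input I fires at exactly the same rate as a zero-threshold neuron receiving the negative input I − θ.

```python
import math


def phi(h, r_max=1.0, beta=1.0, theta=0.0):
    """Assumed sigmoidal transfer function: firing rate for total input h."""
    return r_max / (1.0 + math.exp(-beta * (h - theta)))


# With theta = 0, an input of zero gives exactly half the maximal rate.
rate_at_zero = phi(0.0)

# A neuron with an explicit threshold theta that receives a positive,
# sub-threshold excitatory input (0 < I < theta) ...
theta = 2.0
I_excitatory = 1.5
rate_with_threshold = phi(I_excitatory, theta=theta)

# ... fires at exactly the same rate as a zero-threshold neuron receiving
# the shifted input I - theta, which is negative (-0.5 here).
rate_threshold_absorbed = phi(I_excitatory - theta)

print(rate_at_zero, rate_with_threshold, rate_threshold_absorbed)
```

      Shifting the threshold into the input is just a change of variables, so the same equivalence holds for any θ and any monotonic transfer function of h − θ.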

      -  Authors should demonstrate how the sequence retrieval dynamics is altered when they assume a fluctuating input current for sequence retrieval instead of a homogeneous DC input.

      See our comment to Reviewer #2 in the previous section.

      -  Authors should show what are the differences in synaptic weight distribution for the two types of learning (bi-linear and non-linear). I am curious to know if the difference in the speed in the two cases is related to the weight distribution. In general I think it is a good idea to show the synaptic weight distribution before and after learning.

      As mentioned above, we do not study any learning process, but rather a network with a fixed connectivity matrix, assumed to represent the end result of learning. In this network, the distribution of synaptic weights converges to a Gaussian in the large p and cN limits, independently of the functions f and g, because of the central limit theorem, if there are no sign constraints on weights. In the presence of sign constraints, the distribution is a truncated Gaussian.
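      As a rough numerical illustration of the central-limit argument, the sketch below assumes a rule of the form J_ij = (A/N) Σ_μ f(ξ_i^{μ+1}) g(ξ_j^μ) with full connectivity (c = 1), i.i.d. standard Gaussian patterns, and hypothetical nonlinearities f (a centred step) and g (tanh). None of these choices are taken from the manuscript. The resulting weights have skewness and excess kurtosis close to zero, as expected for a Gaussian, and clipping negative weights at zero (one possible way of imposing a sign constraint) leaves the nonzero weights following a Gaussian truncated at zero.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 2000, 200   # hypothetical network size and number of patterns in the sequence
A = 1.0            # hypothetical coupling strength


def f(x):
    # hypothetical post-synaptic nonlinearity: a centred step function
    return np.where(x > 0.0, 1.0, -1.0)


def g(x):
    # hypothetical pre-synaptic nonlinearity: a saturating sigmoid
    return np.tanh(2.0 * x)


# i.i.d. standard Gaussian patterns, xi[mu, i] for pattern mu and neuron i
xi = rng.standard_normal((p, N))

# Sample a subset of (post, pre) pairs instead of building the full N x N matrix.
n_pairs = 5000
post = rng.integers(0, N, n_pairs)
pre = rng.integers(0, N, n_pairs)

# Assumed rule: J_ij = (A / N) * sum_mu f(xi_i^{mu+1}) g(xi_j^mu)
J = (A / N) * np.sum(f(xi[1:, post]) * g(xi[:-1, pre]), axis=0)

# Standardised moments: a Gaussian has skewness 0 and excess kurtosis 0.
z = (J - J.mean()) / J.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

# Sign constraint: clipping negative weights at zero leaves the nonzero
# weights distributed as a Gaussian truncated at zero.
J_constrained = np.maximum(J, 0.0)
fraction_zero = np.mean(J_constrained == 0.0)

print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}, "
      f"fraction clipped={fraction_zero:.2f}")
```

      Swapping in other functions f and g with finite variance changes the mean and variance of J, but not its approximately Gaussian shape once p is large.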

      -  I suggest the use of a monochromatic color scale for figure 2b and 3b.

      Figure 3: The sentence describing panel 2 seems incomplete.

      Also explain why there is a non-monotonic relationship between I_s and speed for some values of I_a in 3b.

      There is a non-monotonic relationship for retrieval quality, not speed. We have clarified this in the manuscript text, but don’t currently have an explanation for why this phenomenon occurs for these specific values of I_a.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Additional Discussion Points

      (1) There is not much exploration of potential mechanisms, i.e., the impact of PV neuron activity on the broader circuit. Additionally, the study exclusively focuses on PV cells and does not explore the role of other prefrontal populations, particularly those known to respond to cue-evoked fear states. The discussion should consider how PV activity might impact the broader circuit and whether the present findings are specific to PV cells or applicable to other interneuron subtypes.

      We have added an extensive discussion of potential mechanisms and the potential contributions of other interneuron subtypes:

      “For example, PV neurons aid in improving visual discrimination by sharpening response selectivity in visual cortex (Lee et al., 2012). In prefrontal cortex, PV neurons are critical for task performance, particularly during performance of tasks that require flexible behavior such as rule shift learning (Cho et al., 2020) and reward extinction (Sparta et al., 2014). Further, PV neurons play an essential role in the generation of cortical gamma rhythms, which contribute to synchronization of selective populations of pyramidal neurons (Sohal et al., 2009; Cardin et al., 2009). Courtin et al. (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004). These and other studies support the idea that PV neural activity supports the execution of a behavior by shaping rather than suppressing cortical activity, potentially by selecting among conflicting behaviors through the synchronization of different pyramidal populations (Warden et al., 2012; Lee et al., 2014).

      The roles of other inhibitory neural subtypes (such as somatostatin (SOM)-expressing and vasoactive intestinal peptide (VIP)-expressing IL GABA neurons) in avoidance behavior are currently unknown, but are likely important given the role of SOM neurons in gamma-band synchronization (Veit et al., 2017), and the role of VIP neurons in regulating PV and SOM neural activity (Cardin, 2018).” 

      (2) There is some discordance between changes in neural activity and behavior. For example, in Figure 4C, the relationship between PV neuron activity and movement emerges almost immediately during learning, but successful active avoidance emerges much more gradually. Why is this?

      We have added extensive text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors: suppressing freezing and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (LeDoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (3) vmPFC was defined here as including the infralimbic (IL) and dorsal peduncular (DP) regions. While the role of IL has been frequently characterized for motivated behavior, relatively few studies have examined DP. Perhaps the authors are just being cautious, given the challenges involved in the viral targeting of the IL region without leakage to nearby regions such as DP. But since the optical fibers were positioned above the IL region, it is possible that DP did not contribute much to either the fiber photometry signals or the effects of the optogenetic manipulations. Perhaps DP should be completely omitted, which is more consistent with the definitions of vmPFC in the field.

      Yes, we included DP to be cautious as our viral expression sometimes leaks into DP, though the optic fiber targets IL. We have replaced vmPFC with IL throughout the manuscript. 

      (4) In the Discussion, the authors should consider why PV cells exhibit increased activity during both movement initiation and successful chamber crossing during avoidance. While the functional contribution of the PV signal during movement initiation was tested with optogenetic inhibition, some discussion of the possible role of the additional PV signal during chamber crossing would be of interest to readers who are intrigued by the signaling of two events. Is the chamber crossing signal related to successful avoidance or learned safety (e.g., see Sangha, Diehl, Bergstrom, Drew 2020)?

      IL PV neural activity starts to increase at movement initiation, peaks at chamber crossing (when movement speed is highest), and decreases after chamber crossing (Figure 1E). Thus, the increase in PV neural activity at movement initiation and at chamber crossing are different phases of the same event. 

      We think this signal is unlikely to be a safety signal, and have added text to the discussion to clarify this issue:

      “We think the IL PV signal is unlikely to be a safety signal (Sangha et al., 2020). First, the PV signal rises during movement not only in the avoidance context, but during any movement in a “threatening” context (i.e. a context where the animal has been shocked). For example, PV neural activity rises during movement during the intertrial interval in the avoidance task. Further, the emergence of the PV signal during movement happens quickly – after the first shock – and significantly before the animal has learned to move to the safe zone. This suggests a close association with enabling movement in a threatening environment, when animals must suppress a freezing response in order to move. Additionally, the rise in PV activity was specifically associated with movement and not with tone offset, the indicator of safety in this task. Finally, if IL PV neural activity reflects safety signals one would expect the response to be enhanced by learning, but the amplitude of the IL PV response was unaffected by learning after the first shock.”

      (5) The primary conclusion here that PV cells control the fear response should be considered within the context of prior findings by the Herry laboratory. Courtin et al (2014) demonstrated a select role of prefrontal PV cells in the regulation of fear states, accomplished through their control over prefrontal output to the basolateral amygdala. The observations in this paper, which used both ChR2 and Arch-T to address the impact of vmPFC PV activity on reactive behavior, are highly relevant to issues raised both in the Introduction and Discussion.

      The finding of Courtin et al. (2014) is very important. We did not originally discuss this paper because Courtin et al. studied dmPFC, which has a different role in fear processing than IL/vmPFC. We have added text about this finding to the discussion:

      “Courtin et al. (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004).”

      Additional analyses

      (1) As avoidance trials progress (particularly on days 2 and 3), do PFC PV responses attenuate? That is, do continued unreinforced tone presentations lead to reduced reliance on PV cell-mediated suppression in order for successful avoidance to occur?

      We added Figure 1—Figure supplement 1M and 1N and a sentence on page 5: “IL PV neural activity during the avoidance movement was not attenuated by learning or repeated reinforcement (Figure 1—Figure supplement 1M and N, N = 8 mice, p = 0.8886, 1-way ANOVA).” We only included data from days 1 and 2, since we started to introduce short and long tone trials on day 3, which might interfere.

      (2) In Figure 3D, it would be very informative and further support the claim of "no role for movement during reward" if the response of these cells during the "initiation of movement during reward-approach" was shown (similar to Figure 1F for threat avoidance).

      Thank you for the question. We added Figure 3—Figure supplement 1B and C to show IL PV neural activity aligned to initiation of movement during reward approach. IL PV activity decreased after movement initiation for reward approach (N = 6 mice, p = 0.0382, paired t-test). This further solidifies our claim that IL PV neuron activity only increases for threat avoidance.

      Reviewer 1 (Recommendations For The Authors):

      (1) Fig. 1G shows the average response of PV cells during chamber crossing on an animal-to-animal basis. It would be informative to also see a similar plot for movement initiation.

      We have added the suggested figure in Figure 1—Figure supplement 1B.  

      (2) In the Results section (Page 5), there is a small issue with the logic. It says: "As vmPFC inactivation impairs avoidance behavior, the activity of inhibitory vmPFC PV neurons might be predicted to be low during successful avoidance trials." As opposed to "low", it should say "high", right? If inhibition impairs avoidance, then high responding by these cells would be presumed to drive the avoidance response, as supported by your findings.

      We have re-worded the text in this section. Based on prior findings that IL inactivation impairs avoidance (Moscarello et al., 2013), we predicted that inhibitory PV neurons would be less active during avoidance, because activating these neurons could suppress IL. However, we found that they were selectively active during avoidance.

      (3) In the caption/legend for Fig1E, it says that the "black ticks" indicate "tone onset". But it should say "movement initiation".

      We thank the reviewer for pointing out this error. The ticks do indicate tone onset, and we have corrected the figure to reflect this. 

      Reviewer 2 (Recommendations For The Authors):

      (4) Perhaps replace the term 'good outcomes' with 'reinforcing outcomes' or simply 'reinforcement'.

      Thank you for the suggestion. We have replaced ‘good outcomes’ with ‘reinforcing outcomes’.

      Reviewer 3 (Recommendations For The Authors):

      (5) It would be useful to provide some (perhaps speculative) explanation for the discordance between the PV activity-movement relationship and success of active avoidance in Fig. 4C

      We have added text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors: suppressing freezing and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (LeDoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (6) I don't really understand what is shown in Figure 4D -- exactly what time points does this represent? Was habituation performed everyday?

      Figure 4D shows data from the approach task, not the avoidance task. These data are from well-trained mice, not the first day of training on this task. There was a pre-task recording period every day.

      (7) Why was optogenetic inhibition only delivered from 0.5-2.5 sec after the tone cue?

      We wanted to avoid any possibility that perception of the tone would be disrupted, so we delayed the onset of optogenetic inhibition. We chose 0.5 sec onset because animals typically begin to move ~1 second after tone onset.

      (8) The regression analysis with shuffled time points is not well explained -- some additional methodological details are needed (Fig. 2H).

      We added the following to the methods section to provide a clearer explanation: 

      “ΔF/F(t) was modeled as a linear combination of all event kernels. Given the occurrence times of all event types, we can use linear regression to decompose the signal into a characteristic kernel for each event type. Kernel coefficients were obtained by minimizing the mean squared error between the model and the recorded signal. To show that kernel k_i is an essential component of the raw calcium dynamics, we compared the explanatory power of the full model with that of a reduced model in which the occurrence times of event i were randomly reassigned. In the reduced model, the kernel coefficients therefore should not reflect the response to that event.”
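      The kernel regression and shuffle control described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming lagged event-indicator regressors, two hypothetical event types with made-up kernel shapes, Gaussian noise, and a single random reassignment of one event type's times. It is not the authors' analysis code, and the numbers it produces say nothing about the recorded data.

```python
import numpy as np

rng = np.random.default_rng(1)

T, L = 5000, 20      # hypothetical: samples in the recording, kernel length in samples
n_events = 60        # hypothetical number of occurrences per event type


def design_matrix(event_times_per_type, T, L):
    """Lagged event indicators: L columns per event type, plus an intercept."""
    blocks = []
    for times in event_times_per_type:
        X = np.zeros((T, L))
        for t in times:
            for lag in range(L):
                if t + lag < T:
                    X[t + lag, lag] += 1.0   # overlapping events add linearly
        blocks.append(X)
    blocks.append(np.ones((T, 1)))           # baseline offset
    return np.hstack(blocks)


def fit_r_squared(X, y):
    """Least-squares kernel fit; returns (R^2, coefficients)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, coef


# Synthetic dF/F trace built from two event types with known kernels.
lags = np.arange(L)
true_kernels = [np.exp(-lags / 5.0), -0.5 * np.exp(-lags / 10.0)]
events = [np.sort(rng.choice(T - L, n_events, replace=False)) for _ in range(2)]
X_full = design_matrix(events, T, L)
y = X_full[:, : 2 * L] @ np.concatenate(true_kernels) + 0.2 * rng.standard_normal(T)

# Full model: all event times are correct.
r2_full, coef_full = fit_r_squared(X_full, y)
kernel0_estimate = coef_full[:L]

# Reduced model: the occurrence times of event type 0 are randomly reassigned.
shuffled = [np.sort(rng.choice(T - L, n_events, replace=False)), events[1]]
r2_reduced, _ = fit_r_squared(design_matrix(shuffled, T, L), y)

print(f"R^2 full = {r2_full:.3f}, R^2 with event 0 shuffled = {r2_reduced:.3f}")
```

      In practice one would typically repeat the reassignment many times to build a null distribution of R² values rather than rely on a single shuffle; that choice is an assumption of this sketch, not something stated in the methods text.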

      Editor's notes:

      -  Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Thank you for pointing this out. We have included all the test statistics and exact p values as suggested.

      -  Please note the sex of the mice and distribution of sexes in each group for each experiment.

      We have added the sex of mice for all experiments in the methods section.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 2 (Public Review):

      Stress response in males versus females: The authors argue that the contextual control over behaviour was more robust in female rats as females show less within session variability and greater resistance to stress. What evidence is there that the restraint stress procedure caused a similar stress response in both sexes? That is, was the stress induction equally effective in males and females?

      The restraint protocol used in this study is a well-established stressor in rodents, known to produce robust behavioral and physiological effects (HPA axis activation), in both sexes. Although not measured in this study, the ACTH and cortisol responses are actually greater in females during restraint. To the extent that “stress induction” is interpreted as “HPA axis activation”, this strongly suggests that the stress induction in males and females was at least comparable, if not greater in females.

      We have added a few sentences (in the Results and Methods sections) to highlight this important point. We thank the reviewer for bringing this up.

      Minor corrections:

      (1) Please verify that the in-text references to the figures are correct. I noticed a few mistakes, for example:

      - Line 120 (pdf) refers to Fig. 1 C-D but should refer to D only.

      - Line 312 (pdf) refers to Fig 1D for discrimination ratios but these are shown in Fig 1E

      - No reference in text to 2A

      Thank you for bringing this to our attention. We have fixed the in-text references to the figures.

      (2) In the results it states that the homecage c-Fos+ counts are shown in Figure 5 but I couldn't see these?

      The homecage c-Fos+ counts were initially shown as a pale gray band in the background of the main histograms. Because those counts are very low, it was hard to dissociate this gray band from the black horizontal axis. We have replaced the gray band with a more vivid blue line that is now in the foreground of the histograms. Moreover, we added a note in the figure legend to bring readers’ attention to this homecage count line, close to floor level. 

      (3) Line 306: It is stated that "the use of differential outcomes presumably allows animals to solve the task via simple (nonhierarchical) summation processes". I don't understand the use of "summation" here, isn't it simply that the rats are relying on direct context-outcome and/or cue-outcome associations?

      That’s right. These rats might be relying on direct context-outcome and cue-outcome associations and adding (or summing up) the converging expectations. We have added a few words in the text to clarify what we mean by summation (i.e. the addition of converging cue-evoked + context-evoked predictions).

    1. Author response:

      We thank the reviewers for their kind comments and advice. Like Reviewer 1, we acknowledge that while the exact involvement of Ih in allowing smooth transitions is likely not universal across all systems, our demonstration of the ways in which such currents can affect the dynamics of the response of complex rhythmic motor networks provides valuable insight. To address the concerns of Reviewer 2, we intend to include a sentence in the discussion to highlight the fact that cesium neither increased the pyloric frequency nor caused consistent depolarization in intracellular recordings. We will also highlight that these observations suggest that cesium is not indirectly raising extracellular [K+], and support the conclusion that the effects of cesium are primarily through blockade of Ih rather than of other potassium channels.

      Reviewer 3 raised some important points about modeling. While the lab has models that explore the effects of temperature on artificial triphasic rhythms, these models do not account for all the biophysical nuances of the full biological system. We have limited data about the exact nature of temperature-induced parameter changes and the extent to which these changes are mediated by intrinsic effects of temperature on protein structure versus protein interactions/modification by e.g. phosphorylation. With respect to the A current, Tang et al. (2010) showed that the activation and inactivation rates are differentially temperature-sensitive, but we do not have the data to determine whether the time courses of these sensitivities also differ. We intend to mention these facts in the paper, but plan to leave more comprehensive modeling to future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work successfully identified and validated TRLs in hepatic metastatic uveal melanoma, providing new horizons for enhanced immunotherapy. Uveal melanoma is a highly metastatic cancer that, unlike cutaneous melanoma, responds poorly to immune checkpoint blockade, and thus there is a lack of formal clinical treatment for metastatic UM. In this manuscript, the authors described the immune microenvironmental profile of hepatic metastatic uveal melanoma by sc-RNAseq, TCR-seq, and PDX models. Firstly, they identified and defined the phenotypes of tumor-reactive T lymphocytes (TRLs). Moreover, they validated the activity of TILs by in vivo PDX modelling as well as in vitro coculture of 3D tumorsphere cultures and autologous TILs. Additionally, the authors found that TRLs are mainly derived from depleted and late activated T cells, which recognize melanoma antigens and tumor-specific antigens. Most importantly, they identified TRL-associated phenotypes, which provide new avenues for targeting expanded T cells to improve cellular and immune checkpoint immunotherapy.

      Strengths:

      Jonas A. Nilsson et al. have been working on new therapies for melanoma. The team has also previously performed the most comprehensive genome-wide analysis of uveal melanoma available, presenting the latest insights into metastatic disease. In this work, the authors performed paired sc-RNAseq and TCR-seq on 14 patients with metastatic UM, which is the largest single-cell map of metastatic UM available. This provides substantial data support for other studies of metastatic UM.

      We thank the reviewer for these kind words about our work.

      Weaknesses:

      Although the paper does have strengths in principle, its weakness is that these strengths are not directly demonstrated. That is, insufficient analyses are performed for the data presented to fully support the key claims of the manuscript. In particular:

      The authors' description of the overall results should be logical, not just a description of the observed phenomena. For example, the presentation of the results on TRLs lacked logic. In addition, the title of the article emphasizes the three subtypes of TRLs in hepatic metastatic UM, but these three subtypes are not specifically discussed in either the Results or the Discussion section. The title is therefore not a comprehensive summary of the article and should be carefully reconsidered by the authors.

      We thank the reviewer for the critical reading of our work. We have added more data and more discussion.

      The authors' claim that they are the first to use autologous TILs and sc-RNAseq to study immunotherapy needs to be supported by the corresponding literature to be more convincing. This can help the reader to understand the innovation and importance of the methodology.

      We have gone through the manuscript and found that we only refer to being first in using PDX models and autologous TILs to study immunotherapy responses by single-cell sequencing. While there are data to be deduced from other studies, we still believe this to be an accurate statement.

      In addition, the authors argue that TILs from metastatic UM can kill tumor cells. This is the key point bridging to the main conclusion of the article, so the credibility of this conclusion should be considered. Under in vitro conditions, metastatic UM1 and UM9 remain responsive to their autologous TILs.

      UM1 also responds in vivo in the subcutaneous model in the paper. We have also completed an experiment showing that this model also responds in a liver metastasis model. These data have been added to this revised version of the paper. We added two main figures and one supplementary figure in which we characterize the response in vivo and also by single-cell sequencing of TILs.

      In contrast, UM22, also a metastatic UM, did not respond to TIL treatment, despite the presence of MART1-responsive TILs. The reliability of results obtained from a single case, the UM22 liver metastasis model, should be considered. The authors should likewise consider whether such a specific cell population might also exist in other patients with metastatic UM and produce an immune response to tumor cells. The results would be more comprehensive if supported by relevant data.

      The reviewer has interpreted the results absolutely correctly: the allogeneic and autologous MART1-specific TILs, while reactive against UM22 in vitro, cannot kill this tumor in either a subcutaneous or a liver metastasis model. We hypothesize that this is due to an immune exclusion phenotype, and we show weak immunohistochemistry that suggests this. We hope the addition of more UM1 data can be viewed as supportive of tumor reactivity also in vivo.

      In addition, the authors used previously frozen biopsy samples for TCR-seq, which may be associated with low-quality sequencing data, a high risk of unreliable outcome indicators, and limited access to immune cell information. The existence of these problems and the reliability of the results should be considered. If special processing of TCR-seq data from frozen samples was performed, this should also be described.

      We agree with the reviewers and acknowledge that we never anticipated the development of single-cell sequencing techniques when we started the biobank in 2013. We performed dead cell removal before the 10x Genomics experiment. We have also performed extensive quality control and believe that the data from the biopsies should be viewed as a whole and that quantitative intra-patient comparisons cannot be made.

      Reviewer #2 (Public Review):  

      Summary:  

      The study's goal is to characterize and validate tumor-reactive T cells in liver metastases of uveal melanoma (UM), which could contribute to enhancing immunotherapy for these patients. The authors used single-cell RNA and TCR sequencing to find potential tumor-reactive T cells and then used patient-derived xenograft (PDX) models and tumor sphere cultures for functional analysis. They discovered that tumor-reactive T cells exist in activated/exhausted T cell subsets and in cytotoxic effector cells. Functional experiments with isolated TILs show that they are capable of killing UM cells in vivo and ex vivo.

      Strengths:  

      The study highlights the potential of using single-cell sequencing and functional analysis to identify T cells that can be useful for cell therapy and marker selection in UM treatment. This is important and novel as conventional immune checkpoint therapies are not highly effective in treating UM. Additionally, the study's strength lies in its validation of findings through functional assays, which underscores the clinical relevance of the research. 

      We thank the reviewer for these kind words about our work.

      Weaknesses:  

      The manuscript may pose challenges for individuals with limited knowledge of single-cell analysis and immunology markers, making it less accessible to a broader audience.

      The first draft of the manuscript (excluding methods) was written by a person (J.A.N.) who is not a bioinformatician. It has been corrected to use the correct nomenclature where applicable, but overall it is written with the aim of being understandable to non-specialists. We have made an additional effort in this regard in the revised version.

      Reviewer #1 (Recommendations For The Authors):  

      (1) Firstly, the authors should provide high-resolution figures to ensure readability.

      We have converted the figures to PDF ourselves, which improved the resolution. We are happy to provide high-resolution files to the editorial office if needed for publication.

      (2) Furthermore, some parts of the article are rather colloquial, and the authors should consider the logic and academic tone of the writing as a whole. In particular, the authors should double-check whether the relevant expressions in the results are correct; for example, 'TCR' in the fourth part of the results should be 'TRLs'.

      We thank the reviewer for the recommendations and have gone through the manuscript.

      (3) Moreover, UM22 is described several times in the results as a metastatic UM and should be clearly defined in the methodology.

      The UM22 and UM1 samples are described in depth in Karlsson et al., Nature Communications, 2020, a paper that is cited at the beginning of the Results as part of the narrative. The current work can be viewed as an extension of that work.

      (4) Finally, it is recommended that the authors describe each part of the results in full before citing the corresponding figure; otherwise, readers may be confused.

      We have made an effort in the revised version to describe the new data in more detail.

      Reviewer #2 (Recommendations For The Authors):  

      The manuscript is very interesting and important to understanding key aspects of uveal melanoma immune profile and functionality. However, in my opinion, there are a few aspects that could be addressed.  

      - The manuscript lacks comprehensive details about the samples used, such as their disease progression, response to treatment, or any relevant information that could shed light on potential differences between samples. It would be valuable to know whether these samples were collected before any systemic treatment or if any of the patients underwent immunotherapy post-sample collection, along with the outcomes of such treatments. Providing this information would enrich the manuscript and provide a more holistic view of the research.

      We thank the reviewer for the recommendation and have included a new Supplementary Table 7 with information about the samples. We have also added each individual sample's contribution to the UMAP to give a more holistic view.

      - The results presented and discussed in the manuscript seem to indicate that there were no significant differences across the various samples, including comparisons between lymph-node and liver metastases. However, this lack of variation or the reasons for not discussing any observed differences should be clarified. If there are distinctions between the samples, it would be beneficial to discuss these findings in the manuscript.

      We thank the reviewer for the recommendation. Although 14 samples is a large number for a uveal melanoma study, the cohort is not powered for intra-patient comparisons.

      - The manuscript may pose difficulties for individuals with limited knowledge of single-cell analysis and immunology markers, potentially limiting its accessibility. To make the research more inclusive, the authors might consider presenting the technical aspects of their work in a less descriptive manner and providing explanations for those less familiar with the technology. This would help a broader audience grasp the significance of the study's findings. 

      The manuscript is from a multidisciplinary team where all have read and commented. The draft was written by a tumor biologist and edited by a bioinformatician for accuracy. We honestly think it is more understandable than most studies in this bioinformatics era. But we have tried to describe the new data in an easier way.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 56: replace "pyomastitis" with "pyogenic skin infections".

      Corrected.

      (2) Line 58: replace "basal strains" with "ancestral strains".

      Corrected.

      (3) Line 62: population structure impacts gene acquisition too; however, gene acquisitions can be easier to connect with a phenotype. For example, acquisition of mecA is thought to be adaptive rather than just linked to a successful lineage. The same reasoning applies to resistance-associated mutations such as gyrA mutations in ST22 emergence.

      We completely agree with the reviewer that population structure also impacts gene acquisition. We wanted to convey that connecting the gain or loss of genes to a change in a particular phenotype is much easier than doing the same for a mutation, especially in the presence of strong linkage, and therefore gene-level analysis is the focus of many previous studies. We have rewritten the sentence to better convey this idea:

      “Due to this limitation, studies of emerging strains often focus on gene level analysis such as acquisition of mobile genetic elements or loss of gene function as their effect on phenotype is easier to determine than that of point mutations.”

      (4) Line 112: this might simply be due to the smaller size of the intergenic regions chosen. I suggest correcting for the size of the genome segment considered.

      We thank the reviewer for pointing this out. The size of the intergenic regions was indeed the simple explanation for this observation. We have added the following sentence to the manuscript:

      “This reflects the fact that most of the S. aureus genome sequence consists of ORFs; e.g., ~84% of the TCH1516 genome is part of an ORF.”

      (5) Line 189: please add p values to supp table 2.

      We have added the p and q values from DBGWAS into Supp table 2. It is under the ‘DBGWAS Result’ sheet.

      (6) Line 227: high entropy indicates that this site is polymorphic, not necessarily that there is selective pressure. In the extreme, this might actually point to a neutral position, since any amino acid could be equally present (see for example https://www.nature.com/articles/s41467-022-31643-3#Sec10 ).

      We agree that high entropy by itself may point to a neutrally evolving position, leading to some false positives. However, we focused on positions that were mostly biallelic in CC8 and showed differential prevalence in USA300 vs non-USA300 strains (albeit in the presence of strong linkage disequilibrium), in addition to having high entropy in non-CC8 strains. This helps us filter out positions that were mostly monoallelic or carried only rare mutations while preserving other sites of interest. The approach was able to find a cap5E mutation that has been associated with disruption of capsule production.
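      The two per-position checks described above (high entropy, mostly biallelic) can be illustrated with a minimal Python sketch. It assumes each alignment position is given as a string of residues, one per genome, with '-' for gaps; the function names and the use of base-2 Shannon entropy are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter


def column_entropy(residues):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    counts = Counter(r for r in residues if r != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h if h > 0 else 0.0  # avoid returning -0.0 for monoallelic columns


def top_two_fraction(residues):
    """Fraction of non-gap residues covered by the two most common alleles.

    Values close to 1.0 indicate a (mostly) biallelic position.
    """
    counts = Counter(r for r in residues if r != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(2)) / total


print(column_entropy("AAAA"))    # monoallelic column
print(column_entropy("AATT"))    # two equally common alleles
print(top_two_fraction("AAATTG"))
```

      In a filter of this kind, a position would be kept only if it passed both checks together with the USA300 vs non-USA300 prevalence comparison; as the reviewer notes, entropy alone cannot distinguish selection from neutral variation.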

      (7) Line 271: show USA500 on the tree.

      Our current study is mostly focused on differences between USA300 and non-USA300 strains and we want to highlight those differences in the tree.

      (8) Line 327: still not possible to infer causality.

      We have changed the language to remove mentions of causality and instead describe the association of GWAS-enriched genes with measured transcriptional changes. The revised sentence now reads:

      “Here, we demonstrated how a model of transcriptional regulation with iModulons can be used to make headway through the impasse created by the high linkage disequilibrium and identify GWAS-enriched mutations that are also associated with measurable phenotypic changes in the TRN.”

      (9) Line 324: subclades reference.

      We are unsure what this means.

      (10) Line 366: the authors seem to have used a bespoke pan-genome analysis approach. Would they be able to validate it using established tools such as Roary, Pirate or Panaroo? Panaroo in particular appears to have superior accuracy thanks to its pan-genome graph approach (https://github.com/gtonkinhill/panaroo). 

      We have added the results of Roary to our analysis (Figure S1b). The Roary results largely agree with our main takeaway from the pangenomic analysis, namely that our collection of genomes provides good coverage of the CC8 clade at the gene level.

      (11) Line 397: what was the size of the core genome?

      There were 24881 core sites. We have added the number to the manuscript.

      (12) Line 407: please add citation or website for SCCmecFinder.

      The citation of SCCmecFinder (45) is at the end of the sentence.

      (13) Line 421: I was not able to find the code used for this analysis in the github repository provided.

      The code can be found in “notebook/02_Preprocess_DBGWAS.ipynb” within the repo.

      (14) Line 427: this is a very complex analysis for a simple univariate comparison between USA300-vs-non USA300 strains with no correction for population structure. The authors should compare their results with a more established pipeline like Pyseer or Gemma that can handle kmers and show the added value of their approach.

      We wanted to take advantage of DBGWAS's ability to collapse kmers into unitigs and further collapse significant unitigs within a genetic neighborhood into components. Unfortunately, we found that in many cases it was difficult to determine the exact mutation being enriched (e.g., T234G) without extensive manual work. Our network analysis simply parses the DBGWAS graph to automatically extract these mutations, making the results more interpretable. It does not perform any additional hypothesis testing.

      We also attempted to pass kmer data into GEMMA but without the compaction provided by DBGWAS the memory required (>168 GB) exceeded what we had available.

      (15) DBGWAS: please indicate DBGWAS version and the options used for kmer size and number of neighbour nodes retained in the subgraph. Also, I assume that no correction for population structure was applied.

      We have added the version and parameters for DBGWAS. The method section now reads:

      “DBGWAS (v0.5.4) was used to enrich mutations unique to USA300 strains using the default kmer size of 31 (-k 31) and a neighborhood size of 5 (-nh 5). Alleles with frequency less than 0.1 were filtered out (-maf 0.1) and all components enriched with q-values less than 0.05 were documented (-SFF q0.05).”

      (16) Could the authors provide the DBGWAS output for the most significant unitigs in graph format? This would help readers understand the findings.

      The outputs are available in the github repo. The link to this specific data is (https://github.com/sapoudel/USA300GWASPUB/tree/master/data/dbgwas/dbgwas_output/visualisations)

      The text format of the output is part of Supplementary Table 2 under “DBGWAS Result” sheet.

      (17) Line 469: please provide more details on iModulons, it is not enough to simply reference the paper: specific QC criteria, mapping algorithm and parameters, ICA algorithm.

      We have now added a new Supplementary Note 2 section with more details about building iModulons.

      (18) Line 474: what is log-TPM?

      Log-Transcripts per Million. We have added the description in the text.
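      For readers unfamiliar with the unit, TPM normalizes read counts first by gene length and then by library size, so that the values in each sample sum to one million. The sketch below assumes the common convention log-TPM = log2(TPM + 1); the exact log base and pseudocount used in the iModulon pipeline should be taken from Supplementary Note 2, and the function names are illustrative.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts: mapped read counts per gene; lengths_bp: gene lengths in bases.
    Raises ZeroDivisionError if the sample has no reads at all.
    """
    # Reads per kilobase: correct for gene length first.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Then rescale so the sample sums to 1e6 (corrects for sequencing depth).
    per_million = sum(rpk) / 1e6
    return [r / per_million for r in rpk]


def log_tpm(counts, lengths_bp, pseudocount=1.0):
    """log2(TPM + pseudocount); the pseudocount keeps zero-count genes finite."""
    return [math.log2(t + pseudocount) for t in tpm(counts, lengths_bp)]


print(tpm([100, 200], [1000, 2000]))
print(log_tpm([0, 100], [1000, 1000]))
```

      Because the second gene in the first example is twice as long, its doubled read count yields the same TPM as the first gene, which is the point of the length correction.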

      (19) Line 479: not sure what "Chapter 3" refers to.

      Thank you for correcting the mistake. The reference has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Line 45. The introduction is not well-structured, and there is a lack of coherence among the topics pertinent to the research objective. I would recommend rewriting this section addressing the following topics: the challenge of distinguishing lineages within the CC8, especially the CA-MRSA USA300 strains; discussing the state-of-the-art GWAS methodologies, elucidating the main confounding factors in the application of GWAS to bacterial studies, and finally, exploring how current methods aim to address these concerns.

      We would like to thank the reviewer for the suggestions. The main innovation of the paper is using iModulons to find phenotype associated mutations from a set of linked mutations. The challenge of distinguishing CC8 subclades has been largely resolved thanks to efforts by Bowers et al. (PMID: 29720527). We have made some revisions to address the GWAS methodologies (bugwas and DBGWAS), the effect of linkage disequilibrium in interpreting the output of these methods and how combining the results of these association tests with modeling of TRN with iModulons can lead to finding candidate mutations of interest that are linked to specific changes in gene regulation.

      Line 56. Replace "pyomastitis" with "pyomyositis".

      Corrected to “pyogenic skin infections.”

      Lines 71. What do the authors mean by "endemic USA300 strain"?

      We have removed references to endemic strains.

      Line 106. Please verify the number of genomes used in the DBGWAS analysis. In the text, the authors mention that 2038 genomes were utilized. However, in Supplementary Table 1, only 2030 genomes are listed.

      Thank you for catching the discrepancy. We started the analysis with 2037 genomes, including four “spiked-in” reference genomes: USA100 D592 (a CC5 strain used for rooting the CC8 tree), TCH1516 (same accession number as the one used for ICA), COL, and Newman. Before further analysis, we removed 6 genomes for being smaller than 2.5 million base pairs (see preprocessing.ipynb) and the USA100 D592 strain, as it is not part of CC8. This resulted in 2030 genomes being used for DBGWAS. We kept the other 3 spiked-in CC8 genomes to help annotate the unitigs from DBGWAS. Lastly, we removed these three spiked-in CC8 genomes for the pangenomic analysis. To clarify this, we have made the following changes to the text:

      (1) Changed line 106: We downloaded 2033 S. aureus genomes for analysis and excluded six of them with genome length of less than 2.5 million base pairs. The remaining 2027 S. aureus CC8 genomes formed a closed pangenome, suggesting that the sampled genomes mostly captured the gene level variations within the clonal complex (Figure 1a).

      (2) DBGWAS section Line 177: We used 2030 genomes for this analysis; the 2027 genomes in the pangenomic analysis above were “spiked” with three well-known CC8 genomes (TCH1516, COL, and Newman) to help annotate the DBGWAS unitigs.

      Line 108. Could the authors provide a table with the genes that constitute the core, accessory genome, and unique genes for each of the strains?

      The gene presence/absence tables are very large files, so we have added them only to our GitHub repository. The results can be found in the following file:

      Pangenomics: data/pangenome/Pangenomics/CC8_strain_by_gene.pickle.gz

      Lines 112 and 315. On what basis did the authors decide on the size of the upstream regulatory region? In the search for mutations, they extracted segments of 300 base pairs, whereas, in the search for the Fur binding motif, only 100 base pairs were considered. The RegPrecise database contains regulons for Staphylococcus aureus N315 (https://regprecise.lbl.gov/genome.jsp?genome_id=26), including the Fur regulon with multiple Transcription Factor Binding Sites (TFBSs) that extend beyond the 100 base-pair sequence. I would recommend reconsidering the search within the standardized upstream region of -400 base pairs. In the case of the Fur binding motif search, it might be beneficial to include the TFBSs available in the RegPrecise database.

      For the Fur motif search, we chose 100 base pairs because the Fur motif in non-USA300 strains was within ~20 base pairs of the isdH translation start site (Figure 4C). In this analysis, we were not searching for Fur motifs in general; we were only checking whether the motif proximal to the translation start site was present, as our DBGWAS analysis suggested that this specific region was deleted in USA300 strains.

      Line 175. This work aimed to identify potential mutations associated with the success of a specific lineage rather than a phenotype, where correction for population structure effects is necessary. Would the implementation of the bugwas method in DBGWAS for controlling bacterial population structure not potentially impact the results? How was this issue addressed in your analysis? Would it not be pertinent to run a program without population structure correction to enable a comparison of results?

      We initially tried to use linear mixed models to find kmers that were enriched only in USA300 strains. These efforts were hampered by extreme linkage disequilibrium, which led to high collinearity between kmer abundances and made it extremely difficult to obtain good estimates of the coefficients. We also tried running chi-squared tests individually on each kmer, which yielded an unmanageable number (>100k) of significantly different kmers. DBGWAS, on the other hand, was able to compress unbranched kmers in the De Bruijn graph into unitigs and further reduce the number of tests by testing at the pattern level instead of the unitig level. We found no straightforward way to run DBGWAS (or GEMMA) without population structure correction. Therefore, it is likely that we are underestimating the number of significant unitigs with this approach.

      Line 189. Please italicize the gene name cap5E.

      Corrected.

      Line 277. Please clarify the QC/QA criteria and curation process employed for the selection of RNA-seq experiments, as this constitutes a crucial step in the reconstruction of the network.

      We have now added a new supplementary material section, Supplementary Note 2 titled “Creating iModulons for CC8 Clade Staphylococcus aureus” with details of QC/QA.

      Line 279. In Supplementary Table 3, please label the first column and standardize the use of either the experiment ID or the run ID. Furthermore, verify the experiment identifiers from rows 19 to 26, as I could not locate them in the SRA database.

      We have changed all accessions to experiment IDs, including those in rows 19 to 26.

      Lines 290, 330, 424, and 437. Please correct "SCCMec" to "SCCmec IVa" (italicize "mec").

      Corrected.

      Line 298. What is the size of the upstream regulatory region considered for this analysis? It is important to standardize this value for all analyses involving the upstream regulatory region. In this regard, I recommend maintaining a consistent size of -400 base pairs.

      For the Fur motif search, we chose 100 base pairs because the Fur motif in non-USA300 strains was within ~20 base pairs of the isdH translation start site (Figure 4C). In this analysis, we were not searching for Fur motifs in general; we were only checking whether the motif proximal to the translation start site was present, as our DBGWAS analysis suggested that this specific region was deleted in USA300 strains. In our usual analysis, we use a -300 base-pair upstream region.

      Line 321. The discussion is rather concise and lacks an in-depth comparative perspective with relevant literature on any of the obtained results, whether concerning the proposed methodology or the potential new markers associated with the success of the USA300 lineage. The authors must underscore the method is not applicable to all GWAS analyses, due to the issue of correction for population structure.

      We have now added a section on the importance of isdH in S. aureus infection and a section addressing the limitations of the current approach when applied to other GWAS-type studies.

      Line 366. The authors employed the methodology described in the article by Hyun et al. 2022 (https://doi.org/10.1186/s12864-021-08223-8) to construct the pangenome. However, this methodology was designed for comparative analysis of pangenomes across various species, which does not align with the objective of this study, focusing solely on S. aureus genomes. Consequently, it remains unclear to me why the authors made this particular choice and, more importantly, what advantages it offers over well-established tools for individual pangenomes, such as Roary. I would strongly recommend validating the results using at least one established tool.

      Our approach lets us determine appropriate thresholds for core, accessory, and unique genes from the observed data (Supplementary Figure 1a). However, we agree that it is appropriate to also include an established pangenome package, and we have added the results of Roary to our analysis. The Roary results largely agree with our main takeaway from the pangenomic analysis, namely that our collection of genomes provides good coverage of the CC8 clade at the gene level.

      Line 370. Please include the version of CD-HIT that was utilized.

      Added. CD-HIT version 4.6 was used for the analysis.

      Line 372. What tool did the authors use to extract these regions?

      The CDS, 5’ and 3’ sequences can easily be extracted by combining the FASTA and GFF files. The GFF file was used to find the position of each of these sequences, and the sequences were then extracted from the FASTA file with Python scripts.
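      A minimal sketch of this kind of extraction, using only the Python standard library, is shown below. It assumes GFF3 with 1-based inclusive coordinates, treats a fixed-size flank on either side of each CDS as its 5’ and 3’ region (300 bp by default, matching the upstream window mentioned elsewhere in this response), and reverse-complements features on the minus strand. The function names are hypothetical, and fixed flanks can overlap neighboring genes, which the authors' scripts may handle differently.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def extract_regions(fasta_text, gff_text, flank=300):
    """Return CDS, 5' flank and 3' flank for every CDS feature in the GFF."""
    genome = read_fasta(fasta_text)
    regions = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):  # some GFF files embed the sequence after this
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        contig, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        seq = genome[contig]
        s0, e0 = start - 1, end  # convert to 0-based, half-open
        cds = seq[s0:e0]
        left, right = seq[max(0, s0 - flank):s0], seq[e0:e0 + flank]
        if strand == "-":
            # On the minus strand, "upstream" is to the right of the feature.
            cds, five, three = revcomp(cds), revcomp(right), revcomp(left)
        else:
            five, three = left, right
        regions.append({"id": attrs.get("ID", f"{contig}:{start}-{end}"),
                        "cds": cds, "five_prime": five, "three_prime": three})
    return regions
```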

      Line 395. What were the QC/QA criteria used to select the sequences?

      The QC/QA criteria for the sequences are described at the beginning of the Pangenomic analysis subsection and are as follows:

      “Briefly, “complete” or “WGS” samples from CC8/ST8 were downloaded from the PATRIC database. Sequences with lengths that were not within 3 standard deviations of the mean length or those with more than 100 contigs were filtered out.”
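      The two filters in this quotation can be expressed in a few lines. The sketch assumes each assembly is a dict with hypothetical `length` (total bp) and `n_contigs` keys and uses the sample standard deviation; whether the authors used the sample or the population standard deviation is not stated.

```python
from statistics import mean, stdev


def qc_filter(assemblies, n_sd=3.0, max_contigs=100):
    """Keep assemblies whose length is within n_sd SDs of the mean and that
    have at most max_contigs contigs. Needs at least two assemblies."""
    lengths = [a["length"] for a in assemblies]
    mu, sd = mean(lengths), stdev(lengths)
    return [
        a for a in assemblies
        if abs(a["length"] - mu) <= n_sd * sd and a["n_contigs"] <= max_contigs
    ]


# Toy data: 19 typical genomes (one of them too fragmented) and one short outlier.
genomes = [{"name": f"g{i}", "length": 2_800_000, "n_contigs": 50} for i in range(19)]
genomes[0]["n_contigs"] = 150
genomes.append({"name": "short", "length": 1_000_000, "n_contigs": 10})
kept = qc_filter(genomes)
print(len(kept))
```

      Note that the mean and standard deviation are computed before filtering, so a few extreme outliers inflate the standard deviation; with very small collections a single outlier can never exceed three standard deviations.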

      Line 407. Please correct the tool name to "SCCmecFinder" (italicize "mec").

      The name has been corrected.

      Line 409. I believe BLASTp was run locally, so please specify the version used and the search parameters.

      As corrected further down, we used BLASTn not BLASTp. The version v2.2.31 has been added to the methods section.

      Line 416. There is conflicting information with line 409, which mentions that PVL was identified through a protein BLAST, but right below, it states it was a BLASTn. Please verify which information is correct and consider the previous comment to specify the version and parameters.

      Thank you for catching the discrepancy. We have corrected the text:

      “PVL was detected using nucleotide BLAST.”

      Line 418. Please provide the column identifiers for the Supplementary Table 5 (PVL worksheet).

      Column names are added.

      Line 418. Please remove the repeated word "and" in Supplementary Table 5 (mecA worksheet) and italicize the gene names in this table.

      Corrected

      Line 419. You can use the abbreviation "SNPs" since it was introduced in line 65.

      Corrected.

      Line 420. In my view, this analysis could benefit from a more detailed and clearer explanation.

      We have added to the explanation. The section now reads:

      “To find the root of the USA300 strains in the phylogenetic tree, the genomes in the tree were first annotated by their PVL and SCCmec status. The tree was then traversed from leaf to root, starting from the known USA300 strains TCH1516 and FPR3757, while keeping track of the number of descendant genomes of the current node that contained the known markers SCCmec IVa and PVL. The node at which the number of genomes with the markers began to plateau was marked as the root of USA300.”
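      The traversal described in this quotation can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the tree is given as hypothetical `parent`/`children` dictionaries, `marked` is the set of leaves carrying both SCCmec IVa and PVL, and "plateau" is interpreted as `patience` consecutive ancestors that add no new marker-positive leaves (the `patience` parameter is our assumption; a value above 1 tolerates a few USA300 genomes that have lost a marker). The sketch starts from a single leaf, whereas the authors started from both TCH1516 and FPR3757.

```python
def leaves_under(node, children):
    """All leaf names below `node` (iterative depth-first search)."""
    stack, leaves = [node], []
    while stack:
        current = stack.pop()
        kids = children.get(current, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(current)
    return leaves


def find_clade_root(start_leaf, parent, children, marked, patience=1):
    """Walk from a known member toward the root, counting marker-positive descendants.

    Returns the last ancestor at which the marker-positive count still increased,
    stopping once `patience` consecutive ancestors add no new marker-positive leaves.
    """
    best = start_leaf
    best_count = 1 if start_leaf in marked else 0
    stalled, node = 0, start_leaf
    while node in parent:
        node = parent[node]
        count = sum(1 for leaf in leaves_under(node, children) if leaf in marked)
        if count > best_count:
            best, best_count, stalled = node, count, 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return best


# Toy tree: ((t1, t2)X, t3)A and (t4, t5)B under "root"; only t1, t2, t3 carry both markers.
children = {"root": ["A", "B"], "A": ["X", "t3"], "X": ["t1", "t2"], "B": ["t4", "t5"]}
parent = {kid: node for node, kids in children.items() for kid in kids}
print(find_clade_root("t1", parent, children, marked={"t1", "t2", "t3"}))
```

      Recomputing the leaf set at every ancestor is quadratic in the worst case; for a tree of a few thousand genomes this is still fast, but a single post-order pass that caches marker counts per node would scale better.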

      Line 428. Specify the version and parameters used in the analysis with DBGWAS.

      Added. The text now reads:

      “DBGWAS (v0.5.4) was used to enrich mutations unique to USA300 strains using the default kmer size of 31 (-k 31) and a neighborhood size of 5 (-nh 5). Alleles with frequency less than 0.1 were filtered out (-maf 0.1) and all components enriched with q-values less than 0.05 were documented (-SFF q0.05).”

      Line 431. What tools were employed to calculate Pearson correlation and distances relative to the reference genome?

      Added. The text now reads:

      “Genome-wide linkage was estimated by the Pearson correlation (calculated with the built-in pandas function) of the presence/absence of enriched kmers, and distance was measured based on the kmer alignment to the reference TCH1516 genome as determined by BLASTn.”
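      The linkage statistic can be illustrated without pandas. For binary presence/absence vectors, Pearson's r (which pandas `DataFrame.corr()` computes by default for each pair of columns) reduces to the phi coefficient. In this sketch the kmer names, the presence table and the reference positions are hypothetical inputs; pairing each r with the distance between the two kmers' positions on TCH1516 gives the linkage-versus-distance relationship described in the quotation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def linkage_vs_distance(presence, ref_pos):
    """Return sorted (distance_bp, r) tuples for every pair of kmers.

    presence: kmer -> list of 0/1 values, one per genome (same genome order for all kmers).
    ref_pos: kmer -> alignment position on the reference genome (e.g. from BLASTn).
    """
    pairs = []
    for a, b in combinations(sorted(presence), 2):
        pairs.append((abs(ref_pos[a] - ref_pos[b]), pearson(presence[a], presence[b])))
    return sorted(pairs)


presence = {"k1": [1, 1, 0, 0], "k2": [1, 1, 0, 0], "k3": [0, 0, 1, 1]}
ref_pos = {"k1": 1000, "k2": 1200, "k3": 50000}
print(linkage_vs_distance(presence, ref_pos))
```

      Under strong linkage disequilibrium, as reported for USA300, |r| stays high even for kmers far apart on the reference, which is why the association signal alone cannot pinpoint the causal mutation.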

      Line 450. What type of BLAST was used?

      Added. Nucleotide BLAST (BLASTn) was used for all kmer analyses.

      Line 452. I didn't quite understand the reason for making this analysis available in a separate repository. It would be easier for readers looking to reproduce the work if all the codes were in a single repository.

      We kept the repository separate in case we wanted to further develop the network analysis code in the future. We have added the link to the network analysis repository in the README of the publication repo.

      Line 460. Please specify the version and parameters, if run locally, or indicate if a web page was used.

      Corrected to indicate that we used the PATRIC website for this analysis.

      Line 470. Specify the version and provide a detailed account of all parameters used, along with the QC/QA criteria and curation methods applied.

      We have added Supplementary Note 2 with all the details about packages and parameters used to calculate the iModulons.

      Line 479. The phrase "ICA was then run as previously described in chapter 3" does not make sense. Please clarify.

      We have corrected the mistake and added a new supplementary note with details about our ICA run. The line now reads:

      “A detailed version of the methods for RNA-sequencing and ICA analysis is available as Supplementary Note 2. ICA of RNA sequencing data was performed using the pymodulon package.”

      Line 484. Specify the version of CD-HIT.

      Added. The version used was v4.6.

      Line 494. To enable reproducibility, the repository should be better organized, especially the directory containing the code. Numbering each script in the order it was run would assist the reader in comprehending the overall analysis flow and adapting it to their needs. If creating a manual for method usage is not feasible, the code could be more extensively commented on to explain the parameters, choices made, and how these could be modified. The "Data" folder seems to contain some test files, such as those in the "isdh_fimo" folder, so removing test files would aid the understanding of the reader.

      Thank you for the suggestions. We have now numbered the notebooks that generate the figures, we have added more comments to the code, removed testing code and test datasets.

      Throughout the article, please correct "SCCMec" to "SCCmec" (italicize "mec").

      Corrected.

    1. Author response:

      (1) The manuscript emphasizes the hypothesis that stable super-complexes, maintained through sequential replacement of subunits, might underlie the long-term storage of memory. While an interesting idea, this notion requires considerably more research. The presented experimental data are indeed consistent with this notion, but there is no evidence that these complexes are causally related to memory storage. 

      We agree with the reviewer that, while our data support the idea that subunit exchange in supercomplexes could underlie long-term memory storage, more research is necessary to conclusively validate this hypothesis. The experimental data presented are consistent with the idea that stable supercomplexes, maintained through sequential replacement of subunits, play a role in memory retention. However, establishing a causal relationship between these supercomplexes and memory storage will require additional experiments and in-depth analyses.

      (2) Much of the presented work is performed on biochemically isolated protein complexes. The biochemical isolation procedures rely on physical disruption and detergents that are known to alter the composition and structure of complexes in certain cases. Thus, it remains unclear how the protein complexes described in this study relate to PSD95 complexes in intact synapses. 

      Although biochemical isolation procedures can alter the composition and structure of protein complexes, we have previously published the protocol used to isolate PSD95-containing supercomplexes (Nat Commun. 2016; 7: 11264). In that study, we demonstrated that the isolated supercomplexes are approximately 1.5 MDa in size and contain multiple proteins, including other scaffolding proteins (e.g., PSD93) and receptors (e.g., NMDARs). Importantly, these supercomplexes remain stable when exposed to detergents and dilution, strongly indicating that they represent the native complexes present in intact synapses.

      (3) Because not all GFP molecules mature and fold correctly in vitro and the PSD95-mEos mice used were heterozygous, the interpretation of the corresponding quantifications is not straightforward. 

      Although genetic tagging ensures a 1:1 labeling stoichiometry, we acknowledge that the presence of unfolded GFP and the use of heterozygous PSD95-mEos mice can complicate the analysis. We have highlighted this limitation in the manuscript. Nonetheless, our results show a high level of consistency across the different genetic fusions used in this study.

      (4) It was not tested whether different numbers of PSD95 molecules per super-complex might contribute to different retention times of PSD95, e.g. in synaptic vs. total-forebrain super-complexes. 

      The potential impact of varying numbers of PSD95 molecules per super-complex on retention times was considered. However, our analysis showed minimal differences in the distribution of molecule numbers per super-complex between the synaptic and forebrain samples.

      (5) The conclusion that the population of 'mixed' synapses is higher in the isocortex than in other brain regions is not supported by statistical analysis. 

      The conclusion that the population of 'mixed' synapses is higher in the isocortex than in other brain regions is indeed supported by statistical analysis: all relevant statistics are detailed in Table S2, and the difference is statistically significant. We will emphasize this point in the revised manuscript.

      (6) The validity of conclusions regarding PSD95 degradation based on relative changes in the occurrence of SiR-Halo-positive puncta is limited.

      We recognize that conclusions based solely on the relative changes in SiR-Halo-positive puncta concerning PSD95 degradation have limitations. To address this, we also quantified the “new” PSD95 by analyzing AF488-Halo-positive molecules.

    1. Author response:

      Thank you for the reviews and the eLife assessment. We want to take this opportunity to acknowledge the weaknesses pointed out by the reviewers and we will make small changes to the manuscript to account for these as part of the Version of Record.

      The tools are command-based and store outcomes locally

      We consider this an advantage of our ecosystem, which is intended for individuals or small groups of authors: these features make the tools easy to install and to integrate with other tools. Further, one of our tools, labelbuddy, provides a graphical user interface. Our tools can also be integrated into web-based systems as backends; pubget is already being used in this way in the NeuroSynth Compose platform for semi-automated neuroimaging meta-analyses.

      pubget only gathers open-access papers from PubMed Central

      We recognize this as a limitation, and we acknowledge it in the original manuscript (in the Discussion section, starting with "A limitation of Pubget is that it is restricted to the Open-Access subset of PMC"). We chose to limit the scope of our tools to ensure maintainability. Further, we are currently extending pubget so that it can also access the abstracts and metadata of closed-access papers indexed on PubMed. Future research could build other tools that work alongside pubget to access other databases.

      Logic flow is difficult to follow

      We thank the reviewer for this feedback. Our paper describes an ecosystem of literature-mining tools, which neither lends itself to a narrative flow nor readily fits the standard "Intro, Results, Discussion, Methods" structure typical of the scientific literature. We have done our best to conform to this expected format, but we have also provided detailed section and subsection headings so that the reader can digest the paper nonlinearly. Each of the tools we describe also has detailed documentation on GitHub that we update continuously.

      Results were not validated

      For the example where we automatically extracted participant demographics from papers, we validated the results on a held-out dataset of 100 manually-annotated papers. For the example with automatic large-scale meta-analyses (neuroquery and neurosynth), these methods are described together with their validation in the original papers. If this ecosystem of tools is integrated into other workflows, it should be validated in those contexts. We recognize that validating meta-analyses is a difficult problem because we do not have ground truth maps of the brain.

      Efficiency was not quantified

      Creators of tools do not always run experiments to quantify their efficiency and other qualities. We have chosen not to do so here, first because it is outside the scope of this paper, as it would require specifying very precise tasks and how efficiency is measured, and second because, at least for data collection, the benefit of an automated tool over manually downloading papers one by one is clear even without quantification. Compared with re-using existing datasets, our ecosystem is not necessarily more or less efficient, but it has other advantages, such as providing datasets that contain the latest literature, whereas existing datasets are static and quickly become out of date.

      We do not highlight the strength of AI functions

      We provide an example of using our tools to gather data and manually annotate a validation set for use with large language models (in our case, GPT). We are further exploring this domain in other projects; for example, for performing semi-automated meta-analyses using the NeuroSynth Compose platform. However, we did not deem it necessary to include more AI examples in the current paper; we only wanted to provide enough examples to demonstrate the scope of possible use cases of our ecosystem.

      We thank the reviewers for their time and valuable feedback, which we will keep in mind in our future research.

    1. Author response:

      Thank you for handling our paper and our thanks to the reviewers for their engagement, comments and valuable suggestions. We will take the opportunity to provide a full response and submit a revised version in the coming weeks.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors provide solid data on a functional investigation of potential nucleoid-associated proteins and the modulation of chromosomal conformation in a model cyanobacterium. While the experiments presented are convincing, the manuscript could benefit from restructuring towards the precise findings; alternatively, additional data buttressing the claims made would significantly enhance the study. These valuable findings will be of interest to the chromosome and microbiology fields.

      We thank the editors for their assessment and the reviewers for their critical suggestions. Both reviewers were concerned about our interpretation of the 3C data, and Reviewer #2 suggested biochemical analyses of cyAbrB2 to reinforce our claim. We agree with this concern and suggest that the editors add the following sentence to the eLife assessment: “How cyAbrB2 affects chromosome structure remains elusive in this study, and biochemical assays will be needed in future experiments.”

      The major revision points are the following:

      Reconstruction of Figures

      Previous Figure 5E has been omitted

      Additional 3C data on the nifJ region

      Rephrasing the conclusion of 3C data

      Additional discussion on cyAbrB2 and NAPs

      Reviewer #1 (Public Review): 

      Strength: 

      At first glance, I had a very positive impression of the overall manuscript. The experiments were well done, the data presentation looks very structured, and the text reads well in principle.

      Weakness: 

      Having a closer look, the red line of the manuscript is somewhat blurry. Reading the abstract, the introduction, and parts of the discussion, it is not really clear what the authors exactly aim to target. Is it the regulation of fermentation in cyanobacteria because it is under-investigated? Is it to bring light to the transcriptional regulation of hydrogenase genes? The regulation by SigE? Or is it to get insight into the real function of cyAbrB2 in cyanobacteria? All of this would be good of course. But it appears that the authors try to integrate all these aspects, which in the end is a little bit counterintuitive and in some places even confusing. From my point of view, the major story is a functional investigation of the presumable transcriptional regulator cyAbrB2, which turned out to be a potential NAP. To demonstrate/prove this, the hox genes have been chosen as an example due to the fact that a regulatory role of cyAbrB2 has already been described. In my eyes, it would be good to restructure or streamline the introduction according to this major outcome. 

      As you pointed out, the major focus of this study is cyAbrB2 as a potential NAP. To focus on NAPs, we simplified the first paragraph of the Discussion (ll.246-263) and added a section comparing cyAbrB2 with other known NAPs (ll.269-299). To emphasize the description of cyAbrB2, we also rearranged the figures and divided the analysis of cyAbrB2 ChIP into two figures. We shortened the first paragraph of the Introduction but mostly preserved its composition to keep the general-to-specific pattern, even at some cost to the focus of the manuscript.

      Points to consider: 

      The authors suggest that the microoxic condition is the reason for the downregulation of e.g. photosynthesis (l.112-114). But of course, they also switched off the light to achieve a microoxic environment, which presumably is the trigger signal for photosynthesis-related genes. I suggest avoiding making causal conclusions exclusively related to oxygen and recommend rephrasing (for example, "were downregulated under the conditions applied").

      We agree with this point. We rephrased l.114 to “by the transition to dark microoxic conditions from light aerobic conditions” (ll.108-109).

      The authors hypothesized that cyAbrB2 modulates chromosomal conformation and conducted a 3C analysis. But if I read the data in Figure 5B & C correctly, there is a lot of interaction in a range of 1650 and 1700 kb, not only at marked positions c and j. Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant? In the case of position j the variation between the replicates seems quite high, in the case of position c the mean difference is not that high. Moreover, does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A? If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT. That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit to the data shown. But I have to mention that I am not an expert in these kinds of assays. Nevertheless, if there is a biological function that shall be revealed by an experiment, the data must be crystal clear on that. At least the descriptions of the 3C data and the corresponding conclusions need to be improved. For me, it is hard to follow the authors' thoughts in this context. 

      Following your suggestion, we carefully re-examined the 3C data and conducted an additional 3C experiment on the nifJ region (Figures 7F-J). We admit that we had overinterpreted the 3C data. We therefore rewrote the results and discussion of the 3C assay in line with the data (ll.220-245) and removed the previous Figure 5E. Our individual responses follow.

      Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant?

      We could not find statistically significant differences at loci c and j. We therefore added the following to the Results section: “Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.231-232)

      does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A?

      As you suspected, interaction frequency and cyAbrB2 binding do not correlate. We therefore withdrew the previous claim and now state: “Moreover, our 3C data did not support bridging at least in hox region and nifJ region, as the high interaction locus and cyAbrB2 binding region did not seem to correlate (Figure 7).” (ll.280-282)

      If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT.

      We rewrote it as follows: “Then we compared the chromatin conformation of wildtype and cyabrb2∆. Although overall shapes of graphs did not differ, some differences were observed in wildtype and cyabrb2∆ (Figures 7B and 7G); interaction of locus (c) with hox region were slightly lower in cyabrb2∆ and interaction of loci (f’) and (g’) with nifJ region were different in wildtype and cyabrb2∆. Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.228-232)

      That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit to the data shown.

      We rewrote the sentence as follows: “While the interaction scores exhibit considerable variability, the individual data over time demonstrate declining trends of the wildtype at locus (c) and (j) (Figure S8). In ∆cyabrb2, by contrast, the interaction frequency of loci (c) and (j) was unchanged in the aerobic and microoxic conditions (Figure 7E). The interaction frequency of locus (c) in ∆cyabrb2 was as low as that in the microoxic condition of wildtype, while that of locus (j) in ∆cyabrb2 was as high as that in the aerobic condition of wildtype (Figures 7B and 7C).” (ll.238-243)

      The figures are nicely prepared, albeit quite complex and in some cases not really supportive of the understanding of the results description. Moreover, they show a rather loose organization that sometimes does not fit the red line of the results section. For example, Figure 1D is not mentioned in the paragraph that refers to several other panels of the same figure (see lines110-128). Panel 1D is mentioned later in the discussion. Does 1D really fit into Figure 1 then? Are all the panels indeed required to be shown in the main document? As some elements are only briefly mentioned, the authors might also consider moving some into the supplement (e.g. left part of Figure 1C, Figure 2A, Figure 3B ...) or at least try to distribute some panels into more figures. This would reduce complexity and increase comprehensibility for future readers. Also, Figure 3 is a way too complex. Panel G could be an alone-standing figure. The latter would also allow for an increase in font sizes or to show ChIP data of both conditions (L+O2 and D-O2) separately. Moreover, a figure legend typically introduces the content as a whole by one phrase but here only the different panels are described, which fits to the impression that all the different panels are not well connected. Of course, it is the decision of the authors what to present and how but may they consider restructuring and simplifying.

      Following this advice, we have rearranged the figures.

      The left side of Figure 1C has been moved to the supplement. In its place, representative expression fold changes of “Transient”, “Plateau”, “Continuous”, and “Late” genes are shown for clarity. We kept Figure 1D in Figure 1, as this diagram shows our motivation for focusing on hox and nifJ. We moved Figure 2A to the supplement. We did not move Figure 3B, as it shows the distribution of cyAbrB2 (“long tract of AT-rich DNA”) comprehensively and simply. We agree that Figure 3 was too complex; we therefore moved Figures 3F and 3G to a new, independent figure (Figure 4). In Figure 4C (formerly 3G), we show the ChIP data for the L+O2 condition only, and the change in ChIP data under the D-O2 condition is shown in Figure 5. The schematic image showing the cyanobacterial chromosome and NAPs (previous Figure 5E) was omitted because it overinterpreted the data.

      The authors assume a physiological significance of transient upregulation of e.g. hox genes under microoxic conditions. But does the hydrogenase indeed produce hydrogen under the conditions investigated and is this even required? Moreover, the authors use the term "fermentative gene". But is hydrogen indeed a fermentation product, i.e. are protons the terminal electron acceptor to achieve catabolic electron balance? Then huge amounts of hydrogen should be released. Comment should be made on this.

      This is a very important point. Yes, the hydrogenase does produce hydrogen under the conditions we investigated, and protons accept a majority of the reducing power under dark microoxic conditions. We added the following to the Introduction: “Hydrogen is generated in quantities comparable to lactate and dicarboxylic acids as the result of electron acceptance in the dark microoxic condition (Akiyama and Osanai 2023; Iijima et al. 2016)” (ll.54-55). A detailed explanation follows, although it is not included in the manuscript.

      A recent study (Akiyama and Osanai 2023) quantified the glycogen consumed and the fermentative products secreted (hydrogen, lactate, dicarboxylic acids, and acetate) by Synechocystis under dark microoxic conditions, the same conditions we investigated. The system in that study consisted of a 10 mL liquid layer and a 10 mL gas layer, cultivated for 3 days under dark microoxic conditions. The amounts of lactic acid, dicarboxylic acids, and hydrogen were approximately 2 µmol, 3.5 µmol, and 11 µmol, respectively (assuming the gas layer was at 1 atm and ignoring hydrogen dissolved in the liquid). Meanwhile, glycogen equivalent to 15 µmol of glucose was consumed in the system. This estimate supports the view that hydrogen accounts for a substantial portion of the fermentative products under dark microoxic conditions.

      The necessity of hydrogen production under dark microoxic conditions was demonstrated by Gutekunst et al. (2014), who showed that hydrogenase activity is required for mixotrophic growth under light-dark, microoxic cycles with arginine. Whether it is necessary under our conditions remains unclear, because we only examined continuous dark microoxic conditions without glucose.
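      As a rough check of the figures quoted above, the snippet below computes hydrogen's share of the quantified products on a simple molar basis. It is only an illustrative sketch: it uses just the approximate amounts given in that paragraph, leaves out acetate because no amount is stated for it here, and a molar share is not an electron balance. The helper name molar_share is hypothetical.

```python
# Approximate amounts after 3 days of dark microoxic incubation
# (10 mL liquid + 10 mL gas), as quoted from Akiyama and Osanai 2023, in µmol.
# Acetate is omitted because no amount is given for it in this response.
products_umol = {
    "lactate": 2.0,
    "dicarboxylic acids": 3.5,
    "hydrogen": 11.0,  # assumes the gas layer at 1 atm; dissolved H2 ignored
}
glucose_consumed_umol = 15.0  # glycogen consumed, as glucose equivalents


def molar_share(amounts: dict, key: str) -> float:
    """Hypothetical helper: fraction of the summed molar amounts contributed by `key`."""
    total = sum(amounts.values())
    return amounts[key] / total


h2_share = molar_share(products_umol, "hydrogen")
h2_per_glucose = products_umol["hydrogen"] / glucose_consumed_umol

print(f"H2 share of quantified products: {h2_share:.0%}")        # prints 67%
print(f"mol H2 per mol glucose consumed: {h2_per_glucose:.2f}")  # prints 0.73
```

      On this crude molar basis, hydrogen is the largest of the quantified products, which is consistent with the statement that it accounts for a substantial portion of them.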

      The authors also mention a reverse TCA cycle. But is its existence an assumption or indeed active in cyanobacteria, i.e. is it experimentally proven? The authors are a little bit vague in this regard (see lines 241-246).

      We misused the terminology; we meant the “reductive branch of TCA”. Cyanobacteria operate a branched TCA cycle under microoxic conditions, and one of the branches, the reductive branch, reduces oxaloacetate to produce malate. We corrected “reverse TCA cycle” to “reductive branch of TCA” (Figure 1D and ll.260-262).

      Reviewer #2 (Public Review): 

      This work probes the control of the hox operon in the cyanobacterium Synechocystis, where this operon directs the synthesis of a bidirectional hydrogenase that functions to produce hydrogen. In assessing the control of the hox system, the authors focused on the relative contributions of cyAbrB2, alongside SigE (and to a lesser extent, SigA and cyAbrB1) under both aerobic and microoxic conditions. In mapping the binding sites of these different proteins, they discovered that cyAbrB2 bound many sites throughout the chromosome repressed many of its target genes, and preferentially bound regions that were (relatively) rich in AT-residues. These characteristics led the authors to consider that cyAbrB2 may function as a nucleoid-associated protein (NAP) in Synechocystis, given its functional similarities with other NAPs like H-NS. They assessed the local chromosome conformation in both wild-type and cyabrB2 mutant strains at multiple sites within a 40 kb window on either side of the hox locus, using a region within the hox operon as bait. They concluded that cyAbrB2 functions as a nucleoid-associated protein that influences the activity of SigE through its modulation of chromosome architecture.

      The authors approached their experiments carefully, and the data were generally very clearly presented and described.

      Based on the data presented, the authors make a strong case for cyAbrB2 as a nucleoid-associated protein, given the multiple ways in which it seems to function similarly to the well-studied Escherichia coli H-NS protein. It would be helpful to provide some additional commentary within the discussion around the similarities and differences of cyAbrB2 to other nucleoid-associated proteins, and possible mechanisms of cyAbrB2 control (post-translational modification; protein-protein interactions; etc.). The manuscript would also be strengthened with the inclusion of biochemical experiments probing the binding of cyAbrB2, particularly focusing on its oligomerization and DNA polymerization/bridging potential.

      We agree that biochemical experiments will deepen our insights into cyAbrB2 and chromatin conformation. As the reviewer pointed out, biochemical assays would provide valuable information on the mechanisms of cyAbrB2 control, such as post-translational modification, cooperation with cyAbrB1, oligomerization, and the structure of cyAbrB2-bound DNA. However, we think these potential findings warrant a new, independent research paper rather than being part of this one. We therefore added a discussion of biochemistry as future work (ll.275-290; the section “The biochemistry of cyAbrB2 will shed light on the regulation of chromatin conformation in the future”).

      Previous work had revealed a role for SigE in the control of hox cluster expression, which nicely justified its inclusion (and focus) in this study. However, the results of the SigA studies here suggested that SigA both strongly associated with the hox promoter, and its binding sites were shared more frequently than SigE with cyAbrB2. The focus on cyAbrB2 is also well-justified, given previous reports of its control of hox expression; however, it shares binding sites with an essential homologue cyAbrB1. Interestingly, while the B1 protein appears to bind similar sites, instead of repressing hox expression, it is known as an activator of this operon. It seems important to consider how cyAbrB1 activity might influence the results described here.

      We infer that the minor side of the bimodal SigE peak is the genuine population that contributes to hox transcription, as hox genes are expressed in a SigE-dependent manner (Figure S2). We consider that the strong SigA peak upstream of the hox operon reflects binding at the promoter of TU1715, which is transcribed in the opposite direction to the hox operon. We added a description of the single SigA peak and the bimodal SigE peak near the TSS of the hox operon as follows:

      “A bimodal peak of SigE was observed at the TSS of the hox operon in a microoxic-specific manner (Figure 6C bottom panel). The downstream side of the bimodal SigE peak coincides with SigA peak and the TSS of TU1715. Another side of the bimodal peak lacked SigA binding and was located at the TSS of the hox operon (marked with an arrow in Figure 6C), although the peak caller failed to recognize it as a peak.” (ll.206-209)

      The point that cyAbrB1 binds sites similar to those of cyAbrB2, despite regulating hox expression in the opposite direction, is very interesting. We therefore referred to transcriptome data from the cyAbrB1 knockdown strain and compared the impact of cyAbrB1 knockdown with that of cyAbrB2 deletion. We describe this in the Results and Discussion as follows:

      “we referred to the recent study performing transcriptome of cyAbrB1 knockdown strain, whose cyAbrB1 protein amount drops by half (Hishida et al. 2024). Among 24 genes induced by cyAbrB1 knockdown, 12 genes are differentially downregulated genes in cyabrb2∆ in our study (Figure S5D).” (ll.162-165)

      “CyAbrB1, the homolog of cyAbrB2, may cooperatively work, as cyAbrB1 directly interacts with cyAbrB2 (Yamauchi et al. 2011), their distribution is similar, and they partially share their target genes for suppression (Figures 3A S5C and S5D). The possibility of cooperation would be examined by the electrophoretic mobility shift assay of cyAbrB1 and cyAbrB2 as a complex. Despite their similar repressive function, cyAbrB1 and cyAbrB2 regulate hox expression in the opposite directions, and their mechanism remains elusive.” (ll.292-296)

      The hox operon departs from this general tendency. To see whether cyAbrB1 behaves differently from cyAbrB2 at the hox operon, we performed an additional ChIP-qPCR experiment on cyAbrB1 under the aerobic and the dark microoxic conditions (Figure 5C). However, we did not detect such a difference.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1B: I recommend changing the header in the grey bar to terms like "upregulated" and "downregulated", which are also used in the legend description. Upregulation of genes can also be a result of de-repression, which is why the term "activated" is somewhat misleading.

      Corrected.

      Lines 114-116: It is unclear what the authors exactly mean here. Please clarify. 

      We rephrased the sentence as follows: “The enrichment in the butanoate metabolism pathway indicates the upregulation of genes involved in carbohydrate metabolism. We further classified genes according to their expression dynamics.” (ll.110-111)

      Reviewer #3 (Recommendations For The Authors): 

      Major/experimental comments: 

      (1) For the chromosome conformation capture experiments, it is indicated that these were conducted at aerobic (1hr) and microoxic (4 hr) conditions. But the data presented in Figure 1 suggest that 1 hr corresponds to the beginning of microoxic growth, and that time 0 is aerobic. The composite 3C data in Figure 5 show some interesting but specific differences. It is appreciated that the authors presented the profiles for individual samples in Figure S7, and the differences here do not seem to be as compelling. Are the major differences being highlighted significantly (statistically) different (e.g. at the (c) and (j) loci)? Might the differences be starker if an earlier aerobic condition (e.g. time 0) had been used instead of the 1 hr - microoxic - timepoint?

      The previous Figure 5 showed three time points (solid line: aerobic condition; dashed line: 1 h of microoxic condition; dotted line: 4 h of microoxic condition). We omitted the 4 h data from the main figure (Figure 7) because including them made the plots complicated. All three time points are shown in the profiles of individual loci (Figure S8).

      A t-test found no statistically significant differences at loci (c) and (j). We have therefore toned down our interpretation of the 3C data as follows: “Our 3C result demonstrated that cyAbrB2 influences the chromosomal conformation of hox and nifJ region to some extent (Figure 7).” (ll.325-326)

      (2) This is a complicated system that involves multiple regulatory proteins, each of which is differentially affected by the growth conditions (aerobic/microoxic). It is obviously beyond the scope of this work to probe deeply into all of these proteins. The focus here was on cyAbrB2, and to a slightly lesser extent SigE; however, based on the data presented, it seems that SigA and cyAbrB1 may be equally important contributors to hox control/expression, and in the case of cyAbrB1, possibly also to chromosome conformation. cyAbrB1 appears to have the same binding sites as cyAbrB2, and has been reported to interact with cyAbrB2. Given this association, it is possible that the two proteins may affect the binding of each other, and that loss of one might lead to enhanced binding by the other (or binding may require heterooligomerization?). Probing the regulatory interplay between these two proteins (or at least discussing it) feels important. Conducting e.g. mobility shift assays with each protein, both individually and together, could possibly allow for some understanding of how they function together. 

      We agree that the biochemistry of cyAbrB2 and cyAbrB1 may explain why they bind long AT-rich tracts of the genome. We would like to leave biochemical analyses for future work, as we consider them beyond the scope of the present study.

      The idea that cyAbrB1 and cyAbrB2 cooperate to form heterooligomers and bind broadly across the genome is a very reasonable and interesting prediction. We added this idea to the Discussion: “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290). We also compared our transcriptome of ∆cyabrb2 with the recent study of cyabrb1 knockdown (ll. 162-165) and concluded that “they partially share their target genes for suppression (Figures 3A S5C and S5D)” (l. 293).

      (3) Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means. It appears that when cyAbrB2 binds, any given protected region can be quite extensive, which can be suggestive of polymerization along the chromosome. Are the boundaries for binding sites typically clearly delineated, and this changes when the cultures are growing under microoxic conditions? There is also no mention made anywhere about oligomerization potential for cyAbrB2, which would be important for the polymerization, and bridging suggested for cyAbrB2 in the model presented in Figure 5. Previous publications (Song et al., 2022; Ishi et al., 2008) have suggested that it can exist as a dimer in vivo, but that in vitro it is largely monomeric. The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means.

      To describe more clearly what we mean by “cyAbrB2 binding becomes blurry”, we rearranged the figures and made a dedicated figure (Figure 5). We also rephrased the description by adopting the reviewer’s wording, “boundaries for binding sites”, as this phrase describes the change well: “When cells entered microoxic conditions, the boundaries of the cyAbrB2 binding region and cyAbrB2-free region became obscure (Figure 5),” (ll.319-320)

      There is also no mention made anywhere about oligomerization potential for cyAbrB2,

      We added the following discussion of oligomerization: “DNA-bound cyAbrB2 is expected to oligomerize, based on the long tract of cyAbrB2 binding region in our ChIP-seq data. However, no biochemical data mentioned the DNA deforming function or oligomerization of cyAbrB2 in the previous studies and preference for AT-rich DNA is not fully demonstrated in vitro (Dutheil et al. 2012; Ishii and Hihara 2008; Song et al. 2022)” (ll. 277-280) and “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290)

      The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      We added a discussion that jointly considers the known features of cyAbrB2, our novel findings on cyAbrB2, and comparisons with known NAPs (ll.269-290).

      (4) Given that the major take-away for the authors (based on the title) seems to be the nucleoid-associated protein potential for cyAbrB2, the Discussion would benefit from some additional focus in this area. How similar is cyAbrB2 to other nucleoid-associated proteins? (e.g. H-NS, Lsr2) How does counter-silencing work for other nucleoid-associated proteins? Can the authors definitively exclude the possibility of binding site competition/occlusion, given that cyAbrB2 covers the promoter region of hox? What is other nucleoid-associated proteins have been characterized in the cyanobacteria? 

      We agree with the point, so we additionally discussed cyAbrB2 comparing with H-NS and Lsr2, the canonical NAPs (ll. 269-290).

      We did not rule out the possibility that cyAbrB2 excludes RNAP, but the previous manuscript discussed it insufficiently. To emphasize that cyAbrB2 excludes RNA polymerase, we simplified Figure 6 and added mosaic plots showing the anti-co-occurrence of cyAbrB2 binding regions and SigE peaks. We also added a discussion of SigE exclusion by cyAbrB2 (ll. 355-359).

      We now mention the possibility of other nucleoid-associated proteins in cyanobacteria in the Discussion: “Furthermore, the conformational changes by deletion of cyAbrB2 were limited, suggesting there are potential NAPs in cyanobacteria yet to be characterized.” (ll. 336-339)

      (5) Previous work (Song et al., 2022) showed that changing the AT content of cyAbrB2 binding sites did not affect its ability to bind DNA. There are also previous papers suggesting that cyAbrB2 may be subject to diverse post-translational modifications (e.g. phosphorylation - Spat et al., 2023; glutathionylation - Sakr et al., 2013), as well as association with cyAbrB1. These collectively suggest there may be other factors that contribute to cyAbrB2 binding specificity/activity. These seem like relevant points to discuss, particularly given the transient nature of the cyAbrB2 effects on some genes.

      We have added a discussion of AT content, post-translational modifications and transient regulation, and association with cyAbrB1 (ll. 284-295).

      (6) Given the major binding site for SigA upstream of the hox operon, it seems that it likely also contributes to hox cluster expression, together with SigE. Is there a sense for the relative contribution of each sigma factor to hox cluster expression? And whether both are subject to the same inhibitory effect of cyAbrB2? 

      As described above in our response to the public review, the SigA binding site upstream of the hox operon should be assigned to the TSS of TU1715 (Figure 6C). Transcription of the hox operon is highly dependent on SigE, as shown in Figure S2, and the residual transcription in the sigE∆ strain is derived from other sigma factors (SigABCD). Estimating the relative contribution of sigma factors other than SigE is difficult at present because SigABCDE can partially compensate for one another.

      Because H-NS is known to affect primary and alternative sigma factors differently (Shin et al. 2005), whether the primary sigma factor (SigA) and the alternative sigma factor (SigE) are inhibited by cyAbrB2 to the same extent is a very interesting question.

      We calculated the odds ratios of SigE and SigA being in the cyAbrB2-free region and added to the Results: “SigE preferred the cyAbrB2-free region in the aerobic condition more than SigA did (Odds ratios of SigE and SigA being in the cyAbrB2-free region were 4.88 and 2.74, respectively).” (ll. 193-195). We also added to the Discussion: “The higher exclusion pressure of cyAbrB2 on SigE may contribute to sharpening the transcriptional response of hox and nifJ on entry to microoxic conditions.” (ll. 357-359)
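
      The response does not state how these odds ratios were computed. As an illustration only, the standard odds ratio for a 2×2 table could be computed as below, assuming sigma-factor peaks are compared against a background set of genomic positions, each classified by whether it falls in a cyAbrB2-free or cyAbrB2-bound region. The background set, the `Counts` fields, and the numbers are hypothetical and are not the manuscript's data.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    """Hypothetical 2x2 contingency counts (illustrative, not manuscript data)."""
    peaks_in_free: int   # sigma-factor peaks inside cyAbrB2-free regions
    peaks_in_bound: int  # sigma-factor peaks inside cyAbrB2-bound regions
    bg_in_free: int      # background positions inside cyAbrB2-free regions
    bg_in_bound: int     # background positions inside cyAbrB2-bound regions


def odds_ratio(c: Counts) -> float:
    """Odds of a peak lying in a free region, relative to background: (a/b) / (c/d)."""
    if min(c) <= 0:
        raise ValueError("all cells must be positive; add a pseudocount if needed")
    return (c.peaks_in_free / c.peaks_in_bound) / (c.bg_in_free / c.bg_in_bound)


# Illustrative numbers only.
example = Counts(peaks_in_free=80, peaks_in_bound=20, bg_in_free=500, bg_in_bound=500)
print(odds_ratio(example))  # 4.0
```

      An odds ratio above 1 means peaks fall in cyAbrB2-free regions more often than the background would predict; comparing the values for two sigma factors is what the quoted Results sentence does.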

      (7) The 3C experiments suggest there are indeed changes in chromosome architecture in the hox region as growth conditions change and when different regulators are present. Across the chromosome, analogous changes are expected; however, it may be premature to draw this conclusion based on changes at one locus. Is there a reason that the authors did not take full advantage of their 3C samples and sequence them, to capture the full chromosome interactome at the two time-points? This would allow broader conclusions to be drawn regarding changes in chromosome structure and the impact of cyAbrB2.

      In response to this suggestion, we performed an additional 3C assay on the nifJ region using the remaining 3C samples. Extending the analysis genome-wide (Hi-C) requires enrichment of ligated fragments by biotinylation, a step that was omitted when preparing our 3C samples.

      We rewrote the Results based on the 3C data for hox and nifJ (ll. 220-245) and removed the schematic of the entire cyanobacterial chromosome (previous Figure 5E).

      Editorial comments: 

      (1) The data presentation in Figure 1 is very effective. 

      (2) Line 87: please rephrase - you can have 'high similarity' or 'high levels of identity', but not high levels of homology - genes/proteins are either homologous or not.

      (3) Line 118: classified into four 'groups'? 

      (4) Line 590: remove 'the'. 

      (5) Figure 2S, panel B: please define acronyms in the legend (GT, IP) and write out 'FLAG' in full for AbrB1.

      (2) to (5) have been corrected.

      (6) Please provide information on or a reference for the tagging of SigA for use in the ChIP-seq experiments within the Materials and Methods.

      Added (l.365)

      (7) Line 648: space between 'binding' and 'regions'. 

      Corrected.

      (8) Fig 4E: please make the solid lines thicker - they are currently difficult to see.

      We have made Figure 6C (former 4E) larger and the lines thicker.

      (9) Line 666: location. 

      (10) Line 673: Individual. 

      (11) Figure S5, panel C graph title: should this be 'Relative'? 

      (12) Figure S7: What is 'GT'? Should this be 'WT'? 

      (9) to (12) have been corrected.

      (13) In addition to the data presented in Figure 3G, it would be nice to have a small table or Venn diagram summarizing the number of cyAbrB2 binding sites that fall into the different categories (full gene/operon; downstream of a gene; within a gene; promoter region). 

      In addressing this comment, we realized that the categories we had applied (full gene/operon; downstream of a gene; within a gene; promoter region) were arbitrary. We therefore categorized transcriptional units (TUs) by the extent of their occupancy by cyAbrB2 (Figures 4B and 4C).
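
      The occupancy-based categorization can be sketched as the fraction of each TU's bases covered by merged cyAbrB2 binding regions. This is a minimal sketch assuming half-open genomic intervals on a single coordinate axis; the function names and the category thresholds are hypothetical and are not those used for Figures 4B and 4C.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching binding regions so bases are not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def occupancy(tu: Interval, regions: List[Interval]) -> float:
    """Fraction of a TU's bases covered by binding regions (TU must be non-empty)."""
    tu_start, tu_end = tu
    covered = 0
    for start, end in merge(regions):
        covered += max(0, min(end, tu_end) - max(start, tu_start))
    return covered / (tu_end - tu_start)


def category(frac: float) -> str:
    """Hypothetical bins; the manuscript's actual thresholds are not given here."""
    if frac == 0:
        return "free"
    if frac < 0.5:
        return "partial"
    if frac < 1.0:
        return "mostly"
    return "full"


# A hypothetical TU spanning 100-200 with three overlapping binding regions.
print(category(occupancy((100, 200), [(50, 120), (110, 150), (190, 300)])))  # mostly
```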

      (14) Line 280-281: suggest replacing 'mediates' with 'influences'. 'Mediates' sounds like a direct interaction (for which the evidence is not currently strong without some additional biochemical data), but 'influences' could better accommodate both direct and indirect possibilities. 

      (15) Line 410: it is not clear what this means. 

      We have deleted the sentence “As a result, DNA ~600-fold condensed DNA than 3C samples were ligated.” because it does not provide any information about the experimental procedure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors provided a detailed analysis of the real-time structural changes in actin filaments resulting from cofilin binding, using High-Speed Atomic Force Microscopy (HS-AFM). The cofilin family controls the lifespan of actin filaments in cells by severing the filament and promoting depolymerization. Understanding the effects of cofilin on actin filament structure is critical. It is widely acknowledged that cofilin binding significantly shortens the pitch of the actin helix. The authors previously reported (1) that this shortening extends to the unbound region of the actin filament on the pointed end side of the cluster. In this study, the authors presented substantially improved AFM images and provided detailed accounts of the observed dynamics. They found that a minimal cofilin-binding cluster, consisting of 2-4 molecules, could induce changes in the helical parameters over one or more actin crossover repeats. Adjacent to the cofilin-binding clusters, the actin crossovers shortened within seconds, and this shortening was limited to one side of the cluster. Additionally, phosphate binding to the actin filament was observed to stabilize the helical twist, suggesting a mechanism in which cofilin preferentially binds to ADP-bound actin filaments. These findings significantly advance our understanding of actin filament dynamics, which are essential for a wide range of cellular processes.

      However, I propose that the sections about MAD and certain parts of the Discussion need substantial revision.

      In this study, we leverage the high spatiotemporal resolution of high-speed atomic force microscopy (HS-AFM) to analyze real-time structural changes in actin filaments induced by cofilin binding. Furthermore, we experimentally demonstrate the inherent variability in the twist conformations of bare actin filaments. Our study integrates HS-AFM with Principal Component Analysis (PCA) to elucidate the actin structure-dependent, preferential cooperative binding of cofilin. We provide experimental evidence for a "proof of principle" that the flexible helical twists of actin filaments regulate the functions of actin-binding proteins. This study enhances our understanding of the dynamics and polymorphic structures of actin filaments, which play crucial roles in a broad spectrum of cellular activities.

      We appreciate the comments from Reviewer 1. Below, we address their concerns point by point.

      MAD analysis

      The authors have presented findings that the mean axial distance (MAD) within actin filaments exhibits a significant dependency on the helical twist, a conclusion not previously derived despite extensive analyses through electron microscopy (EM) and molecular dynamics (MD) simulations. Notably, the MAD values span from 4.5 nm (8.5 pairs per half helical pitch, HHP) to 6.5 nm (4.5 pairs/HHP) as depicted in Figure 3C. The inner domain (ID) of actin remains very similar across C, G, and F forms (2, 3), maintaining similar ID-ID interactions in both cofilactin and bare actin filaments and keeping the axial distance between subunits identical in both states. This suggests that the ID is unlikely to undergo significant structural changes, even with fluctuations in the filament's twist, keeping the ID-ID interactions and the axial distances unchanged. The broad range of MAD values reported is therefore difficult to explain. A careful reassessment of the MAD analysis is recommended to ensure accuracy.

      The central challenge in studying protein dynamics in real time lies in bridging the gap between time scales: HS-AFM captures protein dynamics in the millisecond-to-second range, whereas molecular dynamics (MD) simulations typically operate in the femtosecond-to-microsecond range. Protein dynamics span a spectrum of time scales, from atomic vibrations to molecular tumbling and collective motions in simulations. HS-AFM is a powerful technique for probing protein dynamics, including processes such as protein folding and conformational changes triggered by drugs or protein interactions. In addition, a significant limitation of MD simulations is the constraint on system size (~50 × 50 nm), which restricts the study of large, complex biological systems. HS-AFM, by contrast, makes it possible to construct intricate protein assemblies and image their structures and dynamics in real time during functional activity.

      Regarding the suggestion that ID-ID interactions are maintained in both cofilactin and bare actin filaments, keeping the axial distances (ADs) between subunits identical in both states, our HS-AFM data cannot provide atomic-level structural insight into this issue. However, we demonstrate that the variability of OD twists in actin protomers could lead to globally shorter half helical pitches (HHPs) and fewer protomer pairs per HHP (Figure 2, Figure supplement 2) (see lines 218-222). Fluctuation in filament twist is further supported by currently available experimental data, including our findings in this study (Figure 3C) (see our Discussion in lines 555-560).

      The minimal change in local ID-ID interactions results in an unchanged global length of actin filaments in both cofilin-bound and unbound cases (Figure supplement 2). However, filament twist, as detected experimentally by EM, high-resolution interferometric scattering microscopy (iSCAT), and HS-AFM, and in pseudo-AFM images, is variable (see lines 555-560).

      We have additionally reassessed the fluctuation and dynamics of MAD in F-ADP-actin and F-ADP.Pi-actin over time at high temporal resolution (Figure supplement 3, Video 3, Table supplement 5). These data are further explained in the Results section (lines 264-270).

      Furthermore, we reassessed the broad range of MAD values in F-ADP-actin segments on both sides of large cofilin clusters over time (Figure supplement 8, Video 5). These findings are described in the Results section (lines 333-337) and further discussed in the Discussion (lines 555-560).

      In determining axial distances, the authors extracted measurements from filament line profiles. It is advised to account for potential anomalies such as missing peaks or pseudo peaks, which could arise from noise interference. An example includes the observation of three peaks in HHP6 of Figure Supplement 5C, corresponding to 4.5 pairs. Peak intervals measured from the graph were 5, 11.8, 8.7, and 5.7 nm. The second region (11.8 nm) appears excessively long. If one peak is hidden in the second region, the MAD becomes 5.5 nm.

      We acknowledge the difficulty of identifying peaks in bare actin segments adjacent to cofilin clusters or within the cofilactin region. In the revised Figure supplement 6C (originally Figure supplement 5C), we did not assess peak intervals as suggested by Reviewer 1. How we measured the axial distance (AD) and counted the peaks within an HHP to calculate MAD correctly is described in more detail in the Methods section (see HS-AFM data analysis and processing, highlighted in purple).

      Additionally, the purpose of Figure supplements 6-7 is to compare directly the half helices and the number of protomer pairs per HHP between bare actin filaments and actin segments near the boundary between cofilactin and bare actin segments on the PE side, within the same AFM images. In the original version of this paper, we avoided including the MAD values measured in the cofilactin region (HHP6, HHP7) in Figure Supplement 7E, to mitigate measurement errors.

      Compiling histograms of axial distances (ADs) rather than focusing solely on MAD may provide deeper insights. If the AD is too long or too short, the authors should suspect the presence of missing peaks or pseudo-peaks due to noise. If 4.5 or 5.5 pairs/HHP regions tend to contain missing peaks and 7.5-8.5 pairs/HHP regions tend to contain pseudo peaks, this may explain the MAD dependency on the helical twist.
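
      The reviewer's check for missing peaks and pseudo-peaks can be sketched as flagging peak-to-peak intervals that are unusually long or short relative to the median interval of the same line profile. The 1.5× and 0.5× thresholds and the function name are assumptions for illustration, not part of either party's analysis; applied to the four intervals quoted above for HHP6, only the 11.8 nm interval is flagged.

```python
from statistics import median
from typing import List


def flag_intervals(ads_nm: List[float], long_factor: float = 1.5,
                   short_factor: float = 0.5) -> List[str]:
    """Label each peak-to-peak interval relative to the profile's median interval.

    'long'  -> may contain a missing (hidden) peak
    'short' -> may be bounded by a noise-induced pseudo-peak
    'ok'    -> otherwise
    The 1.5x / 0.5x thresholds are illustrative assumptions.
    """
    ref = median(ads_nm)
    labels = []
    for ad in ads_nm:
        if ad > long_factor * ref:
            labels.append("long")
        elif ad < short_factor * ref:
            labels.append("short")
        else:
            labels.append("ok")
    return labels


# Peak intervals (nm) quoted by the reviewer for HHP6 of Figure Supplement 5C.
print(flag_intervals([5.0, 11.8, 8.7, 5.7]))  # ['ok', 'long', 'ok', 'ok']
```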

      How we measured AD and counted the peaks within an HHP to calculate MAD correctly is described in more detail in the Methods section (see the subsections Analyses of pseudo AFM images of F-actin and C-actin structures constructed from existing PDB structures, e.g., Figure supplement 2, and HS-AFM data analysis and processing, highlighted in purple).

      We disagree with Reviewer 1’s suggestion that compiling histograms of ADs, rather than focusing solely on MAD, may provide deeper insights. AFM imaging provides only a 2-dimensional (2D) surface structure, unlike the 3-dimensional (3D) structure offered by Cryo-EM. In AFM imaging, we cannot capture the object from different angles as Cryo-EM does. Therefore, AD values measured in 2D AFM images do not accurately represent the axial distance between two adjacent protomers along the same actin filament. Consequently, we relied on MAD values. Our results, including the fluctuation in the number of protomer pairs per HHP, are further supported by other studies (see our Discussion in lines 555-560).

      Additionally, Figure 3E indicates a first decay constant of 0.14 seconds, substantially shorter than the frame interval (0.5 sec/frame). This suggests significant variations in line profiles between frames, attributable either to overly rapid dynamics or a low signal-to-noise ratio. Implementing running frame averages (of 2-3 frames) is recommended to distinguish between these scenarios. If the dynamics are indeed fast, the averaged frame's line profile may degrade, complicating peak identification. Conversely, if poor signal-to-noise ratio is the cause, averaging frames could facilitate peak detection. In the latter case, the authors can find the optimal number of frame averages and obtain better line profiles with fewer missing and pseudo-peaks.

      We used state-of-the-art HS-AFM with high temporal and spatial resolution to capture the dynamic structures of F-ADP-actin and F-ADP.Pi-actin segments at higher frame rates of 0.2 s/frame and 0.1 s/frame, respectively (Figure supplement 3). As suggested, we implemented running frame averages (3 frames) in the ACF analyses. Our results consistently show that the first time constant (t1) for AD between two adjacent actin protomers in F-actin bound with ADP or ADP.Pi remains around 0.1-0.4 s, independent of the imaging rate (0.1-0.5 s/frame) (Table Supplement 5), which is similar to the t1 shown in Figure 3E. These results support the notion that the helical twists, the number of actin protomers per HHP, and MAD in bare F-actin segments are intrinsically dynamic and fluctuate around their mean values over time (see lines 264-270, 333-337, and 555-560). Note that our original ACF analyses (Figure 3E-F) did not include running-frame averaging; because the averaged analyses yield a similar t1, a low signal-to-noise ratio is unlikely to explain our results.
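
      For orientation, the two steps discussed in this exchange (a running frame average and an ACF decay time) can be sketched as below. The sketch uses the time at which the ACF first falls below 1/e, linearly interpolated between lags, as a stand-in for a fitted time constant; it does not reproduce the authors' fitting procedure for t1. Function names are hypothetical, and a constant series (zero variance) is not handled.

```python
import numpy as np


def running_average(series, window: int = 3) -> np.ndarray:
    """Mean of each block of `window` consecutive frames (axis 0 = time), no padding.

    Works for a 1-D series or a stack of line profiles (one row per frame).
    """
    x = np.asarray(series, dtype=float)
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and the number of frames")
    csum = np.concatenate([np.zeros((1,) + x.shape[1:]), np.cumsum(x, axis=0)], axis=0)
    return (csum[window:] - csum[:-window]) / window


def autocorrelation(x, max_lag: int) -> np.ndarray:
    """Normalized autocorrelation of a 1-D time series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])


def e_folding_time(acf, dt: float) -> float:
    """Time at which the ACF first drops below 1/e, linearly interpolated.

    dt is the frame interval in seconds (e.g. 0.5 for 0.5 s/frame).
    """
    acf = np.asarray(acf, dtype=float)
    threshold = np.exp(-1.0)
    below = np.nonzero(acf < threshold)[0]
    if below.size == 0:
        raise ValueError("ACF does not fall below 1/e within the computed lags")
    k = int(below[0])  # k >= 1 because acf[0] == 1
    frac = (acf[k - 1] - threshold) / (acf[k - 1] - acf[k])
    return float((k - 1 + frac) * dt)
```

      A typical call would be `e_folding_time(autocorrelation(running_average(ads, 3), max_lag=20), dt=0.5)`, where `ads` is a hypothetical per-frame AD series. If the ACF is already below 1/e at the first lag, the returned value is shorter than one frame interval, which is the situation raised for Figure 3E. Note that a running average itself introduces correlation across `window - 1` lags, which lengthens the apparent decay.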

      Discussions

      The authors suggest a strong link between the C-form of actin and the formation of a short pitch helix. However, Oda et al. (3) have demonstrated that the C-form is highly unstable in the absence of cofilin binding, casting doubt on the possibility of the C-form propagating without cofilin binding. Moreover, in one strand of the cofilactin, interactions between actin subunits are limited to those between the inner domains (ID-ID interactions), which are quite similar to the interactions observed in bare actin filaments. This similarity implies that ID-ID interactions alone are insufficient to determine the helical parameters, suggesting that the presence of cofilin is essential for the formation of the short pitch helix in the cofilactin filament. Thus, crossover repeats are not necessarily shortened even if the actin form is C-form.

      We experimentally observed, at high spatial resolution, a shortened bare half helix adjacent to cofilin clusters on the PE side, comprising fewer protomers than normal half helices. We therefore hypothesized that crossover repeats are shortened if the actin protomers in the bare half helix neighboring the cofilin cluster on the PE side resemble a C-actin structure. This assumption is further explained with reference to the C-actin structure in Figure 2 and Figure supplement 2. Even though the C-form, as suggested in Oda et al., 2019, is unstable, it intrinsically fluctuates around the mean value over time and adopts various conformations. A single PDB structure, resolved by Cryo-EM by averaging ensembles of structural images, should be regarded as a single atomistic structure, one of many possible conformations, regardless of whether it is resolved by Cryo-EM, X-ray diffraction/crystallography, or NMR (see Figure 1 and the legend of Figure supplement 1).

      We highlight two main points regarding this issue: (1) The short helical pitch at the global scale is associated with the twisting of the OD at the local scale for individual protomers; (2) Actins in different nucleotide- or cofilin-bound states exhibit different ranges, distributions, and variability of both local OD twist and global helical pitch (Figures 1-2, Figure supplements 1-2). The first point underscores that the twist/untwist of the OD, rather than the ID-ID interactions, determines the shortness of the helical pitches. The ID-ID interactions are more closely related to the global length of the filament: the minimal change in local ID-ID interactions results in an unchanged global length of actin filaments in both cofilin-bound and unbound cases (see the pseudo-AFM images in Figure supplement 2 of a canonical actin filament segment and a cofilactin segment of the same length, each comprising 62 protomers). However, filament twist, as detected experimentally by EM, high-resolution interferometric scattering microscopy (iSCAT), and HS-AFM, and in pseudo-AFM images, is variable (see lines 555-560) and independent of the ID-ID interactions.

      Narita (4) proposes that the facilitation of cofilin binding may occur through a shortening in the helix pitch, independent of a change to the C-form of actin. Furthermore, the dissociation of the D-loop from an adjacent actin subunit leads directly to the transition of actin to the G-form, which is considered the most stable configuration for the actin molecule (3).

      See also our explanation above. We have incorporated these points into the Discussion (lines 497-499 and 510-511).

      Furthermore, our PCA analysis indicates that the transition from C-actin to G-actin necessitates the opening of the nucleotide cleft (resulting in a decrease in PC1) and is more readily achieved than the direct transition from F-actin to G-actin (which requires decreases in both PC1 and PC2). Whether this transition is directly triggered by the dissociation of the D-loop remains a topic for our future investigations. Our PCA analysis reveals that the D-loop is deeply buried within the core of the filament (Figure 2). Further experiments will be conducted to elucidate its roles.
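
      The PCA itself is described in the Methods (lines 612-621) and is not reproduced here. A generic sketch of PCA over an ensemble of actin protomer structures is shown below, assuming the Cartesian coordinates of matching atoms have already been extracted from the PDB entries and superimposed on a common frame (neither step is shown). The function name is hypothetical, and reading a component as, for example, nucleotide-cleft opening comes from inspecting its loadings, not from the code.

```python
import numpy as np


def structural_pca(coords: np.ndarray, n_components: int = 2):
    """PCA over an ensemble of pre-aligned structures.

    coords: array of shape (n_structures, n_atoms, 3), already superimposed
    on a common reference frame (alignment is NOT performed here).
    Returns (scores, components, explained_variance_ratio).
    """
    n = coords.shape[0]
    X = coords.reshape(n, -1)          # one row per structure: x1, y1, z1, x2, ...
    X = X - X.mean(axis=0)             # center each coordinate across the ensemble
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    var = S**2 / (n - 1)               # variance captured by each component
    ratio = var / var.sum()
    scores = U[:, :n_components] * S[:n_components]  # projection of each structure
    return scores, Vt[:n_components], ratio[:n_components]
```

      Each row of `scores` places one structure (for example a G-, F-, or C-actin protomer) in the PC1/PC2 plane, which is the kind of map in which transitions such as C-actin to G-actin can be compared.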

      The mechanism by which the shortened pitch propagates remains a critical and unresolved issue. It appears that this propagation is not a result of the C-form's propagation but likely involves an unidentified mechanism. Identifying and understanding this mechanism represents an essential direction for future research.

      It's worth mentioning that our HS-AFM data and spatial ACF analysis lend support to a hypothesis suggesting that 2-4 bare actin protomers adjacent to cofilin clusters on the PE side adopt C-actin-like structures. Additionally, we have proposed several hypotheses aimed at better understanding the mechanisms driving the unidirectional binding and expansion of cofilin clusters toward the PE side. These hypotheses will require further examination in future experiments. Additional information can be found in lines 328-329; 344-351; and 416-430.

      (1) K. X. Ngo et al., Cofilin-induced unidirectional cooperative conformational changes in actin filaments revealed by high-speed atomic force microscopy. eLife 4 (2015).
      (2) K. Tanaka et al., Structural basis for cofilin binding and actin filament disassembly. Nature Communications 9, 1860 (2018).
      (3) T. Oda et al., Structural Polymorphism of Actin. Journal of Molecular Biology 431, 3217-3228 (2019).
      (4) A. Narita, ADF/cofilin regulation from a structural viewpoint. Journal of Muscle Research and Cell Motility 41, 141-151 (2020).

      We have cited them accordingly in the paper.

      Reviewer #2 (Public Review):

      Summary:

      This study by Ngo et al. uses mostly high-speed AFM to estimate conformational changes within actin filaments, as they get decorated by cofilin. The authors build on their earlier study (Ngo et al. eLife 2015) where they used the same technique to monitor the expansion of cofilin clusters on actin filaments, and the propagation of the associated conformational changes in the filament (reduction of the helical pitch). Here, they propose a higher-resolution description of the binding of cofilin to actin filaments.

      Strengths:

      The high speed AFM technique used here is quite original to address this question, compared to classical light and electron microscopy techniques. It can certainly bring valuable information as it provides a high spatial resolution while monitoring live events. Also, in this paper, a nice effort was made to make the 3D structures and conformational changes clear and understandable.

      We are grateful for the positive feedback from Reviewer 2.

      Weaknesses:

      The paper also has a number of limitations, which I detail below.

      In addition to AFM, the authors also propose a Principal Component Analysis (PCA) of existing structural data on actin protomers. However, this part seems very similar to another published work by others (Oda et al. JMB 2019), which is not even cited.

      We have addressed this issue in the Methods section (lines 612-621).

      The asymmetrical growth of cofilin clusters has so far only been seen using AFM, by the same authors (Ngo et al. eLife 2015). Using fluorescent microscopy, others have reported a very symmetrical expansion of cofilin clusters (Wioland et al. Curr Biol 2017). This is not mentioned at all, here. It should be discussed, and explanations for this discrepancy could be proposed.

      We have cited this paper (Wioland et al. Curr Biol 2017) in the current manuscript (see lines 361-362). However, we are unable to evaluate the technical differences between our methods and theirs. Instead, we refer to a more recent paper that employed techniques similar to those used by Wioland et al. (Curr Biol 2017). Our findings align with those reported by Bibeau JP et al. in the Journal of Molecular Biology 2021 (see their Results on page 7, titled “Cofilin clusters elongate preferentially towards the actin filament pointed end”). At a minimum, we believe this citation is appropriate.

      Regarding the AFM technique, I have the following concerns.

      The filaments appear densely packed on the surface, and even clearly in register in some images (if not most images, e.g., Figs 3A, 4BC, 5A). Why is that? Isn't there a risk that this could affect the result? This suggests there is some interaction between the filaments.

      In this study, as well as in many similar studies of actin filaments alone or in interaction with other actin binding proteins (ABPs) including cofilin, we have carefully considered the density of filaments when designing experiments. We used highly dense, but not packed, actin filaments to minimize free space between filaments and the surface, which helps maintain stable tip-scanning during AFM imaging. This strategy technically allows us to capture high spatial and temporal resolutions of actin filaments’ structures.

      Actin filaments resembling paracrystalline structures appear as densely packed filaments (see our data in Ngo and Kodera et al., eLife 2015, Figure 1C). Thus, the data presented in this paper are technically appropriate and are not at risk of misinterpretation due to lateral interactions affecting the structures and functions of actin filaments and cofilin.

      The properties of the lipid layer and its interaction with the actin filaments are not clear at all. A poor control of these interactions is a problem if one aims to measure conformational changes at high resolution. The strength of the interaction appears tuned by the ratio of lipids put on the surface to change its electrostatic charge. A strong attachment likely does more than suppress torsional motion (as claimed in Fig 8A). It may also hinder cofilin binding in several ways (lower availability of binding sites on the filament facing the surface, electrostatic interactions between cofilin and the surface, etc.)

      We are confident that our lipid membrane bilayer is the optimal choice for immobilizing actin filaments in a controlled manner for HS-AFM experiments, achieved through the variation of positively charged lipids. In this study, we have fine-tuned the surface charge for our specific purposes.

      As an example, to capture high-spatial resolution images of actin structures (Figure 5-6, Figure supplement 5B, 6), we strongly fixed the filaments on DPPC/DPTAP (50/50 wt%) after the binding reaction between actin filaments and cofilin in solution was completed. This experiment yielded valuable information, including: (i) the ability to replicate the conformation of cofilactin and hybrid cofilactin/bare actin segments in solution, akin to the first steps in sample preparation for Cryo-EM techniques; and (ii) the capability to capture these structures, reflecting their solution states, by firmly fixing them on a lipid surface. On the lipid surface, these structures were retained stably during AFM imaging.

      If there is a choice, we advise against using amino-silane and other positively charged polymers typically used for modifying glass surfaces to fix actin filaments in studies using fluorescence microscopy. The strong immobilization by these chemicals can alter the structural dynamics and functions of actin filaments, lead to non-specific binding of cofilin on the modified glass surface, and potentially affect data interpretation.

      On a local scale, the reviewer may argue about the "lower availability of binding sites on the filament facing the surface". However, on a global scale, we maintain that the two single strands forming the helical twists of long F-actin segments should have an equal chance to bind cofilin even when fixed on a lipid membrane. The evidence shown in Figure 8A and Video 7, which demonstrates that small cofilin clusters associate and dissociate locally without developing into large clusters along the actin filament, supports our conclusion that flexibility and dynamics in helical twists play a crucial role in facilitating the binding and growth of cofilin clusters.

      The lipid surface utilized in our study with actin filaments and cofilin provides an ideal surface, as it is flat and minimizes the nonspecific binding of cofilin to the lipid membrane (see an example of the lipid surface in Video 5).

      How do we know that the variations over time are not mostly experimental noise, i.e. variations between repeats of the same measurement? As shown in Fig 3, correlation is mostly lost from one image to the next, and rather stable after that.

      This question is similar to Reviewer 1's question above. Please see our response to it, as well as lines 264-270, 333-337, and 555-560, the measurement Methods, Figure supplement 3, and Table supplement 5.

      The identification of cofilactin regions relies on the additional height of the "peaks", due to the presence of cofilin. It thus seems that cofilin is detected every half helical pitch (HHP), but not in between, thereby setting the resolution for the localization of cluster borders to one HHP. It thus seems difficult to claim that there is a change in helicity without cofilin decoration over this distance. In Fig 7, the change in helicity could be due to cofilin decoration that is undetected because cofilins have not yet reached the next peak.

      There are several important criteria to distinguish the "supertwisted half helix" in cofilactin region from the "normal half helix". As illustrated in the pseudo AFM images constructed for normal F-actin and C-actin segments (with and without cofilin decoration) from PDB structures, it is evident that these two structures differ significantly in length and the number of protomer pairs per HHP (see Figure Supplement 2). In both pseudo and experimental AFM images, these parameters can be easily detected by measuring the distance between two cross-over points. Furthermore, the height or thickness difference between the cofilactin and bare actin regions is approximately 10-15 Å, which is well resolved by HS-AFM due to its exceptional z-axis resolution of ~1 Å. Technically, we were able to detect these differences by creating a longitudinal section profile that covered both bare actin and cofilactin areas, as shown in Figure supplement 6.
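
      The height criterion can be sketched as classifying each half helix by its peak height relative to a bare-actin baseline, then reading cluster borders off as run boundaries. This assumes one representative peak height per HHP, a baseline taken from clearly bare segments, and a threshold of half the expected 1.0-1.5 nm step; all names and numbers are hypothetical. In this sketch, borders are resolved only to one HHP.

```python
from typing import List, Tuple

import numpy as np


def classify_hhps(peak_heights_nm, bare_baseline_nm: float,
                  step_nm: float = 1.0) -> np.ndarray:
    """True where an HHP's peak height exceeds the bare-actin baseline by more
    than half the expected cofilin height step (hypothetical 1.0 nm default)."""
    return np.asarray(peak_heights_nm, dtype=float) > bare_baseline_nm + step_nm / 2.0


def runs(mask) -> List[Tuple[int, int, bool]]:
    """Split a boolean sequence into runs of (start, end_exclusive, is_cofilactin)."""
    out: List[Tuple[int, int, bool]] = []
    start = 0
    for i in range(1, len(mask) + 1):
        if i == len(mask) or mask[i] != mask[start]:
            out.append((start, i, bool(mask[start])))
            start = i
    return out


# Hypothetical peak heights (nm) for six consecutive HHPs.
labels = classify_hhps([8.0, 8.1, 9.3, 9.4, 8.0, 7.9], bare_baseline_nm=8.0)
print(runs(labels))  # [(0, 2, False), (2, 4, True), (4, 6, False)]
```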

      We experimentally reveal that a critical cofilin cluster comprising 2-4 molecules (Figures 5-6) or larger cofilin clusters (Figures 7-8, Figure Supplements 6-8) could equally supertwist a bare half helix on the PE side. The observation that a small cofilin cluster (2-4 molecules) can shorten a half helix by reducing number of protomers per HHP to 9 or 11 (4.5 or 5.5 protomer pairs), which typically requires full decoration by 9-11 cofilin molecules, strongly suggests that supertwisting or the change in helicity does not always require complete cofilin decoration. We predicted that 2-4 bare actin protomers neighboring a cofilin cluster on the PE side can adopt the C-actin-like structure. See further in lines 324-329.

      Figure 7 captures a live binding event of cofilin at low spatial resolution, yet (i) the half helical pitches and (ii) the thickness of the cofilactin and bare actin segments can still be clearly distinguished. This demonstrates that changes in helicity within the cofilactin region propagate to an unbound half helix on the PE side, rearranging the helical twist by reducing the number of actin protomers per HHP, prior to recruiting additional cofilin for binding and expanding clusters.

      Reviewer #1 (Recommendations For The Authors):

      I believe "C-form" and "G-form" are better terms than "C-actin like structure" or "G-actin like structure".

      We avoid terms such as "G-form", "F-form", and "C-form", as defined in cryo-EM studies (Oda et al., 2019), because in the original papers they refer to specific nucleotide- and cofilin-bound states. Instead, we use “G-actin”, “F-actin”, “C-actin”, “G-actin-like”, and “C-actin-like” to emphasize "Structural Dynamics" and "Structural Polymorphism". This highlights that even F-actin structures without bound cofilin can adopt "C-actin-like" conformations with fewer OD twists, resulting in a shorter global helical pitch. ADP-bound F-actins exhibit greater variability in helical twists than ADP-Pi-bound F-actin (Figure 9), indicating that ADP-bound F-actin protomers can adopt more C-actin-like conformations than ADP-Pi-bound F-actin protomers (Figure 1, Figure supplement 1).

      Technical terms describing actin structures do not need to be the same between Cryo-EM and HS-AFM, as the two techniques are fundamentally different. Our work underscores the importance of considering “structural dynamics and heterogeneity” in different nucleotide states of filamentous actin structures, both with and without cofilin, over time.

      Figure 1A

      A very similar analysis has already been performed by Oda et al. (1). The authors should describe the relationship with the previous analysis.

      We addressed this issue in Methods – Principal component analysis – in lines 612-621.

      Figure 1B, C

      A very similar analysis has already been performed by Tanaka et al. (2). The authors should describe the relationship with the previous analysis.

      We addressed this issue in Methods – Principal component analysis – in lines 612-621 and legend of Figure 1.

      Lines 397-398

      "However, we noted that in rare instances, cofilin clusters also grew on both sides in the regular bare half helices when ATP or ADP was present."

      I believe the other experiments also contain ATP in the solution. I could not follow the meaning of this sentence.

      We addressed this issue in the Results section, line 412. "However, we noted that in rare instances, cofilin clusters also grew on both sides in the regular bare half helices when only ADP was present."

      Additionally, we enhanced the description in the Methods section to avoid any confusion regarding nucleotides in the buffer. Please refer to the Methods section under “HS-AFM imaging”, lines 702-738.

      Lines 427-429

      "Consequently, the proportion of naturally supertwisted half helices with HHPs shorter than 30 nm was 5.8% for F-ADP-actin but only 1.1% and 0.2% for F-ADP.Pi-actin and phalloidin-stabilized F-actin, respectively."

      A similar discussion was presented in (3) for actin filaments under tension. It might be comparable with the current data.

      We have cited Okura et al. (2023) accordingly (line 447).

      Lines 553-557

      "Nonetheless, it remains plausible that the structural flexibility exhibited by ADP-bound actin protomers could result in subtle variations in the conformations of the DNase binding loop (Dloop) G46-M47-G48-N49, as suggested in (Chou and Pollard, 2019). We suggest that the absence of bound Pi possibly increases the torsional flexibilities during helical twisting of ADP bound actin filaments in contrast to their ADP.Pi-bound counterparts."

      The crystal structure of the F-form (4) showed that Pi in ADP.Pi connects the two large domains of the actin molecule, stabilizing F-form. Pi release largely weakens the connection. This might be useful for the discussion.

      We incorporated this point with the suggested citation in lines 582-584.

      (1) T. Oda et al., Structural Polymorphism of Actin. Journal of Molecular Biology 431, 3217-3228 (2019).

      (2) K. Tanaka et al., Structural basis for cofilin binding and actin filament disassembly. Nature Communications 9, 1860 (2018).

      (3) K. Okura et al., Mechanical Stress Decreases the Amplitude of Twisting and Bending Fluctuations of Actin Filaments. Journal of Molecular Biology 435, 168295 (2023).

      (4) Y. Kanematsu et al., Structures and mechanisms of actin ATP hydrolysis. Proceedings of the National Academy of Sciences of the United States of America 119, e2122641119 (2022).

      Reviewer #2 (Recommendations For The Authors):

      Line 190: "Noticeably, PCA analysis revealed higher structural flexibility in F-ADP-actin (red dots), exploring a larger space than F-ADP-Pi-actin structures (orange dots) within the F-actin cluster (inset in Figure 1A)". Is there a quantification to support this claim? Visually, things are not so clear.

      We have improved Figure 1 by adding two circles to the inset, providing clearer quantification to support our claim.

      In the PCA part: isn't it a bit obvious, or at least expected, that the conformation adopted by actin in the cofilactin structure is the most favorable one for binding cofilin?

      We agree with the reviewer and have added this point to the Results section, lines 202-204.

      I found it a bit unclear how the structures in Fig 2 were obtained.

      We have clarified this by adding “Zoom-in views of these long filaments are shown in Figure 2” to the Methods section, line 661.

      In the AFM images, the authors always seem to know the polarity of the filaments. Unless I missed it, how they know this is not explained. In their earlier work (Ngo et al. 2015) they used a subfragment of myosin II which indicates polarity when bound to F-actin. I found no such explanation here.

      We have addressed this issue in the legend of each figure accordingly.

      For clarity, I suggest writing "C-actin-like structures" (with two hyphens) rather than "C-actin like structures".

      We agree and are currently incorporating this change in the text.

      The term "cluster" in PCA can be confusing because it is used for cofilin clusters throughout the text.

      "Cluster" is a common term in PCA. To clarify, we revised the legends of Figure 1 and Figure Supplement 1 to use the term "PCA clusters", distinguishing them from “cofilin clusters” or “F-actin clusters”.

      There are many acronyms. The readability of the figure legends (which can be consulted independently of the main text) would be improved if the acronyms were spelled out there as well.

      We have spelled out some of the acronyms in each figure legend accordingly, which we believe is appropriate as a minimum.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript builds upon the authors' previous work on the cross-talk between transcription initiation and post-transcriptional events in yeast gene expression. These prior studies identified an mRNA 'imprinting' phenomenon linked to genes activated by the Rap1 transcription factor (TF), a surprising role for the Sfp1 TF in promoting RNA polymerase II (RNAPII) backtracking, and a role for the non-essential RNAPII subunits Rpb4/7 in the regulation of mRNA decay and translation. Here the authors aimed to extend these observations to provide a more coherent picture of the role of Sfp1 in transcription initiation and subsequent steps in gene expression. They provide evidence for (1) a physical interaction between Sfp1 and Rpb4, (2) Sfp1 binding and stabilization of mRNAs derived from genes whose promoters are bound by both Rap1 and Sfp1 and (3) an effect of Sfp1 on Rpb4 binding or conformation during transcription elongation. 

      Strengths: 

      This study provides evidence that a TF (yeast Sfp1), in addition to stimulating transcription initiation, can at some target genes interact with their mRNA transcripts and promote their stability. Sfp1 thus has a positive effect on two distinct regulatory steps. Furthermore, evidence is presented indicating that strong Sfp1 mRNA association requires both Rap1 and Sfp1 promoter binding and is increased at a sequence motif near the poly(A) tract of many target mRNAs. Finally, they provide compelling evidence that Sfp1-bound mRNAs have higher levels of RNAPII backtracking and altered Rpb4 association or conformation compared to those not bound by Sfp1. 

      Weaknesses: 

      The Sfp1-Rpb4 association is supported only by a two-hybrid assay that is poorly described and lacks an important control. Furthermore, there is no evidence that this interaction is direct, nor are the interaction domains on either protein identified (or mutated to address function). 

      Indeed, our two hybrid, immunoprecipitation and imaging results do not allow us to conclusively discern whether the interaction between Rpb4 and Sfp1 is direct or indirect. While the interaction holds significance, we consider the direct versus indirect distinction to be of secondary importance in the context of this paper. In the current text, we state that 'our two hybrid, immunoprecipitation and imaging results do not differentiate between a direct or indirect interactions' (see page 6, sentences highlighted in blue).

      The contention that Sfp1 nuclear export to the cytoplasm is transcription-dependent is not well supported by the experiments shown, which are not properly described in the text and are not accompanied by any primary data. 

      This section has been rewritten for clarity (see page 7). We note that this assay was originally developed and published by Lee, M. S., M. Henry, and P. A. Silver in their 1996 paper in G&D and has since been reported in numerous subsequent studies. Reassuringly, our conclusion is bolstered by the observation that Sfp1 binds to Pol II transcripts co-transcriptionally, suggesting that Sfp1 is exported in the context of the mRNA.

      The presence of Sfp1 in P-bodies is of unclear relevance and the authors do not ask whether Sfp1-bound mRNAs are also present in these condensates. 

      P-bodies consist of both RNA and proteins (reviewed in doi: 10.1021/acs.biochem.7b01162). The significance of this experiment lies in its contribution to further confirming the co-localization of Sfp1 with mRNAs and Rpb4. This observation could also yield valuable insights for future investigations into the role of Sfp1.

      Further analysis of Sfp1-bound mRNAs would be of interest, particularly to address the question of whether those from ribosomal protein genes and other growth-related genes that are known to display Sfp1 binding in their promoters are regulated (either stabilized or destabilized) by Sfp1. 

      Fig. 4A, C and D show that RP mRNAs become destabilized in sfp1Δ cells.

      The authors need to discuss, and ideally address, the apparent paradox that their previous findings showed that Rap1 acts to destabilize its downstream transcripts, i.e. that it has the opposite effect of Sfp1 shown here. 

      We would like to thank Reviewer 1 for this valuable comment. In the revised paper, we discuss our hypothesis that Rap1 is likely responsible for regulating the imprinting of other proteins, such as Rpb4, which in turn lead to the destabilization of mRNAs. See the blue paragraph on page 20.

      Finally, recent studies indicate that the drugs used here to measure mRNA stability induce a strong stress response accompanied by rapid and complex effects on transcription. Their relevance to mRNA stability in unstressed cells is questionable. 

      Half-lives were determined mainly by GRO analysis of optimally proliferating cells. This method does not require any drug or stressful treatment. The results obtained by this method were consistent with those obtained after thiolutin addition. Using both methods, we found that disruption of Sfp1 results in substantial mRNA destabilization. Nevertheless, in our revised manuscript, we show results obtained by subjecting cells to a temperature shift to 42°C, a natural method to inhibit transcription. This approach to determining half-lives has been previously reported in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008), and may rule out effects of the drug on half-lives. Indeed, this assay determines HLs under heat stress; it therefore demonstrates that, at least during heat shock, Sfp1 stabilizes mRNAs. Since the results are similar to those obtained by the GRO method at 30°C, we conclude that Sfp1 stabilizes mRNAs under both optimal and heat-stress conditions.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Kelbert et al. presents results on the involvement of the yeast transcription factor Sfp1 in the stabilisation of transcripts whose synthesis it stimulates. Sfp1 is known to affect the synthesis of a number of important cellular transcripts, such as many of those that code for ribosomal proteins. The hypothesis that a transcription factor can remain bound to the nascent transcript and affect its cytoplasmic half-life is attractive, but the methods used to demonstrate the half-life effects and the association of Sfp1 with cytoplasmic transcripts remain to be fully validated, as explained in my comments on the results below: 

      Comments on methodology and results: 

      (1) A two-hybrid-based assay for protein-protein interactions identified Sfp1, a transcription factor known for its effects on ribosomal protein gene expression, as interacting with Rpb4, a subunit of RNA polymerase II. Classical two-hybrid experiments depend on the presence of the tested proteins in the nucleus of yeast cells, suggesting that the observed interaction occurs in the nucleus. Unfortunately, the two-hybrid method cannot determine whether the interaction is direct or mediated by nucleic acids. 

      Indeed, our two hybrid, immunoprecipitation and imaging results do not allow us to conclusively discern whether the interaction between Rpb4 and Sfp1 is direct or indirect. While the interaction holds significance, we consider the direct versus indirect distinction to be of secondary importance in the context of this paper. In the current text, we state that 'our two hybrid, immunoprecipitation and imaging results do not differentiate between a direct or indirect interactions' (see page 6).

      (2) Inactivation of nup49, a component of the nuclear pore complex, resulted in the redistribution of GFP-Sfp1 into the cytoplasm at the temperature non-permissive for the nup49-313 strain, suggesting that GFP-Sfp1 is a nucleo-cytoplasmic shuttling protein. This observation confirmed the dynamic nature of the nucleo-cytoplasmic distribution of Sfp1. For example, a similar redistribution to the cytoplasm was previously reported following rapamycin treatment and under starvation (Marion et al., PNAS 2004). In conjunction with the observation of an interaction with Rpb4, the authors observed slower nuclear import kinetics for GFP-Sfp1 in the absence of Rpb4 when cells were transferred to a glucose-containing medium after a period of starvation. Since the redistribution of GFP-Sfp1 was abolished in an rpb1-1/nup49-313 double mutant, the authors concluded that Sfp1 localisation to the cytoplasm depends on transcription. The double mutant yeast cells may show a variety of non-specific effects at the restrictive temperature, and whether transcription is required for Sfp1 cytoplasmic localisation remains incompletely demonstrated. 

      We agree with Reviewer 2 that any heat inactivation of a temperature-sensitive (ts) protein can lead to non-specific effects. It is evident that nup49-313 does not prevent Sfp1 export to the cytoplasm. In the case of rpb1-1, these non-specific effects are expected due to transcriptional arrest, which can eventually result in a reduction in protein content. However, this process takes some time, while the impact on export is more rapid. It is worth noting that this assay was developed and previously published by Pam Silver (Henry and Silver G&D 1996) and has been reported in many subsequent papers. Importantly, our conclusion is supported by the observation that Sfp1 binds both nascent RNA (co-transcriptionally) and mature mRNA (cytoplasmic). These observations, along with the reduced mRNA export upon transcription blocking, are consistent with our proposal that Sfp1 is exported in association with mRNA.

      (3) Under starvation conditions, which led to the presence of Sfp1 in the cytoplasm and have previously been correlated with a decrease in the transcription of Sfp1 target genes, the authors observed that a plasmid-based expressed GFP-Sfp1 accumulated in cytoplasmic foci. These foci were also labelled by P-body markers such as Dcp2 and Lsm1. The quality of the microscopic images provided does not allow to determine whether Rpb4-RFP colocalises with GFP-Sfp1. 

      The submitted PDF figure is of low quality. We believe that the high-quality figure in the final submission is convincing.

      (4) To understand to which RNA Sfp1 might bind, the authors used an N-terminally tagged fusion protein in a cross-linking and purification experiment. This method identified 264 transcripts for which the CRAC signal was considered positive and which mostly correspond to abundant mRNAs, including 74 ribosomal protein mRNAs or metabolic enzyme-abundant mRNAs such as PGK1. The authors did not provide evidence for the specificity of the observed CRAC signal, in particular, what would be the background of a similar experiment performed without UV cross-linking. In a validation experiment, the presence of several mRNAs in a purified SFP1 fraction was measured at levels that reflect the relative levels of RNA in a total RNA extract. Negative controls showing that abundant mRNAs not found in the CRAC experiment were clearly depleted from the purified fraction with Sfp1 would be crucial to assessing the specificity of the observed protein-RNA interactions. The NON-CRAC+ selected mRNAs were enriched for genes whose expression was previously shown to be upregulated upon Sfp1 overexpression (Albert et al., 2019). The presence of unspliced RPL30 pre-mRNA in the Sfp1 purification was interpreted as a sign of co-transcriptional assembly of Sfp1 into mRNA, but in the absence of valid negative controls, this hypothesis would require further experimental validation.

      We would like to thank Reviewer 2 for bringing this issue up, as it helped us to clarify it in the revised paper.

      First, we emphasized in the Discussion that many CRAC+ genes do not fall into the category of highly transcribed genes. Please see more detailed discussion below.

      Secondly, we examined various features of the 264 genes - classified as CRAC+ - to estimate their specificity and biological significance. Our various experiments revealed that the CRAC+ genes represent a distinct group with many unique features.

      The biological significance of the 264 CRAC+ mRNAs was demonstrated by various experiments, all of which argue against a technical artifact. In fact, all the experiments and analyses that we have pursued indicate the unique nature of the CRAC+ genes. Some examples are:

      (1) Figs. 2A and 2B show that most reads of CRAC+ mRNAs were mapped to a specific location, close to the pA sites.

      (2) Fig. 2C shows that most reads of CRAC+ mRNAs were mapped to a specific RNA motif located near the 3’ ends of the mRNAs.

      (3) Most RiBi CRAC+ promoters contain Rap1 binding sites (p = 1.9 × 10⁻²²), whereas the vast majority of RiBi non-CRAC+ promoters do not (Fig. 3C).

      (4) Fig. 4A shows that RiBi CRAC+ mRNAs become destabilized due to Sfp1 deletion, whereas RiBi non-CRAC+ mRNAs do not. Fig. 4B shows similar results due to Sfp1 depletion.

      (5) Fig. 6B shows that the impact of Sfp1 on backtracking is substantially higher for CRAC+ than for non-CRAC+ genes. This is most clearly visible in RiBi genes.

      (6) Fig. 7A shows that the Sfp1-dependent changes along the transcription units are substantially more pronounced for CRAC+ than for non-CRAC+ genes.

      (7) In Fig. S4B, the chromatin binding profile of Sfp1 is shown to be different for CRAC+ and non-CRAC+ genes.

      Taken together, these unique features (indeed, every feature that we examined) indicate the specificity and significance of this group, demonstrating that our CRAC results are biologically significant.

      Most importantly, these genes do not all fall into the category of highly transcribed genes.  On the contrary, as depicted in Figure 6A (green dots), it is evident that CRAC+ genes exhibit a diverse range of Rpb3 ChIP and GRO signals. Furthermore, as illustrated in Figure 7A, when comparing CRAC+ to Q1 (the most highly transcribed genes), it becomes evident that the Rpb4/Rpb3 profile of CRAC+ genes behaves differently from the Q1 group. Evidently, despite the heterogeneous transcription of CRAC+ genes (as mentioned above), the Rpb4/Rpb3 profile decreases more substantially than that of the highly transcribed genes (Q1).  Moreover, despite similar expression levels among all RiBi mRNAs, only a portion of them binds Sfp1.

      Thus, all our results indicate that CRAC+ genes represent a biologically significant group, irrespective of the expression levels of its members. In response to this comment, we included a new paragraph discussing the validity of our conclusions. See page 18, blue paragraph.

      (5) To address the important question of whether co-transcriptional assembly of Sfp1 with transcripts could alter their stability, the authors first used a reporter system in which the RPL30 transcription unit is transferred to vectors under different transcriptional contexts, as previously described by the Choder laboratory (Bregman et al. 2011). While RPL30 expressed under an ACT1 promoter was barely detectable, the highest levels of RNA were observed in the context of the native upstream RPL30 sequence when Rap1 binding sites were also present. Sfp1 showed better association with reporter mRNAs containing Rap1 binding sites in the promoter region. However, removal of the Rap1 binding sites from the reporter vector also led to a drastic decrease in reporter mRNA levels. Whether the fraction of co-purified RNA is nuclear and co-transcriptional or not cannot be inferred from these results. 

      The proposed co-transcriptional binding of Sfp1 is based on the findings presented in Figure 5C and Figure S2D, as well as the observed binding of Sfp1 to transcripts containing introns, as shown in Figures 2D and 3B.  The results of Fig. 3 led us to the assertion that the "RNA-binding capacity of Sfp1 is regulated by Rap1-binding sites located at the promoter." We maintain our stance on this conclusion. Indeed, the Rap1 binding site does impact mRNA levels, as highlighted by Reviewer 2. However, "construct E," which possesses a promoter with a Rap1 binding site, exhibits lower transcript levels compared to "construct F," which lacks such a binding site in its promoter. Despite this difference in transcript levels, Sfp1 was able to pull down the former transcript but not the latter, even though expression of the former gene is relatively low. Thus, the results appear to be more reliant on the specific capacity of Sfp1 to interact with the transcript rather than on the transcript's expression level.

      (6) To complement the biochemical data presented in the first part of the manuscript, the authors turned to the deletion or rapid depletion of SFP1 and used labelling experiments to assess changes in the rate of synthesis, abundance, and decay of mRNAs under these conditions. An important observation was that in the absence of Sfp1, mRNAs encoding ribosomal protein genes not only had a reduced synthesis rate but also an increased degradation rate. This important observation needs careful validation, as genomic run-on experiments were used to measure half-lives, and this particular method was found to give results that correlated poorly with other measures of half-life in yeast (e.g. Chappelboim et al., 2022 for a comparison). Similarly, the use of thiolutin to block transcription as a method of assessing mRNA half-life has been reported to be problematic, as thiolutin can specifically inhibit the degradation of ribosomal protein mRNA (Pelechano & Perez-Ortin, 2008). Specific repressible reporters, such as those used by Baudrimont et al. (2017), would need to be tested to validate the effect of Sfp1 on the half-life of specific mRNAs. Also, it would be very difficult to infer from the images presented whether the rate of deadenylation is altered by Sfp1.

      Various methods exist for assessing mRNA half-lives (HLs), and each carries its own set of challenges and biases. Consequently, directly comparing HL values of a specific mRNA obtained with different methods is problematic. In our opinion, the superiority of any particular method over the others remains unclear. However, each method is highly reliable for comparing different strains under identical conditions.

      Estimating HLs through the GRO approach is a non-invasive method, applied to optimally proliferating cells, which has been employed in numerous publications. While no method is without its limitations, our experience over the years has reassured us that this approach is among the most dependable. Our HL determination using thiolutin to block transcription provided results that were consistent with the values obtained by the GRO approach.
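      The GRO-based estimate rests on a steady-state relation that can be sketched as follows, assuming first-order mRNA decay and constant mRNA levels in optimally proliferating cells. The symbols R (mRNA amount), S (synthesis rate measured by run-on) and k (apparent decay constant) are introduced here for illustration only; k lumps degradation together with dilution by growth, and the exact normalization used in the manuscript may differ.

```latex
\frac{dR}{dt} = S - kR = 0
\quad\Longrightarrow\quad
k = \frac{S}{R},
\qquad
t_{1/2} = \frac{\ln 2}{k} = \frac{R \,\ln 2}{S}
```

      Under these assumptions, a strain with lower synthesis (S) but unchanged mRNA level (R) would show a longer half-life, which is why S and R must be measured in the same cells.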

      Nevertheless, in our revised manuscript, we supplemented the HL data obtained with thiolutin with results obtained by subjecting cells to a temperature shift to 42°C, a natural method to block transcription in wild-type (WT) cells. This approach to determining HLs has been previously reported in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008). The new results are shown in Fig. S3B. They are consistent with our conclusion that Sfp1 stabilizes mRNAs.
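      As a generic illustration of how a half-life is typically derived from a transcription shut-off time course (whether transcription is blocked by thiolutin or by a shift to 42°C), the Python sketch below fits first-order decay by log-linear least squares. It assumes simple exponential decay after shut-off and ignores any lag before transcription stops; the function name and the input values are hypothetical and are not taken from the manuscript.

```python
import math


def half_life_from_shutoff(times_min, levels):
    """Estimate an mRNA half-life (minutes) from a shut-off time course.

    Hypothetical helper. Assumes first-order decay,
    level(t) = level(0) * exp(-k * t), so ln(level) is linear in t
    with slope -k, and the half-life is ln(2) / k.
    """
    if len(times_min) != len(levels) or len(times_min) < 2:
        raise ValueError("need at least two paired time points")
    logs = [math.log(v) for v in levels]  # levels must be positive
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    # Ordinary least-squares slope of ln(level) versus time.
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    k = -sxy / sxx
    if k <= 0:
        raise ValueError("no net decay detected in the time course")
    return math.log(2) / k


if __name__ == "__main__":
    # Illustrative values only: an mRNA that halves every 20 minutes.
    wt_half_life = half_life_from_shutoff([0, 10, 20, 40], [1.0, 2 ** -0.5, 0.5, 0.25])
    print(round(wt_half_life, 1))  # 20.0
```

      Comparing strains (for example WT and sfp1Δ) would mean applying the same fit to each strain's time course, in line with the point above that comparisons are most reliable when a single method is used under identical conditions.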

      Using a repressible promoter to determine mRNA HL is, unfortunately, not suitable in this paper because the promoter itself is involved in HL regulation. This observation is supported by Bregman et al. (2011) and depicted in Fig. 3, which illustrates that the promoter is critical for mRNA imprinting, consequently regulating HL.

      (7) The effects of SFP1 on transcription were investigated by chromatin purification with Rpb3, a subunit of RNA polymerase, and the results were compared with synthesis rates determined by genomic run-on experiments. The decrease in Pol II presence on transcripts in the absence of SFP1 was not accompanied by a marked decrease in transcript output, suggesting an effect of Sfp1 in ensuring robust transcription and avoiding RNA polymerase backtracking. To further investigate the phenotypes associated with the depletion or absence of Sfp1, the authors examined the presence of Rpb4 along transcription units compared to Rpb3. One effect of Sfp1 deficiency was that this ratio, which decreased from the start of transcription towards the end of transcripts, increased slightly. The results presented are largely correlative and could arise from the focus on very specific types of mRNAs, such as those of ribosomal protein genes, which are sensitive to stress and are targeted by very active RNA degradation mechanisms activated, for example, under heat stress (Bresson et al., 2020). 

      Figure 7A illustrates a significant reduction in Rpb4/Rpb3 ratios along the transcription unit in WT cells. This reduction is notably more pronounced in CRAC+ genes compared to the highly transcribed quartile (Q1), which includes all ribosomal protein (RP) genes, and it is completely absent in sfp1∆ cells. Furthermore, it's important to highlight that the CRAC+ gene group displays a wide range of transcription rates, as measured by either Rpb3 ChIP or GRO (Figure 6A). Given these observations, we do not think that heightened sensitivity of RP mRNA degradation in response to stress is responsible for the pronounced difference in the configuration of the Pol II elongation complex that is detected in CRAC+ genes, mainly because this experiment was performed under standard (non-stress) culture conditions.

      Correlative studies are particularly informative when a gene mutation eliminates a correlation, and this is precisely the type of study depicted in Figure 7B-C. The correlations shown in these panels are dependent on Sfp1. Indeed, RP genes are sensitive to stress. However, we used non-stressed conditions. Furthermore, CRAC+ genes did not display any apparent unusual destabilization but rather exhibited higher (not lower) mRNA stability compared to non-CRAC+ genes (Figure 7C).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The paper combines phenotypic and genomic analyses of the "sheltered load" (i.e. the accumulation of deleterious mutations linked to S-loci that are hidden from selection in the homozygous state) in Arabidopsis. The authors compare results to previous theoretical predictions concerning the extent of the load in dominant vs recessive S-alleles, and further develop exciting theory to reconcile differences between previous theory and observed results.

      Strengths:

      This is a very nice combination of theory and data to address a classical question in the field.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The "genetic load" is a poorly defined concept in general, and its quantification via the number of putatively deleterious mutations is quite difficult. Furthermore counting up the number of derived mutations at fully constrained nucleotides may not be a great estimate of the load, and certainly does not allow for evaluation of recessivity -- a concept critical to ideas concerning the sheltered load. Alternative approaches - including estimating the severity of mutations - could be helpful as well. This imperfection in available approaches to test theory must be acknowledged more strongly by the authors.

      As suggested by the reviewer, we implemented alternative approaches to estimate the severity of deleterious mutations and now report the results of SNPeff and SIFT4G analyses in Table S6. The results we obtained with these other metrics were overall very similar to those based on our previous counting of mutations at 0-fold and 4-fold degenerate sites. More generally, we tried to improve the presentation of our strategy to estimate the genetic load (clarified in lines 262-268, 271, 292-295, and 297). In particular, we made it clear that our population genetic analysis cannot assess the recessivity of the observed mutations (lines 428-434).

      Reviewer #2 (Public Review):

      Summary:

      This study looks into the complex dominance patterns of S-allele incompatibilities in Brassicaceae, through which it attempts to learn more about the sheltering of deleterious load. I found several weak points in the analyses that diminished my excitement about the results. In particular, the way in which deleterious mutations were classified lacked the ability to distinguish the severity of the mutations and thus their expected associated dominance.

      First, we would like to clarify that our goal with this study is NOT to learn something about the dominance of the linked deleterious mutations (we cannot). Instead, we compare the accumulation of deleterious mutations linked to dominant vs recessive S-ALLELES, but are agnostic regarding the dominance level of the LINKED mutations themselves. The rationale is that the different intensities of natural selection between dominant vs recessive S-alleles provide a powerful way to examine the process by which deleterious mutations are sheltered in general. We further clarified this aspect on lines 70-73 and 399-401.

      Second, as mentioned above in response to Reviewer 1, we complemented the analysis by predicting the severity of the deleterious mutations by SIFT4G and SNPeff. The results were largely consistent, with the exception that the number of sites included in SIFT4G was low, such that the statistical power was reduced (lines 296-300).

      Furthermore, the simulation approach could have provided this exact sort of insight but was not designed to do so, making this comparison to the empirical data also less than exciting for me.

      As explained above, studying dominance of the linked mutations we observed is an interesting research question (albeit a difficult one), but it was not our goal here. Instead, our study was designed as an empirical test of the predictions presented in Llaurens et al (2009), and we re-analysed some aspects of the model outcome to illustrate our points.

      We now better explain that we based our choice of parameters on the fact that in the theoretical study by Llaurens et al (2009), recessive deleterious mutations are predicted to accumulate in a much more straightforward manner (lines 316-318).

      We now dedicate a paragraph of the discussion to explain how our stochastic simulations could be improved, and acknowledge that a full exploration of the interaction between dominance of the S-alleles and dominance of the linked deleterious mutations would be an interesting follow-up - albeit beyond the scope of our study (lines 437-441).

      Major and minor comments:

      I think the introduction (or somewhere before we dive into it in the results) of the dominance hierarchy for the S-alleles needs a more in-depth explanation. Not being familiar with this beforehand really made this paper inaccessible to me until I then went to find out more before continuing. I would expect this paper to be broad enough that self-contained information makes it accessible to all readers. For example, lines 110-115 could be in the Introduction.

      We thank the reviewer for this useful remark. We now give a more comprehensive description of the dominance hierarchy and introduce the classes of dominance in A. lyrata already in the introduction, on lines 64-70.

      Along with my above comment, perhaps it is not my place to comment, but I find the paper not of a broad enough scope to be of interest to a broad readership. This S-allele dominance system is more than simple balancing selection, it is a very complex and specific form of dominance between several haplotypes, and the mechanism of dominance does not seem to be genetic. I am not sure that it thus extrapolates to broad comments on general dominance and balancing selection, e.g. it would not be the same as considering inversions and this form of balancing selection where we also expect recessive deleterious mutations to accumulate.

      We disagree with these interpretations by the reviewer, for two reasons:

      First, the mechanism of dominance is actually entirely genetic. In fact, we uncovered some years ago that it is based on the molecular interaction between small non-coding RNAs from dominant alleles and their target sites on recessive alleles (Durand et al. Science 2014, see lines 68-70). If there is something specific with this system, it is that the dominance phenomenon is better understood at the mechanistic level than in most other cases, but the resulting phenomenon in itself (a dominance hierarchy) is rather common.

      Second, the kind of variation in the intensity of linked selection created by this mechanism is actually a general phenomenon, so our results have broad relevance beyond our particular study system. We modified the introduction to explain this point more clearly, highlighting in particular the fact that the situation we study closely resembles the case of sex chromosomes, where X (or Z) chromosomes are genetically recessive and Y (or W) chromosomes are genetically dominant. We cite this example in lines 83-87 of the introduction, along with several other well-studied examples on lines 480-489 of the discussion.

      It would have been particularly interesting, or a nice addition, to see deleterious mutations classed by something like SNPeff or GERP where you can have different classes of moderate to severe deleterious variants, which we would expect also to be more recessive the more deleterious they are. In line with my next comment on the simulations, I think relative differences between mutations expected to be more or less dominant may be even more insightful into the process of sheltering which may or may not be going on here.

      We agree with the reviewer, and as detailed above we have now integrated such analyses with SNPeff and SIFT4G (Table S6). These new results reinforce our conclusion that while S-allele dominance influences the fixation of deleterious mutations, it has no effect on their total number. See lines 270-272 and 296-300.

      In the simulations, h=0 and s=0.01 (as in Figure 5) for all deleterious mutations seems overly simplistic, and at the convenient end for realistic dominance. I think that recessive lethals, which we expect to be close to h=0, would have a much larger selection coefficient, while other deleterious mutations would only be partially recessive at such an s value. I expect this would change some of the simulation results seen, though to what degree I am not certain. It would be nice to at least check the same exact results for h=0.3 or 0.2 (or additionally also for recessive lethals, e.g. h=0 and s=-0.9). I would also disagree with the statement in line 677: many studies have shown, particularly those on balancing selection, that partially recessive deleterious mutations are not eliminated by natural selection and do play a role in population genetic dynamics. I am also not surprised that extinction was found for higher s values when the mutation rate for such mutations was very high and the distribution of s values was constant. An influx of such highly deleterious mutations is unlikely to ever let a population survive, yet that does NOT mean that in nature, the rare influx of such mutations does lead to them being sheltered. I find overall that the simulation results contribute very little, to none, to this paper, as without something more realistic, like a simultaneous distribution of s and h values, you cannot say which, if any class of these mutations are the ones expected to accumulate because of S-allele dominance.

      We understand that the previous version of our manuscript was confusing between dominance of the S-alleles and dominance of the linked deleterious mutations. We clarified that our study focuses on the effect of the former only (lines 99, 263-264 and 581-583).

      We agree that a complete exploration of the interaction between dominance of the S-alleles and dominance of the linked mutations being sheltered would have been an asset, but as explained above this is not the focus of our study. The previous work by Llaurens et al (2009) has already established that deleterious mutations can fix within S-allele lineages, especially when linked to dominant S-alleles, and when the number of S-alleles is large. Under the conditions they examined, deleterious mutations were much more strongly eliminated if not fully recessive (h=0 vs h=0.2), so for the present study we decided to simulate fully recessive mutations only. We now formally acknowledge the possibility that some complex interaction may take place between dominance of the S-alleles and dominance of the linked deleterious mutations (lines 440-442). However, as explained above we feel that fully exploring this complex interaction would require a detailed investigation, which is clearly beyond the scope of the present study.
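      Why the contrast between h=0 and h=0.2 matters can be illustrated with the textbook single-locus selection recursion, with genotype fitnesses 1, 1 - hs and 1 - s. This is a deliberately simplified, deterministic sketch under stated assumptions (infinite population, random mating, no mutation, no drift and, crucially, no linkage to S-alleles); it is not the C simulation code adapted from Llaurens et al. (2009), and the function names are hypothetical.

```python
def next_q(q, s, h):
    """One generation of selection on a deleterious allele at frequency q.

    Genotype fitnesses: wild-type homozygote 1, heterozygote 1 - h*s,
    mutant homozygote 1 - s (infinite population, random mating).
    """
    p = 1.0 - q
    w_het = 1.0 - h * s
    w_hom = 1.0 - s
    w_bar = p * p + 2.0 * p * q * w_het + q * q * w_hom
    # Mutant alleles after selection: half of the heterozygotes' alleles plus
    # all of the mutant homozygotes' alleles, renormalized by mean fitness.
    return (p * q * w_het + q * q * w_hom) / w_bar


def frequency_after(q0, s, h, generations):
    """Iterate the recursion for a fixed number of generations."""
    q = q0
    for _ in range(generations):
        q = next_q(q, s, h)
    return q


for h in (0.0, 0.2):
    print(h, round(frequency_after(0.05, 0.01, h, 200), 4))
```

      When the allele is rare, a fully recessive mutation (h=0) is exposed to selection only in homozygotes, at a rate of order s·q², whereas with h=0.2 the heterozygote term of order h·s·q dominates, so the mutation is purged much faster.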

      Rather they only show the disappointing or less exciting result that fully recessive, weakly deleterious mutations (which I again think do not even exist in nature as I said above) have minor, to no effect across the classes of S-allele dominance. They provide no insight into whether any type of recessive deleterious mutation can accumulate under the S-allele dominance hierarchy, and that is the interesting question at hand. I would either remove these simulations or redo them in another approach. The authors never mention what simulation approach was used, so I can only assume this is custom, in-house code. Yet I do not find that code provided on the github page. I do not know if the lack of a distribution for h and s values is then a choice or a programming limitation, but I see it as one that should be overcome if these simulations are meant to be meaningful to the results of the study.

      The code we used (in C) was adapted from the previous study by Llaurens et al. (2009), which unfortunately was not deposited in a data repository at the time. With the agreement of the authors of that study, this code is now available on GitHub (https://github.com/leveveaudrey/model_ssi_Llaurens; line 723).

      It is correct that our simulations were not aimed at determining whether “any type of recessive deleterious mutation can accumulate”, but we strongly believe that they help interpret the observations made in the genomic data.

      Recommendations for the authors:

      Notes from the editor:

      I found Table 1 confusing, with column headings of observed proportion but perhaps numbers reflecting counts.

      Thank you for pointing out this confusion. There was indeed an error in the last column, which we have now corrected.

      I found Figure 2 a bit hard to parse, with the vertical lines being unclear and the x-axis ticks of insufficient resolution to evaluate the physical extent of the signals.

      We increased the size of the x-axis labels and added more detail to them in Figure 2, which is hopefully now clearer. We also increased the size of the vertical lines.

      Finally, I wonder, given the rapid decay of signal in lyrata, whether 25kb is the right choice for evaluating load and whether the pattern may look different on a smaller scale.

      It is true that the signal decays rapidly in A. lyrata, as can be seen in the haplotype structure analysis and in line with our previous analysis of the same populations (Le Veve et al., MBE 2023), in which we explored the effect of the size of the chromosomal region analyzed (lines 266-269). However, for the sake of comparison, we prefer to keep the same window size. The fact that we still see an effect of dominance despite the lower statistical power associated with the more rapid decay (because fewer genes are expected to be affected) actually reinforces our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional suggestions to improve the manuscript.

      (1) How does the load linked to the S-locus compare to that observed in other genomic regions? It would be useful to provide a comparison of the results quantified in Figures three and four to comparable genomic regions unlinked to the S-locus. How severe is the linked load?

      This comparison to the genomic background was actually the core of our previous study (Le Veve et al MBE 2023), which was based on the same populations. This analysis revealed that polymorphism of the 0-fold degenerate sites was more than twice as high in the 25kb immediately flanking the S-locus as in a series of 100 unlinked control regions. Here, the main focus of the present study is on the effect of linkage to particular S-alleles (which was not possible previously because haplotypes had to be phased).

      (2) Details of the GLM for data underlying Figures 3 and 4 are somewhat unclear. Is the key explanatory variable (Dominance) treated as continuous? Categorical? Ordinal etc…

      Dominance is considered as a continuous variable. We specify this in line 162 of the results, in the legends of Figures 3 and 4, in the Material and Method (lines 627 and 660) and in the legend of Table S4.
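      To show what treating dominance as a continuous covariate means in practice, the sketch below fits a one-predictor log-linear model, log E[y] = b0 + b1 · dominance, by Newton-Raphson, so a single slope b1 summarizes the change per dominance class. The response variable and error family of the GLM are not stated here, so a Poisson model of counts per S-allele copy is an assumption, as are the integer codes for the dominance classes and the name `fit_poisson_glm`.

```python
import math


def fit_poisson_glm(x, y, iterations=50):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (x treated as continuous)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (equals the observed information for the log link).
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * score, using the explicit 2x2 inverse.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


# Hypothetical check: dominance classes coded 1..4, responses exactly on the curve.
dominance = [1, 1, 2, 2, 3, 3, 4, 4]
counts = [math.exp(0.5 + 0.3 * d) for d in dominance]
print(fit_poisson_glm(dominance, counts))
```

      Coding dominance as a factor instead would estimate a separate coefficient for each class and would not impose a monotonic trend.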

      (3) I had some trouble understanding the two different p-values in columns five and six of table one. Please provide more detail.

      We understand that the two p-values in Table 1 were confusing. The first was related to the binomial test and the second to the permutation test. To be consistent with the rest of the manuscript, we retained only the p-value of the permutation test.
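      A generic permutation test repeatedly shuffles the group labels and asks how often the reshuffled statistic is at least as extreme as the observed one. The statistic behind Table 1 is not described in this response, so the sketch below assumes a hypothetical one-sided difference in means between two groups of values; the function name and default parameters are illustrative only.

```python
import random


def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=1):
    """One-sided permutation test that mean(group_a) exceeds mean(group_b)."""
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    def mean_diff(values):
        return sum(values[:n_a]) / n_a - sum(values[n_a:]) / n_b

    observed = mean_diff(pooled)  # pooled is still in its original order here
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_diff(pooled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


print(permutation_pvalue([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
```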

      (4) As mentioned in the "weaknesses" above, the authors should be more clear about what they are quantifying. They are explicitly counting the number of variants at 0-fold degenerate sites as a proxy for the genetic load. How good this proxy is is unclear. The most egregious misstatement here was on line 314 in which they make reference to the "total load." However, this limitation should be acknowledged throughout the manuscript and deserves more attention in the methods and discussion.

      As mentioned above, we now integrate additional methods to define and quantify the load (SIFT4G and SNPeff), which reinforced our previous conclusions (lines 271-272, 297-302).

      We clarified our wording and replaced the mention of “total load” by “mean number of linked deleterious mutations per copy of S-allele” (lines 324-325). In the discussion we tried to better explain the limitations of approaches to estimate the genetic load (lines 431-437).

      Reviewer #2 (Recommendations For The Authors):

      Line 60, it should be specified that this is only for recessive deleterious mutations.

      Non-recessive deleterious mutations would certainly not be expected to accumulate.

      As explained in detail above, the question of whether and how non-recessive deleterious mutations can accumulate when linked to the S-locus is difficult and would in itself deserve a full treatment, which is clearly beyond the scope of the present study. We clarified this point on line 56.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major comments (Public Reviews)

      Generality of grid cells

      We appreciate the reviewers’ concern regarding the generality of our approach, and in particular its application to analogies in nonlinear spaces. In that regard, there are at least two potential directions that could be pursued. One is to directly encode nonlinear structures (such as trees, rings, etc.) with grid cells, to which DPP-A could be applied as described in our model. The TEM model [1] suggests that grid cells in the medial entorhinal cortex may form a basis set that captures structural knowledge for such nonlinear spaces, such as social hierarchies and transitive inference when formalized as a connected graph. Another would be to use eigen-decomposition of the successor representation [2], a learnable predictive representation of possible future states that has been shown by Stachenfeld et al. [3] to provide an abstract structured representation of a space that is analogous to the grid cell code. This general-purpose mechanism could be applied to represent analogies in nonlinear spaces [4], for which there may not be a clear factorization in terms of grid cells (i.e., distinct frequencies and multiple phases within each frequency). Since the DPP-A mechanism, as we have described it, requires representations to be factored in this way, it would need to be modified for such purposes. Either of these approaches, if successful, would allow our model to be extended to domains containing nonlinear forms of structure. To the extent that different coding schemes (i.e., basis sets) are needed for different forms of structure, the question of how these are identified and engaged for use in a given setting is clearly an important one that is not addressed by the current work. We imagine that this is likely subserved by monitoring and selection mechanisms proposed to underlie the capacity for selective attention and cognitive control [5], though the specific computational mechanisms that underlie this function remain an important direction for future research. We have added a discussion of these issues in Section 6 of the updated manuscript.
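      As a small illustration of the second direction, the sketch below builds the successor representation M = (I - γT)^-1 for an unbiased random walk on a ring, a simple nonlinear (periodic) space, and eigendecomposes it. Its eigenvectors are periodic around the ring at several spatial frequencies, loosely analogous to a multi-frequency grid code. The ring, the discount γ = 0.9 and the function name are assumptions chosen for illustration; this is not the implementation of Stachenfeld et al. or of the TEM model.

```python
import numpy as np


def ring_successor_representation(n_states, gamma=0.9):
    """Successor representation M = (I - gamma * T)^-1 for a random walk on a ring."""
    T = np.zeros((n_states, n_states))
    for s in range(n_states):
        T[s, (s - 1) % n_states] = 0.5  # step left
        T[s, (s + 1) % n_states] = 0.5  # step right
    M = np.linalg.inv(np.eye(n_states) - gamma * T)
    return T, M


T, M = ring_successor_representation(12)
# T (and hence M) is symmetric here, so eigh applies; eigenvalues come back ascending.
eigenvalues, eigenvectors = np.linalg.eigh(M)
# The leading eigenvector is constant around the ring; lower-ranked ones oscillate
# at progressively higher spatial frequencies.
print(np.round(eigenvectors[:, -3:], 3))
```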

      (1) Whittington, J.C., Muller, T.H., Mark, S., Chen, G., Barry, C., Burgess, N. and Behrens, T.E., 2020. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell, 183(5), pp.1249-1263.

      (2) Dayan, P., 1993. Improving generalization for temporal difference learning: The successor representation. Neural computation, 5(4), pp.613-624.

      (3) Stachenfeld, K.L., Botvinick, M.M. and Gershman, S.J., 2017. The hippocampus as a predictive map. Nature neuroscience, 20(11), pp.1643-1653.

      (4) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      (5) Shenhav, A., Botvinick, M.M. and Cohen, J.D., 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2), pp.217-240.

      Biological plausibility of DPP-A

      We appreciate the reviewers’ interest in the biological plausibility of our model, and in particular the question of whether and how DPP-A might be implemented in a neural network. In that regard, Bozkurt et al. [1] recently proposed a biologically plausible neural network algorithm using a weighted similarity matrix approach to implement a determinant maximization criterion, which is the core idea underlying the objective function we use for DPP-A, suggesting that the DPP-A mechanism we describe may also be biologically plausible. This could be tested experimentally by exposing individuals (e.g., rodents or humans) to a task that requires consistent exposure to a subregion, and evaluating the distribution of activity over the grid cells. Our model predicts that high frequency grid cells should increase their firing rate more than low frequency cells, since the high frequency grid cells maximize the determinant of the covariance matrix of the grid cell embeddings. It is also worth noting that Frankland et al. [2] have suggested that the use of DPPs may also help explain a mutual exclusivity bias observed in human word learning and reasoning. While this is not direct evidence of biological plausibility, it is consistent with the idea that the human brain selects representations for processing that maximize the volume of the representational space, which can be achieved by maximizing the DPP-A objective function defined in Equation 6. We have added a comment to this effect in Section 6 of the updated manuscript.

      (1) Bozkurt, B., Pehlevan, C. and Erdogan, A., 2022. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. Advances in Neural Information Processing Systems, 35, pp.13704-13717.

      (2) Frankland, S. and Cohen, J., 2020. Determinantal Point Processes for Memory and Structured Inference. In CogSci.

      Simplicity of analogical problem and comparison to other models using this task

      First, we would like to point out that analogical reasoning is a signature feature of human cognition that supports flexible and efficient adaptation to novel inputs, and that it remains a challenge for most current neural network architectures. While humans can exhibit complex and sophisticated forms of analogical reasoning [1, 2, 3], here we focused on a relatively simple form inspired by Rumelhart’s parallelogram model of analogy [4,5], which has been used to explain traditional human verbal analogies (e.g., “king is to what as man is to woman?”). Our model, like that one, seeks to explain analogical reasoning in terms of the computation of simple Euclidean distances (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript. It is worth noting that, despite the seeming simplicity of this construction, we show that standard neural network architectures (e.g., LSTMs and transformers) struggle to generalize on such tasks without the use of the DPP-A mechanism.
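      The parallelogram rule itself can be written in a few lines: solving A - B = C - D gives the ideal completion D* = C - A + B, and each answer choice can be scored by its negative Euclidean distance to D*, with a softmax turning scores into choice probabilities. This hand-coded rule is only the baseline construction referred to above, not the authors' trained model, and the function names and example vectors are hypothetical.

```python
import numpy as np


def parallelogram_scores(A, B, C, choices):
    """Score candidate D's by closeness to the completion of A - B = C - D."""
    target = C - A + B  # solving A - B = C - D for D
    return -np.linalg.norm(choices - target, axis=1)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


A, B, C = np.array([1.0, 1.0]), np.array([2.0, 3.0]), np.array([5.0, 1.0])
choices = np.array([[6.0, 3.0], [0.0, 0.0], [7.0, 7.0]])
probs = softmax(parallelogram_scores(A, B, C, choices))
print(int(np.argmax(probs)))  # 0: the choice at (6, 3)
```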

      Second, we are not aware of any previous work other than Frankland et al. [6] cited in the first paragraph of Section 2.2.1, that has examined the capacity of neural network architectures to perform even this simple form of analogy. The models in that study were hardcoded to perform analogical reasoning, whereas we trained models to learn to perform analogies. That said, clearly a useful line of future work would be to scale our model further to deal with more complex forms of representation and analogical reasoning tasks [1,2,3]. We have noted this in Section 6 of the updated manuscript.

      (1) Holyoak, K.J., 2012. Analogy and relational reasoning. The Oxford handbook of thinking and reasoning, pp.234-259.

      (2) Webb, T., Fu, S., Bihl, T., Holyoak, K.J. and Lu, H., 2023. Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14(1), p.5144.

      (3) Lu, H., Ichien, N. and Holyoak, K.J., 2022. Probabilistic analogical mapping with semantic relation networks. Psychological review.

      (4) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (5) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (6) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      Clarification of DPP-A attentional modulation

      We would like to clarify several concerns regarding the DPP-A attentional modulation. First, we would like to make it clear that ω is not meant to correspond to synaptic weights, and thank the reviewer for noting the possibility of confusion on this point. It is also distinct from a biasing input, which is often added to the product of the input features and weights. Rather, in our model ω is a vector, and diag(ω) converts it into a matrix with ω on the diagonal and all other entries zero. In Equation 6, diag(ω) is matrix-multiplied with the covariance matrix V, which multiplies each column vector of V elementwise by ω; ω therefore acts more like a set of gates. We have noted this in Section 2.2.2 and have changed all instances of “weights (ω)” to “gates (ɡ)” in the updated manuscript. We have also rewritten the definition of Equation 6 and its uses (as in Algorithm 1) to depict the use of a sigmoid nonlinearity (σ), so that the resulting gate values are always between 0 and 1.
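      The gating operation can be checked numerically: with g = σ(ω), the product diag(g) V scales row i of V by g[i], which is the same as multiplying every column vector of V elementwise by g. The sketch below shows only this gating step on hypothetical random embeddings; it does not reproduce the full objective of Equation 6, and the function name is illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_covariance(omega, V):
    """Apply sigmoid gates g = sigma(omega) to a covariance matrix V as diag(g) V."""
    g = sigmoid(omega)  # each gate lies strictly between 0 and 1
    return g, np.diag(g) @ V


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # hypothetical embeddings: 100 samples, 4 cells
V = np.cov(X, rowvar=False)
omega = np.array([-2.0, 0.0, 1.0, 3.0])  # hypothetical unconstrained parameters
g, gated = gated_covariance(omega, V)
# diag(g) V multiplies every column vector of V elementwise by g (row i scaled by g[i]).
print(np.allclose(gated, g[:, None] * V))  # True
```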

      Second, we would like to clarify that we don’t compute the inner product between the gates ɡ and the grid cell embeddings x anywhere in our model. The gates within each frequency were optimized (independent of the task inputs), according to Equation 6, to compute the approximate maximum log determinant of the covariance matrix over the grid cell embeddings individually for each frequency. We then used the grid cell embeddings belonging to the frequency that had the maximum within-frequency log determinant for training the inference module, which always happened to be grid cells within the top three frequencies. Author response image 1 (also added to the Appendix, Section 7.10 of the updated manuscript) shows the approximate maximum log determinant (on the y-axis) for the different frequencies (on the x-axis).
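      The selection step described above, picking the frequency whose embeddings have the largest within-frequency log determinant of the covariance matrix, can be sketched as follows. For simplicity this computes the exact log determinant of each frequency's covariance rather than the gate-optimized approximation of Equation 6, and the two "frequencies" in the example are hypothetical synthetic data: one with nearly redundant cells and one with decorrelated cells.

```python
import numpy as np


def select_frequency(embeddings_by_frequency):
    """Return the index of the frequency whose embeddings have the largest log det(cov).

    Each element is an (n_samples, n_cells) array of grid cell embeddings
    for one frequency, computed over the training locations.
    """
    logdets = []
    for emb in embeddings_by_frequency:
        cov = np.cov(emb, rowvar=False)
        sign, logdet = np.linalg.slogdet(cov)
        logdets.append(logdet if sign > 0 else -np.inf)  # singular -> never chosen
    return int(np.argmax(logdets)), logdets


rng = np.random.default_rng(0)
z = rng.normal(size=500)
correlated = np.column_stack([z, z + 0.1 * rng.normal(size=500)])  # redundant cells
independent = rng.normal(size=(500, 2))  # decorrelated cells, similar variance
best, logdets = select_frequency([correlated, independent])
print(best)  # 1
```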

      Author response image 1.

      Approximate maximum log determinant of the covariance matrix over the grid cell embeddings (y-axis) for each frequency (x-axis), obtained after maximizing Equation 6.

      Third, we would like to clarify our interpretation of why DPP-A identified grid cell embeddings corresponding to the highest spatial frequencies, and why this produced the best OOD generalization (i.e., extrapolation on our analogy tasks). This is because those grid cell embeddings exhibited greater variance over the training data than the lower frequency embeddings, while at the same time the correlations among those grid cell embeddings were lower than the correlations among the lower frequency grid cell embeddings. The determinant of the covariance matrix of the grid cell embeddings is maximized when the variances of the grid cell embeddings are high (they are “expressive”) and the correlation among the grid cell embeddings is low (they “cover the representational space”). As a result, the higher frequency grid cell embeddings more efficiently covered the representational space of the training data, allowing them to efficiently capture the same relational structure across training and test distributions, which is required for OOD generalization. We have added some clarification to the second paragraph of Section 2.2.2 in the updated manuscript. Furthermore, to illustrate this graphically, Author response image 2 (added to the Appendix, Section 7.10 of the updated manuscript) shows, for 3 representative frequencies (left, middle and right panels showing results for the lowest, middle and highest grid cell frequencies, respectively, of the 9 used in the model), the sum of the grid cell embeddings, each multiplied by its corresponding gate, over a 2D space of 1000x1000 locations, obtained after maximizing Equation 6 for each grid cell frequency. The color code indicates the responsiveness of the grid cells to different X and Y locations in the input space (lighter color corresponding to greater responsiveness). Note that the dark blue area (denoting regions of least responsiveness to any grid cell) is greatest for the lowest frequency and nearly zero for the highest frequency, illustrating that grid cell embeddings belonging to the highest frequency more efficiently cover the representational space which allows them to capture the same relational structure across training and test distributions as required for OOD generalization.
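      The claim that the determinant rewards high variance and low correlation follows from writing the covariance as Σ = D R D, with D the diagonal matrix of standard deviations and R the correlation matrix, so that log det Σ = Σ_i log σ_i² + log det R. The correlation term is at most zero and reaches zero only when the cells are uncorrelated. The sketch below checks this split on a hypothetical two-cell covariance; the function name is illustrative.

```python
import numpy as np


def logdet_parts(cov):
    """Split log det(cov) into a variance term and a correlation term."""
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)  # correlation matrix R
    variance_term = np.sum(np.log(np.diag(cov)))  # larger when cells are "expressive"
    correlation_term = np.linalg.slogdet(corr)[1]  # <= 0; zero only when uncorrelated
    return variance_term, correlation_term


for rho in (0.0, 0.9):
    cov = 4.0 * np.array([[1.0, rho], [rho, 1.0]])
    v, c = logdet_parts(cov)
    print(rho, round(v, 3), round(c, 3), round(v + c, 3), round(np.linalg.slogdet(cov)[1], 3))
```

      At equal variances, raising the correlation from 0 to 0.9 lowers log det Σ by log(1/0.19), which is the sense in which correlated cells "cover" less of the representational space.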

      Author response image 2.

      Each panel shows, for a particular frequency, the sum of the grid cell embeddings, each multiplied by its corresponding gate, over the 2D space of 1000x1000 locations, obtained after maximizing Equation 6 for each grid cell frequency. The left, middle, and right panels show results for the lowest, middle, and highest grid cell frequencies, respectively, of the 9 used in the model. Lighter color in each panel corresponds to greater responsiveness of grid cells at that particular location in the 2D space.

      Finally, we would like to clarify how the DPP-A attentional mechanism is different from the attentional mechanism in the transformer module, and why both are needed for strong OOD generalization. Use of the standard self-attention mechanism in transformers over the inputs (i.e., A, B, C, and D for the analogy task) in place of DPP-A would lead to weightings of grid cell embeddings over all frequencies and phases. The objective function for DPP-A represents an inductive bias that selectively assigns the greatest weight to all grid cell embeddings (i.e., for all phases) of the frequency for which the determinant of the covariance matrix, computed over the training space, is greatest. The transformer inference module then attends over the inputs with the selected grid cell embeddings based on the DPP-A objective. We have added a discussion of this point in Section 6 of the updated manuscript.

      We would like to thank the reviewers for their recommendations. We have tried our best to incorporate them into our updated manuscript. Below we provide a detailed response to each of the recommendations grouped for each reviewer.

      Reviewer #1 (Recommendations for the authors)

      (1) It would be helpful to see some equations for R in the main text.

      We thank the reviewer for this suggestion. We have now added some equations explaining the working of R in Section 2.2.3 of the updated manuscript.

      (2) Typo: p 11 'alongwith' -> 'along with'

      We have changed all instances of ‘alongwith’ to ‘along with’ in the updated manuscript.

      (3) Presumably, this is related to equivariant ML - it would be helpful to comment on this.

      Yes, this is related to equivariant ML, since the properties of equivariance hold for our model. Specifically, the probability distribution after applying softmax remains the same when the transformation (translation or scaling) is applied to the scores for each of the answer choices obtained from the output of the inference module, and when the same transformation is applied to the stimuli for the task and all the answer choices before presenting as input to the inference module to obtain the scores. We have commented on this in Section 2.2.3 of the updated manuscript.
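      The translation case of this property can be checked with a simple stand-in for the inference module. Assuming answer choices are scored by the parallelogram rule (negative distance to C - A + B), shifting every stimulus and every answer choice by the same vector leaves all scores unchanged, and adding a constant to all scores leaves the softmax unchanged. The stand-in scorer, the random vectors and the shift are hypothetical. Only translation is shown; a scaling check would additionally depend on how the inference module normalizes its inputs, which is not specified here.

```python
import numpy as np


def parallelogram_scores(A, B, C, choices):
    target = C - A + B  # completion of A - B = C - D
    return -np.linalg.norm(choices - target, axis=1)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(3, 2))
choices = rng.normal(size=(4, 2))
shift = np.array([10.0, -3.0])  # arbitrary translation of the whole problem

p = softmax(parallelogram_scores(A, B, C, choices))
p_shifted = softmax(parallelogram_scores(A + shift, B + shift, C + shift, choices + shift))
p_offset = softmax(parallelogram_scores(A, B, C, choices) + 5.0)  # constant added to scores
print(np.allclose(p, p_shifted), np.allclose(p, p_offset))  # True True
```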

      Reviewer #2 (Recommendations for the authors)

      (1) Page 2 - "Webb et al." temporal context - they should also cite and compare this to work by Marc Howard on generalization based on multi-scale temporal context.

      While we appreciate the important contributions that have been made by Marc Howard and his colleagues to temporal coding and its role in episodic memory and hippocampal function, we would like to clarify that his Temporal Context Model is unrelated to the temporal context normalization developed by Webb et al. (2020) and mentioned on Page 2. The former (Temporal Context Model) is a computational model that proposes a role for temporal coding in the functions of the medial temporal lobe in support of episodic recall and spatial navigation. The latter (temporal context normalization) is a normalization procedure proposed for use in training a neural network, similar to batch normalization [1], in which tensor normalization is applied over the temporal instead of the batch dimension, and which is shown to help with OOD generalization. We apologize for any confusion engendered by the similarity of these terms and by our failure to clarify the difference between them, which we have now attempted to do in a footnote on Page 2.
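      The difference from batch normalization lies in which axis the statistics are computed over. The minimal sketch below normalizes each sequence over its own time axis for a tensor of shape (batch, time, features); batch normalization would instead pool its statistics across the batch axis. Any learnable gain or shift parameters are omitted, and the function name and example tensor are hypothetical.

```python
import numpy as np


def temporal_context_norm(x, eps=1e-8):
    """Normalize each sequence over its own time axis. x has shape (batch, time, features)."""
    mean = x.mean(axis=1, keepdims=True)  # one mean per sequence and feature
    var = x.var(axis=1, keepdims=True)  # one variance per sequence and feature
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(3, 6, 4))  # 3 sequences, 6 time steps, 4 features
tcn = temporal_context_norm(x)
print(np.allclose(tcn.mean(axis=1), 0.0))  # True: each sequence is centred over time
```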

      (1) Ioffe, S. and Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). PMLR.

      (2) page 3 - "known to be implemented in entorhinal" - It's odd that they seem to avoid citing the actual biology papers on grid cells. They should cite more of the grid cell recording papers when they mention the entorhinal cortex (i.e. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Giocomo et al., 2011; Brandon et al., 2011).

      We have now cited the references mentioned below, on page 3 after the phrase “known to be implemented in entorhinal cortex”.

      (1) Barry, C., Hayman, R., Burgess, N. and Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6), pp.682-684.

      (2) Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.B. and Moser, E.I., 2012. The entorhinal grid map is discretized. Nature, 492(7427), pp.72-78.

      (3) Giocomo, L.M., Hussaini, S.A., Zheng, F., Kandel, E.R., Moser, M.B. and Moser, E.I., 2011. Grid cells use HCN1 channels for spatial scaling. Cell, 147(5), pp.1159-1170.

      (4) Brandon, M.P., Bogaard, A.R., Libby, C.P., Connerney, M.A., Gupta, K. and Hasselmo, M.E., 2011. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science, 332(6029), pp.595-599.

      (3) To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constantinescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burgess, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015 - they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.

      We thank the reviewer for bringing the additional references to our attention. We have cited the references mentioned below on page 2 of the updated manuscript.

      (1) Erdem, U.M. and Hasselmo, M.E., 2014. A biologically inspired hierarchical goal directed navigation model. Journal of Physiology-Paris, 108(1), pp.28-37.

      (2) Sreenivasan, S. and Fiete, I., 2011. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature neuroscience, 14(10), pp.1330-1337.

      (3) Mathis, A., Herz, A.V. and Stemmler, M., 2012. Optimal population codes for space: grid cells outperform place cells. Neural computation, 24(9), pp.2280-2317.

      (4) Bush, D., Barry, C., Manson, D. and Burgess, N., 2015. Using grid cells for navigation. Neuron, 87(3), pp.507-520.

      (4) Page 3 - "Determinantal Point Processes (DPPs)" - it is rather annoying that DPP is defined after DPP-A is defined. There ought to be a spot where the definition of DPP-A is clearly stated in a single location.

      We agree it makes more sense to define Determinantal Point Process (DPP) before DPP-A. We have now rephrased the sentences accordingly. In the “Abstract”, the sentence now reads “Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), which we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space.” We have also modified the second paragraph of the “Introduction”. The modified portion now reads “b) an attentional objective inspired from Determinantal Point Processes (DPPs), which are probabilistic models of repulsion arising in quantum physics [1], to attend to abstract representations that have maximum variance and minimum correlation among them, over the training data. We refer to this as DPP attention or DPP-A.” Due to this change, we removed the last sentence of the fifth paragraph of the “Introduction”.

      (1) Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.
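      The repulsive property that motivates DPP attention can be illustrated with a small numerical sketch. Assuming the standard L-ensemble form, in which the probability of a subset S is proportional to det(L_S), a set of nearly parallel (highly correlated) vectors receives a much smaller determinant than a set of orthogonal ones. The function name and toy vectors below are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def log_det_kernel(features: np.ndarray) -> float:
    """Log-determinant of the Gram kernel L = F F^T for a set of item vectors.

    Under an L-ensemble DPP, P(S) is proportional to det(L_S): the value is large
    when the vectors are long and point in different directions, and it falls
    toward zero (log toward -inf) as they become collinear.
    """
    kernel = features @ features.T
    sign, logdet = np.linalg.slogdet(kernel)
    return logdet if sign > 0 else -np.inf

similar = np.array([[1.0, 0.0], [0.99, 0.14]])  # nearly parallel items
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])    # orthogonal items

print(log_det_kernel(similar))  # about -3.93, i.e. log(0.0196)
print(log_det_kernel(diverse))  # 0.0, since det = 1
```

      The diverse pair is favoured by roughly a factor of 50 in determinant, which is the sense in which a DPP "repels" correlated items.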

      (5) Page 3 - "the inference module R" - there should be some discussion about how this component using LSTM or transformers could relate to the function of actual brain regions interacting with entorhinal cortex. Or if there is no biological connection, they should state that this is not seen as a biological model and that only the grid cell code is considered biological.

      We agree that the model is not specific about the implementation of the R module. However, we assume that, as a standard deep learning component, it is likely to map onto neocortical structures that interact with the entorhinal cortex and, in particular, regions of the prefrontal-posterior parietal network widely believed to be involved in abstract relational processes [1,2,3,4]. Specifically, the role of the prefrontal cortex in the encoding and active maintenance of abstract information needed for task performance (such as rules and relations) has often been modeled using gated recurrent networks, such as LSTMs [5,6], and the posterior parietal cortex has long been known to support “maps” that may provide an important substrate for computing complex relations [4]. We have added some discussion about this in Section 2.2.3 of the updated manuscript.

      (1) Waltz, J.A., Knowlton, B.J., Holyoak, K.J., Boone, K.B., Mishkin, F.S., de Menezes Santos, M., Thomas, C.R. and Miller, B.L., 1999. A system for relational reasoning in human prefrontal cortex. Psychological science, 10(2), pp.119-125.

      (2) Christoff, K., Prabhakaran, V., Dorfman, J., Zhao, Z., Kroger, J.K., Holyoak, K.J. and Gabrieli, J.D., 2001. Rostrolateral prefrontal cortex involvement in relational integration during reasoning. Neuroimage, 14(5), pp.1136-1149.

      (3) Knowlton, B.J., Morrison, R.G., Hummel, J.E. and Holyoak, K.J., 2012. A neurocomputational system for relational reasoning. Trends in cognitive sciences, 16(7), pp.373-381.

      (4) Summerfield, C., Luyckx, F. and Sheahan, H., 2020. Structure learning and the posterior parietal cortex. Progress in neurobiology, 184, p.101717.

      (5) Frank, M.J., Loughry, B. and O’Reilly, R.C., 2001. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, pp.137-160.

      (6) Braver, T.S. and Cohen, J.D., 2000. On the control of control: The role of dopamine in regulating prefrontal function and working memory. Control of cognitive processes: Attention and performance XVIII, (2000).

      (6) Page 4 - "Learned weighting w" - it is somewhat confusing to use "w" as that is commonly used for synaptic weights, whereas I understand this to be an attentional modulation vector with the same dimensionality as the grid cell code. It seems more similar to a neural network bias input than a weight matrix.

      We refer to the first paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (7) Page 4 - "parameterization of w... by two loss functions over the training set." - I realize that this has been stated here, but to emphasize the significance to a naïve reader, I think they should emphasize that the learning is entirely focused on the initial training space, and there is NO training done in the test spaces. It's very impressive that the parameterization is allowing generalization to translated or scaled spaces without requiring ANY training on the translated or scaled spaces.

      We have added the sentence “Note that learning of parameter w occurs only over the training space and is not further modified during testing (i.e., over the test spaces)” to the updated manuscript.

      (8) Page 4 - "The first," - This should be specific - "The first loss function"

      We have changed it to “The first loss function” in the updated manuscript.

      (9) Page 4 - The analogy task seems rather simplistic when first presented (i.e. just a spatial translation to different parts of a space, which has already been shown to work in simulations of spatial behavior such as Erdem and Hasselmo, 2014 or Bush, Barry, Manson, Burgess, 2015). To make the connection to analogy, they might provide a brief mention of how this relates to the analogy space created by word2vec applied to traditional human verbal analogies (i.e. king-man+woman=queen).

      We agree that the analogy task is simple, and recognize that grid cells can be used to navigate to different parts of the space over which the test analogies are defined when those parts are explicitly specified, as shown by Erdem and Hasselmo (2014) and Bush, Barry, Manson, and Burgess (2015). However, for the analogy task, the model must identify the appropriate set of grid cell embeddings, namely those that capture the same relational structure across training and test analogies, in order to achieve strong OOD generalization, and this is accomplished by the attentional mechanism DPP-A. As suggested by the reviewer’s comment, our analogy task is inspired by Rumelhart’s parallelogram model of analogy [1,2] (and is therefore similar to traditional human verbal analogies) inasmuch as it involves differences (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript.

      (1) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (2) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
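      As a minimal sketch of the parallelogram rule referred to above (this is only the geometric rule, not the model's inference module R, which learns to score candidates), the missing term D in A - B = C - D can be computed directly as D = C - A + B, and the closest candidate selected. The function names and coordinates are hypothetical.

```python
import numpy as np

def complete_analogy(a, b, c):
    """Return D such that A - B = C - D, i.e. D = C - A + B (parallelogram rule)."""
    a, b, c = (np.asarray(p) for p in (a, b, c))
    return c - a + b

def choose_answer(a, b, c, candidates):
    """Index of the candidate closest (Euclidean distance) to the predicted D."""
    target = complete_analogy(a, b, c)
    dists = [np.linalg.norm(np.asarray(d) - target) for d in candidates]
    return int(np.argmin(dists))

A, B, C = (2, 3), (5, 7), (10, 10)
print(complete_analogy(A, B, C))                             # [13 14]
print(choose_answer(A, B, C, [(12, 12), (13, 14), (7, 6)]))  # 1
```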

      (10) Page 5 - The variable "KM" is a bit confusing when it first appears. It would be good to re-iterate that K and M are separate points and KM is the vector between these points.

      We apologize for the confusion on this point. KM denotes an integer, the product of K and M, which is added to both coordinates of A, B, C and D (points in ℤ²) to translate them to a different region of the space. K is an integer ranging from 1 to 9, and M is an integer denoting the size of the training region, which in our implementation is 100. We have clarified this in Section 2.1.1 of the updated manuscript.
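      The translation described above can be sketched as follows, assuming that training analogies lie in the square [0, M) × [0, M) of ℤ²; the helper names and example points are hypothetical. Because the same offset KM is added to every point, the differences A - B and C - D, and hence the relational structure of the analogy, are unchanged.

```python
M = 100  # side length of the training region, as stated in the response

def translate_analogy(points, k, m=M):
    """Add K*M to both coordinates of each point (A, B, C, D) in Z^2."""
    if not 1 <= k <= 9:
        raise ValueError("K is an integer from 1 to 9 in the described setup")
    offset = k * m
    return [(x + offset, y + offset) for x, y in points]

def diff(p, q):
    """Component-wise difference p - q."""
    return (p[0] - q[0], p[1] - q[1])

train = [(2, 3), (5, 7), (10, 10), (13, 14)]  # A, B, C, D with A - B = C - D
shifted = translate_analogy(train, k=3)       # now lies in [300, 400) x [300, 400)
print(shifted)  # [(302, 303), (305, 307), (310, 310), (313, 314)]

# The shift is common to all four points, so the relation is preserved.
assert diff(shifted[0], shifted[1]) == diff(shifted[2], shifted[3]) == diff(train[0], train[1])
```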

      (11) Page 5 - "two continuous dimensions (Constantinescu et al., 2016)" - this ought to give credit to the original study showing the abstract six-fold rotational symmetry for spatial coding (Doeller, Barry and Burgess).

      We have now cited the original work by Doeller et al. [1] along with Constantinescu et al. (2016) in the updated manuscript after the phrase “two continuous dimensions” on page 5.

      (1) Doeller, C.F., Barry, C. and Burgess, N., 2010. Evidence for grid cells in a human memory network. Nature, 463(7281), pp.657-661.

      (12) Page 6 - Np=100. This is done later, but it would be clearer if they stated right away that Np*Nf=900 in this first presentation.

      We have now added the following sentence after Np=100: “Hence Np*Nf=900, which denotes the number of grid cells.”
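      To make the count concrete: with Np = 100 phases per frequency and Np*Nf = 900 grid cells, Nf = 9 frequencies. The sketch below assembles a 900-element vector with that layout. The response model for each cell (a sum of three plane waves at 60° offsets, a common idealization of hexagonal grid firing), the 10 × 10 phase lattice and the grid spacings are all assumptions for illustration, not the manuscript's exact construction.

```python
import numpy as np

N_P = 100          # phases per frequency (Np = 100)
N_F = 900 // N_P   # frequencies implied by Np * Nf = 900, i.e. 9

def grid_code(pos, periods, n_p=N_P):
    """Concatenate idealized grid-cell responses to a 2D position, one block per spacing."""
    angles = np.deg2rad([0.0, 60.0, 120.0])
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)          # (3, 2) wave directions
    side = int(round(np.sqrt(n_p)))
    grid = np.stack(np.meshgrid(np.arange(side), np.arange(side)), axis=-1)
    lattice = grid.reshape(-1, 2) / side                               # (n_p, 2) phases in [0, 1)
    blocks = []
    for period in periods:
        k = 4 * np.pi / (np.sqrt(3) * period)                          # wave number for this spacing
        proj = (np.asarray(pos, dtype=float) - lattice * period) @ dirs.T  # (n_p, 3)
        blocks.append(np.cos(k * proj).sum(axis=1) / 3.0)              # values in [-0.5, 1]
    return np.concatenate(blocks)                                      # (len(periods) * n_p,)

periods = 2.0 * 1.4 ** np.arange(N_F)  # hypothetical geometric series of grid spacings
x = grid_code((12.0, 7.0), periods)
print(N_F, x.shape)  # 9 (900,)
```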

      (13) Page 6 - They provide theorem 2.1 on the determinant of the covariance matrix of the grid code, but they ought to cite this the first time this is mentioned.

      We have cited Gillenwater et al. (2012) before mentioning theorem 2.1. The sentence just before that reads “We use the following theorem from Gillenwater et al. (2012) to construct :”

      (14) Page 6 - It would greatly enhance the impact of the paper if they could give neuroscientists some sense of how the maximization of the determinant of the covariance matrix of the grid cell code could be implemented by a biological circuit. OR at least to show an example of the output of this algorithm when it is used as an inner product with the grid cell code. This would require plotting the grid cell code in the spatial domain rather than the 900-element vector.

      We refer to our response above to the topic “Biological plausibility of DPP-A” and second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our responses to this issue.

      (15) Page 6 - "That encode higher spatial frequencies..." This seems intuitive, but it would be nice to give a more intuitive description of how this is related to the determinant of the covariance matrix.

      We refer to the third paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (16) Page 7 - log of both sides... Nf is number of frequencies... Would be good to mention here that they are referring to equation 6 which is only mentioned later in the paragraph.

      As suggested, we now refer to Equation 6 in the updated manuscript. The sentence now reads “This is achieved by maximizing the determinant of the covariance matrix over the within frequency grid cell embeddings of the training data, and Equation 6 is obtained by applying the log on both sides of Theorem 2.1, and in our case where refers to grid cells of a particular frequency.”
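      The step of taking the log of both sides can be illustrated numerically. Assuming, as the description suggests, that Theorem 2.1 factorizes the determinant over independent frequency blocks, the log-determinant of a block-diagonal covariance equals the sum of the per-block log-determinants, so the objective can be evaluated one frequency at a time. The function, the ridge term eps and the random data below are hypothetical.

```python
import numpy as np

def per_frequency_logdet(embeddings, n_f, n_p, eps=1e-6):
    """Sum over frequencies of log det of the within-frequency covariance.

    embeddings: (n_samples, n_f * n_p) grid codes, assumed ordered frequency by
    frequency. eps * I is a small ridge that keeps each covariance invertible.
    """
    total = 0.0
    for f in range(n_f):
        block = embeddings[:, f * n_p:(f + 1) * n_p]
        cov = np.cov(block, rowvar=False) + eps * np.eye(n_p)
        total += np.linalg.slogdet(cov)[1]
    return total

# Check against the log-determinant of the full block-diagonal covariance
rng = np.random.default_rng(0)
n_f, n_p = 3, 4
X = rng.normal(size=(200, n_f * n_p))
full = np.zeros((n_f * n_p, n_f * n_p))
for f in range(n_f):
    s = slice(f * n_p, (f + 1) * n_p)
    full[s, s] = np.cov(X[:, s], rowvar=False) + 1e-6 * np.eye(n_p)
print(np.isclose(np.linalg.slogdet(full)[1], per_frequency_logdet(X, n_f, n_p)))  # True
```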

      (17) Page 7 - Equation 6 - They should discuss how this is proposed to be implemented in brain circuits.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (18) Page 9 - "egeneralize" - presumably this is a typo?

      Yes. We have corrected it to “generalize” in the updated manuscript.

      (19) Page 9 - "biologically plausible encoding scheme" - This is valid for the grid cell code, but they should be clear that this is not valid for other parts of the model, or specify how other parts of the model such as DPP-A could be biologically plausible.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (20) Page 12 - Figure 7 - comparison to one-hots or smoothed one-hots. The text should indicate whether the smoothed one-hots are similar to place cell coding. This is the most relevant comparison of coding for those knowledgeable about biological coding schemes.

      Yes, smoothed one-hots are similar to place cell coding. We now mention this in Section 5.3 of the updated manuscript.
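      A minimal sketch of the two baseline codes, assuming a size × size grid of locations and a Gaussian smoothing width sigma (both hypothetical; the manuscript's settings may differ). A one-hot code gives neighbouring locations zero overlap, whereas the smoothed code, like a population of place fields, makes overlap fall off with distance.

```python
import numpy as np

def one_hot(pos, size):
    """One unit per location on a size x size grid; only the current location is active."""
    code = np.zeros((size, size))
    code[pos] = 1.0
    return code.ravel()

def smoothed_one_hot(pos, size, sigma=1.5):
    """Gaussian bump centred on pos: nearby units are partially active, like place fields."""
    ys, xs = np.mgrid[0:size, 0:size]
    d2 = (ys - pos[0]) ** 2 + (xs - pos[1]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2)).ravel()

a, b = (5, 5), (5, 6)  # neighbouring locations
print(one_hot(a, 10) @ one_hot(b, 10))                  # 0.0: no overlap between neighbours
print(smoothed_one_hot(a, 10) @ smoothed_one_hot(b, 10) > 0)  # True: overlap encodes proximity
```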

      (21) Page 12 - They could compare to a broader range of potential biological coding schemes for the overall space. This could include using coding based on the boundary vector cell coding of the space, band cell coding (one dimensional input to grid cells), or egocentric boundary cell coding.

      We appreciate these useful suggestions, which we now mention as potentially valuable directions for future work in the second paragraph of Section 6 of the updated manuscript.

      (22) Page 13 - "transformers are particularly instructive" - They mention this as a useful comparison, but they might discuss further why a much better function is obtained when attention is applied to the system twice (once by DPP-A and then by a transformer in the inference module).

      We refer to the last paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (23) Page 13 - "Section 5.1 for analogy and Section 5.2 for arithmetic" - it would be clearer if they perhaps also mentioned the specific figures (Figure 4 and Figure 6) presenting the results for the transformer rather than the LSTM.

      We have now rephrased to also refer to the figures in the updated manuscript. The phrase now reads “a transformer (Figure 4 in Section 5.1 for analogy and Figure 6 in Section 5.2 for arithmetic tasks) failed to achieve the same level of OOD generalization as the network that used DPP-A.”

      (24) Page 14 - "statistics of the training data" - The most exciting feature of this paper is that learning during the training space analogies can so effectively generalize to other spaces based on the right attention DPP-A, but this is not really made intuitive. Again, they should illustrate the result of the xᵀw inner product to demonstrate why this works so effectively!

      We refer to the second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (25) Bibliography - Silver et al., go paper - journal name "nature" should be capitalized. There are other journal titles that should be capitalized. Also, I believe eLife lists family names first.

      We have made the bibliography changes suggested by the reviewer in the updated manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      More details should be provided in terms of inclusion and exclusion criteria for the participants, as well as missing data due to the non-cooperation of newborns during the experimental process. Potential differences between preterm and full-term infants are worth exploring. Several aspects of EEG data analyses and data interpretation should be better clarified.

      Here I have several comments and questions to improve the manuscript.

      (1) It would be useful to know whether there was any missing data due to the non-cooperation of newborns during the experimental process.

      Thank you for the suggestion. While our initial aim was to include 120 neonates in the final data analysis, we actually recruited 198 neonatal participants for this study. Of these, 78 EEG datasets were excluded from the data analysis due to non-cooperation of the neonates (n = 75) or technical issues (n = 3), leaving 120 datasets. We have incorporated this detailed information in the Subjects subsection (lines 375-383) in the revised manuscript.

      (2) The authors investigated the impact of gestational age on emotional perceptual sensitivity in newborns by grouping infants of varying gestational ages in the experiment. The methods section mentions that the study conducted experiments within 24 hours after the birth of the newborns. When do preterm infants (with a gestational age of 35 and 36 weeks) begin to exhibit emotional discrimination comparable to full-term newborns? 

      This is indeed an intriguing question that merits exploration. However, in our study, we recruited relatively healthy preterm neonates, many of whom were discharged from the hospital with their mothers within 3-5 days after birth. It would have been challenging to arrange for another EEG testing session once these preterm infants reached full-term age, as their parents were unwilling to return to the hospital.

      (3) When analyzing EEG data, excluding artifacts with peak deviations exceeding ±200 μV is a relatively lenient criterion, potentially resulting in the retention of some large-amplitude artifacts or noise. What is the rationale behind the author's choice of this criterion? Or, in other words, what considerations led to this specific selection?

      In our standard practice, we typically employ a stricter threshold of ±100 μV for artifact removal in studies involving healthy adults and an intermediate threshold of ±150 μV for data from adult patients, such as those with schizophrenia. For neonatal data, however, we often use the most lenient criterion of ±200 μV. This decision is primarily due to the inherent challenges of neonatal EEG recording: newborns cannot be expected to cooperate or remain quiet during the recording, so neonatal EEG data tend to contain more artifacts than data from healthy adults. Furthermore, the excitability of the newborn brain is notably elevated. This heightened excitability arises from an imbalance in the distribution and function of excitatory and inhibitory neurotransmitter systems: in the immature brain, the expression of excitatory neurotransmitters and their receptors typically surpasses that of inhibitory neurotransmitters. It can occasionally lead to paroxysmal electrical activity, so neonatal EEG recordings may at times display large amplitudes, even exceeding 100 μV. In this revision, we have cited other neonatal/infant EEG studies and processing pipelines that have used the ±200 μV threshold to support this criterion (lines 483-484).
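      The amplitude criterion can be sketched as a simple epoch-rejection step. This assumes the threshold is applied to the absolute amplitude of baseline-corrected epochs on every channel; the function name, array layout and simulated data are hypothetical.

```python
import numpy as np

def reject_epochs(epochs_uv, threshold_uv=200.0):
    """Drop epochs in which any channel exceeds +/- threshold_uv microvolts.

    epochs_uv: array of shape (n_epochs, n_channels, n_samples), in microvolts,
    assumed to be baseline-corrected. Returns (kept_epochs, rejected_mask).
    """
    peak = np.abs(epochs_uv).max(axis=(1, 2))  # largest absolute deflection per epoch
    rejected = peak > threshold_uv
    return epochs_uv[~rejected], rejected

rng = np.random.default_rng(1)
epochs = rng.normal(0.0, 20.0, size=(5, 4, 100))  # simulated 20 uV background activity
epochs[2, 1, 50] = 250.0                          # one large simulated movement artifact
kept, rejected = reject_epochs(epochs)
print(kept.shape, rejected.tolist())  # (4, 4, 100) [False, False, True, False, False]
```

      Some toolboxes, such as the `reject` parameter for epochs in MNE-Python, apply a peak-to-peak criterion instead, so a threshold stated in that convention is not directly comparable to a ±200 μV absolute limit.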

      (4) In the Discussion section, the authors mentioned the biomarkers, such as the fusiform gyrus and hippocampus, which have been identified as potential predictors of autism risk. It is suggested that the authors briefly elucidate the crucial role of these biomarkers in processing social information, which would enhance the readability and logical flow of this manuscript.

      Thank you for the thoughtful suggestion. We have expanded the discussion concerning the involvement of the fusiform gyrus and hippocampus in social information processing (lines 314-319).

      Reviewer #2 (Public Review):

      First, readers need to see spectrograms that show the 0-4000 Hz in more detail, rather than what is now shown (0-10,000 Hz). The vocal signals in clearer spectrograms will show I believe the initial consonant burst and formant frequencies that are unique to human speech and give rise to the perception of the consonant sounds in the vocal signals like 'dada' and 'tutu' that were tested. The control signals will presumably not show these abrupt acoustic changes at their onset, even though they appear (from the oscillograms) to approximate the amplitude envelope. The primary cue distinguishing the happy and neutral signals in both the vocal and control signals is the pitch of the signals (high vs low), but the burst of energy representing the consonants is only contained in the vocal signals; it has no comparable match in the control signals. It is possible that the presence of a sharp acoustic onset (a unique characteristic of consonants in human speech) is especially alerting to the infants, and that this acoustic cue, in the context of the pitch change, enhances discrimination in the vocal case. One way to test this would be to use only vowel sounds to represent the vocal signals, without consonants.

      Thank you for your expert comments and considerations. We have redrawn Figure 3 using Praat software with a frequency range of 0-5000 Hz, as suggested by Praat’s default parameters. Based on the spectrograms, we acknowledge the potential role of consonants in accounting for differences in stimuli. Consequently, we have included this consideration as one of the limitations of our study in this revised version (lines 325-330).

      Another critical detail that the authors need to include about the signals is an explanation of how the control signals were generated. The text states that the Fo and amplitude envelope of the vocal signals were mimicked in the control signals, but what was the signal used for the controls? Was a pure tone complex modulated, or was pink noise used to generate the control signals? Or were the original vocal signals simply filtered in some way to create the controls, which would preserve the Fo and amplitude envelope? If merely filtered, the control signals still may be perceived as 'vocal' signals, rather than as nonspeech (the Supplement contains the sounds, and some of the control sounds can be perceived, to my ear, as 'vocal' signals).

      We sincerely appreciate your attention to detail regarding the generation of control signals. As our laboratory does not specialize in audio editing, our approach involved filtering the original vocal sounds around the fundamental frequency (f0) and balancing the mean intensity of vocal and nonvocal stimuli (as now stated in lines 432-437). However, certain “vocal” components evidently persisted in the control sounds, particularly in the sound “tutu”. In this revision, we openly acknowledge this oversight (lines 331-333). We extend our gratitude once again for highlighting the importance of meticulous consideration when generating control sounds for a study.
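      The filtering approach described above can be sketched as band-pass filtering around the f0 range followed by RMS matching, which equalizes overall intensity. The pass band, filter order and synthetic test tone are assumptions for illustration; this is not necessarily how the study's stimuli were produced.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def rms(signal):
    """Root-mean-square level; equal RMS means equal overall intensity in dB."""
    return np.sqrt(np.mean(signal ** 2))

def make_control(vocal, fs, f_low, f_high, order=4):
    """Band-pass a vocal waveform around its f0 range, then match its RMS level."""
    sos = butter(order, [f_low, f_high], btype="bandpass", fs=fs, output="sos")
    control = sosfiltfilt(sos, vocal)  # zero-phase filtering keeps the envelope timing
    return control * (rms(vocal) / rms(control))

# Synthetic "vocal" signal: a 220 Hz fundamental plus energy at 2200 Hz
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
vocal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 2200 * t)
control = make_control(vocal, fs, f_low=150, f_high=400)
```

      A narrow pass band like this removes most of the energy above the f0 range; if the band is wide enough to retain formant energy, the result can still sound voice-like, which is the concern the reviewer raised.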

      Second, there is no information in the manuscript or supplement about the auditory environment of the participants, nor discussion of the fetus' ability to hear in the womb. In the womb, infants are listening to the mothers' bone-conducted speech (which is full of consonant sounds), and we know from published studies that infants can discern differences not only in the prosody of the speech they hear in the womb, but the phonetic characteristics of the mother's speech. The ability at 37 weeks GA or beyond to discriminate the pitch changes in the vocal, but not control signals, could thus be due to additional experience in utero to speech. Another experiential explanation is that the infants born at 37 weeks GA and beyond may be exposed to greater amounts of speech after birth, when compared to those born at 35 and 36 weeks GA, from the attending nurses and from their caregivers, and this speech is also full of consonant sounds. What these infants hear is likely to be 'infant-directed speech,' which is significantly higher in pitch, mirroring the signals tested here. At 37 weeks GA, infants are likely more robust, may sleep less, and are likely more alert. If infants' exposure to speech, either after birth, or their auditory ability to discern differences in speech in utero, is enhanced at 37 weeks GA and beyond, then an 'experience-related' explanation is a viable alternative to a maturational explanation, and should be discussed. Perhaps both are playing a role. As the authors state, many more signals need to be tested to discern how the effect should be interpreted, and other viable interpretations of the current results discussed.

      We acknowledge the importance of considering the auditory environment of participants and the fetus' ability to hear in the womb. In our study, neonates were exposed to a native language environment both before and after birth (as added in lines 385-386), and we made efforts to minimize their exposure to speech stimuli other than those used in the experiment. Specifically, all neonates participated in the experiment and underwent EEG recording within the first 24 hours after birth (lines 386-387). They were promptly transported to a dedicated testing room for EEG recording as soon as their condition stabilized after birth. During recording sessions, they were separated from their mothers to minimize exposure to natural speech (as added in lines 459-461). As a result, we believe that both preterm and term neonates were exposed to comparable amounts of speech after birth and before the experiment. We also ensured that all participants were in a natural sleep state during EEG recording. However, it is possible that term neonates slept less and were more attentive to the limited speech stimuli in their environment before the experiment compared to preterm newborns.

      The debate surrounding nature versus nurture in neonate and infant development persists. We recognize the potential impact of prenatal auditory experiences on neonatal perceptual sensitivity. Therefore, we have added a brief discussion regarding innate- or experience-related explanations for emotional prosodic discrimination in neonates, aiming to shed light on future research directions (lines 343-351).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      It is unclear to us why you did not adjust the title to better reflect the well-supported claims of the paper, i.e., that this is a valuable model for human loss-of-function mutations in IQCH.

      Thanks for the editor’s suggestion. We have changed the title to “Deficiency of IQCH causes male infertility in humans and mice.” Additionally, we have provided the original images of the gels or blots as a zipped folder.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors explore ER stress signalling mediated by ATF6 using a genome-wide gene depletion screen. They find that the ER chaperone Calreticulin binds and directly represses ATF6; this proposed function for Calreticulin is intriguing and constitutes an important finding. The evidence presented is based on CHO genetic evidence and biochemical results and is convincing. 

      We thank the editors for their favourable assessment of our work.

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Tung and colleagues identify Calreticulin as a repressor of ATF6 signalling using a CRISPR screen and characterize the functional interaction between ATF6 and CALR. 

      Strengths: 

      The manuscript is well written and interesting with an innovative experimental design that provides some new mechanistic insight into ATF6 regulation as well as crosstalk with the IRE1 pathway. The methods used were fit for purpose and reasonable conclusions were drawn from the data presented. Findings are novel and bring together glycoprotein quality control and activation of one sensor of the UPR. This is a novel perspective on how the integration of ER homeostasis signals could be sensed in the ER. 

      We thank the reviewer for their favourable assessment of our work.

      Weaknesses: 

      Several points remain to be documented to support the authors' model. 

      Major comments 

      (1) It is interesting that BiP, PDIs, and COPII are not identified in the screen. Might this indicate some bias in the system perhaps limiting its sensitivity or pleiotropic effects of the reporter? 

      The reviewer raises a valid concern. Our CRISPR screen aimed to identify genes that selectively modulate ATF6⍺. Therefore, we excluded from consideration genes whose inactivation had effects on the broader ER environment. This would disfavour the selection of genes encoding BiP, PDI and COPII components. Additionally, a positive selection screen inherently removes essential genes like BiP. The absence of COPII components among the hits could be due to their essentiality or to their not being strong selective modulators of ATF6⍺ activation, as the strongest ATF6⍺ modulators, such as S1P, S2P and the transcription factor NFY, were among our top hits. Cell type specificity may also play a role. For example, ERp18, a small PDI previously implicated in ATF6⍺ activation (Oka et al 2019; PMID: 31368601), was not identified as a hit, despite the presence of sgRNAs targeting hamster ERp18 in the library. Interestingly, depletion of ERp18 in our dual UPR reporter CHO-K1 cell line did not affect the ATF6⍺ and IRE1⍺ UPR branches. This new information has been incorporated into the revised manuscript as Supplemental Figure S6E and the discussion has been edited in line with these comments.

      (2) CALR interacts with ATF6 independently of ATF6 glycans (and cysteines). How do the authors reconcile this observation with the lectin functions of CALR? What is the interaction mode then - if the CALR N (lectin) domain is not involved, is it the P domain that is responsible for the interaction? All the binding experiments are performed in the presence of 1 mM CaCl2, is calcium necessary for CALR to achieve binding?

      These points merit clarification. The Biolayer Interferometry (BLI) assay reported on an interaction between ATF6 and CRT that is independent of ATF6⍺ glycans. However, cell-based experiments revealed a contribution of glycan-dependent interactions to the binding and repression. Therefore, we conclude that the interaction of CRT with ATF6⍺ likely involves both lectin-dependent and lectin-independent interactions (dependent on the P-domain). Indeed, this hybrid model has previously been suggested as the mode of stable interaction of CRT with other substrates, as cited in the discussion section (Wijeyesakere et al., 2013; PMID: 24100026). CRT is a known calcium-dependent protein, and all the in vitro experiments were conducted in the presence of 1 mM CaCl2. We do not have data from experiments without CaCl2.

      (3) Does the introduction of the reporter system affect the normal BiP (or ATF6) protein levels in the cells? 

      To address this question, we have conducted new experiments comparing endogenous BiP protein levels between the reporter-containing cells and the parental CHO-K1 cells by immunoblotting with an anti-BiP antibody. These data indicate that the reporter system does not affect endogenous BiP protein levels. This new information has been incorporated as revised Supplemental Figure S1C.

      (4) Does the depletion of CRT affect BiP interaction with ATF6? The absence of CRT may lead to misfolding of glycoproteins and titration of BiP away from ATF6 leading to activation. An indicator of ER stress levels that is independent of ATF6 and IRE1 might be useful. 

      To further assess ER stress levels in CRT-depleted cells, we compared expression levels of endogenous ER resident proteins containing a KDEL signal (e.g., P3H1, GRP94, BiP and PDI) in parental CHO-K1 cells, dual UPR reporter cell lines (XC45-6S) and CRT-depleted cells (CRT∆#2P) under basal conditions and during ER stress by immunoblotting. This comparison confirmed the basal elevation in BiP protein level in cells lacking CRT, consistent with previous findings (Figure 2D), and, more broadly, the integrity of UPR signalling in these cells. In the interest of time, we did not extend the analysis to other branches of the UPR. This new information has been incorporated as Supplemental Figure S5 and in the text of the revised manuscript.

      (5) Does CALR depletion alter ATF6 redox status? 

      We thank the reviewer for raising this interesting point. In response, we compared ATF6⍺ redox status in parental and CRT-depleted cells using non-reducing SDS-PAGE. Overall, the redox pattern was similar in parental and CRT-depleted cells, with the detection of two redox forms: an inter-chain disulfide-stabilised dimer and the monomer. Under basal conditions, ATF6⍺ predominantly existed as a monomer, while under ER stress, the monomer band decreased with a corresponding increase in the disulfide-stabilised dimer form in parental cells, as previously reported (Oka et al., 2022; PMID: 35286189). However, under ER stress, CRT-depleted cells showed a significantly higher fraction of monomer versus dimer compared to parental cells. Taken together, these data suggest that the loss of CRT may favour the monomeric form of ATF6α, which is proposed to be more efficiently trafficked (Nadanaka et al., 2007; PMID: 17101776), aligning with our observation that CRT depletion is associated with constitutive activation of ATF6α. These new data have been included as Supplemental Figure S7 and are explained in detail in the Results section of the revised manuscript.

      (6) Figure 4C would benefit from some immunoblotting against BiP.

      Although we acknowledge the validity of this suggestion and understand the referee's interest in comparing the amount of CRT in pulldown with that of BiP, the necessity of generating additional samples makes this experiment impractical. Consequently, we opted not to include in our conclusion any comparison regarding the retention of ATF6α by BiP relative to CRT.

      (7) Overlooked requirement of cysteines for ATF6 functionality (Figure 5B). 

      We interpret this comment to refer to the inactivity of the cysteine-free allele of ATF6⍺. Whilst this is a reproducible observation of significance to the structure-activity features of ATF6⍺’s luminal domain, it is less informative in terms of understanding trans-active regulators of ATF6⍺ and was therefore not explored further.

      (8) Without a clear definition of the role of CRT in ATF6 folding, one cannot infer that the observed phenotype is not based on defects in ATF6 "folding" and glycosylation considering the possibility of activation of newly synthesised un-glycosylated ATF6. 

      If the main role of CRT were to assist ATF6⍺ folding, one would expect that depletion of CRT would lead to a non-functional ATF6⍺, resulting in ER retention and less activity. However, our data indicate that the loss of CRT correlates with the constitutive activation of the ATF6⍺ fluorescent reporter and increased Golgi trafficking and processing of ATF6⍺. Therefore, these data suggest that in CRT-depleted cells, the majority of ATF6⍺ is likely to fold to a functional state.

      (9) ATF6 has been defined in several studies as a natively unstable protein and shows a close relationship with the ERAD machinery. Is CALR also involved in a quality control mechanism for natively unfolded ATF6? 

      The reviewer raises another valid point. Although we have not closely evaluated the role of CRT in the quality control machinery, we observed that the loss of CRT was not associated with increased levels of ATF6⍺ in CRT-depleted cells under basal conditions compared with parental cells (Fig 3B.1, compare lanes 1 and 7; Figure 3B.2, compare lanes 1 and 5). If ATF6⍺ were degraded by ERAD and loss of CRT compromised ERAD function, CRT-depleted cells would be expected to exhibit increased levels of endogenous ATF6⍺. The fact that endogenous ATF6⍺ levels are slightly reduced in CRT-depleted cells therefore does not support a role for CRT in the quality control mechanism for natively unfolded ATF6⍺.

      (10) C618 in ATF6 is located within the BiP binding site and in close proximity to an N-glycosylation site. Is this region of particular importance for CALR binding? 

      It is an interesting point that we have not explored in this study. Consequently, without experimental data, we cannot infer the possible implications of C618 in CRT binding.

      (11) The authors have mutated all the N glycosylation sites at once; they should be mutated one by one and the impact on ATF6 stability evaluated independently of the CALR status. 

      We agree that analysing each N-glycosylation site individually would provide further insight into their contributions to ATF6⍺ stability/functionality. However, given the scope of the paper in its present form, we have elected not to address this point.

      (12) The relationship between the absence of CALR and IRE1 remains weak. The authors do not exclude the possibility that CALR could have a direct effect on IRE1 itself. This should be either removed or further investigated. 

      We beg to differ. The relationship between the absence of CRT and IRE1 is not weak: loss of CRT in CHO-K1 cells represses IRE1, although we readily concede that the relationship is incompletely understood. ATF6⍺ signalling involves crosstalk with the IRE1 pathway, partly mediated by direct heterodimerisation of N-ATF6⍺ with XBP1s (Yamamoto et al., 2007, 2004). Additionally, recent research has shown that ATF6⍺ activity can repress IRE1 signalling (Walter et al., 2018). Therefore, given that our results indicate that the loss of CRT leads to constitutive activation of ATF6⍺, we suggest that a negative feedback loop in which ATF6⍺ represses IRE1 contributes to the observations made here on the relationship between CRT and IRE1. This does not exclude other aspects of the relationship, a point that is now clarified further in the revised manuscript. 

      Minor point 

      In the introduction on page 3 it is mentioned that loss of ATF6 impairs survival in cellular and animal models, this is not completely true as ATF6a ko in mice has no clear deleterious phenotype and only the double ko ATF6a/b has some dramatic impact.

      We have modified that sentence in the revised manuscript. 

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, the authors set out to use an unbiased CRISPR/Cas9 screen in CHO cells to identify genes encoding proteins that either increase or repress ATF6 signalling. 

      Strengths: 

      The strengths of the paper include the thoroughness of the screens, the use of a novel, double ATF6/IRE1 UPR reporter cell line, and follow-up detailed experiments on two of the findings in the screens, i.e. FURIN and CRT, to test the validity of involvement of each as direct regulators of ATF6 signalling. Additional strengths are the control experiments that validate the ATF6 specificity of the screens, as well as, for CRT, the main finding of focus, the determination of roles for the glycosylation sites and cysteines of ATF6 in the mechanism by which CRT represses ATF6, at least in CHO cells. 

      We thank the reviewer for their favourable assessment of our work.  

      Weaknesses: 

      (1) The weaknesses of the paper are that the authors did not describe why they focused only on the top 100 proteins in each list of ATF6 activators and repressors. 

      We concede that the more genes one studies the better. However, in whole-genome CRISPR screens where thousands of hits arise, it is common practice for researchers to prioritise candidates with the greatest significance, as those genes are likely to have a more meaningful impact on the phenotype under investigation. Therefore, our decision to focus on the top 100 genes was based on a desire to identify the most prominent and potentially impactful candidates for further analysis, ensuring a manageable scope for in-depth study while maintaining a measure of relevance and significance. Moreover, setting the threshold at 100 hits to perform GEO enrichment analysis is a practice used in previous studies (PMID: 30323222; PMID: 37251921). In our case, the top 100 hits included the genes with an adjusted P < 0.005. For interested readers, the full ranked list is accessible in the GEO databank (GSE254745) and as Supplemental Table S1.

      (2) Additionally, a few methodological details were missing, such as the location of the insertion site of the XBP1::mCherry reporter in the CHO cell genome. Since the authors go to great lengths to insert the other reporter for ATF6 activation in a "safe harbor" location, this raises questions about whether the XBP1::mCherry reporter insertion is truly innocuous.

      We appreciate the opportunity to clarify certain aspects of our experimental procedures. To generate a double UPR reporter cell line, we employed the previously established XC45 CHO-K1 clone with an integrated XBP1s::mCherry reporter (Harding et al., 2019; PMID: 31749445). Since the ROSA26 safe harbor locus was available in the XC45 CHO-K1 cell line, we directed integration of the ATF6⍺ reporter to that locus. To provide further clarity, the revised manuscript includes additional details in the Methods section regarding the creation of the XBP1 reporter.

      (3) An additional weakness is that the evidence for the physical interaction between ATF6LD and CRT is not strong, being dependent mainly on a single IP/IB experiment in Figure 4C that comprises only 1 lane on the gel for each of the test cases. Moreover, while that figure suggests that the interaction between CRT and ATF6 is decreased by mutating out the glycosylation sites in the ATF6LD, the BLI experiment in the same figure, 4B, suggests that there are no differences in the affinities of CRT for ATF6LD WT, deltaGly and deltaCys. 

      We would like to highlight that in the IP/IB experiments (see Figure 4C), where wild-type ATF6 (ATF6⍺_LDWT) and GFP-ATF6_LD∆Gly were transiently transfected, GFP-ATF6_LD∆Gly was expressed at lower levels than ATF6⍺_LDWT. These lower expression levels might explain why CRT is more prominently immunoprecipitated with ATF6⍺_LDWT and could account for the differences observed between the in vitro and in vivo assays.

      (4) An additional detail is that I found Figure 6A to be difficult to interpret, and that 6B was required in order for me to best evaluate the points being made by the authors in this figure. 

      We have simplified Figure 6A in the revised manuscript to make it more interpretable by focussing the reader’s attention on the transfected population. 

      Overall, I believe that this work will positively impact the field as it provides a list of potential regulators of ATF6 activation and repression that others will be able to use as a launch point for discovering such interactions in cells and tissues of interest beyond CHO cells. However, I agree with the authors that these findings were in CHO cell lines and that it is possible, if not likely, that some of the interactions they found will be cell type/line specific. 

      We accept this point and re-emphasize the qualification that our conclusions cannot be glibly extrapolated to other cell lines.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The goal of the current study was to evaluate the effect of neuronal activity on blood-brain barrier permeability in the healthy brain, and to determine whether changes in BBB dynamics play a role in cortical plasticity. The authors used a variety of well-validated approaches to first demonstrate that limb stimulation increases BBB permeability. Using in vivo-electrophysiology and pharmacological approaches, the authors demonstrate that albumin is sufficient to induce cortical potentiation and that BBB transporters are necessary for stimulus-induced potentiation. The authors include a transcriptional analysis and differential expression of genes associated with plasticity, TGF-beta signaling, and extracellular matrix were observed following stimulation. Overall, the results obtained in rodents are compelling and support the authors' conclusions that neuronal activity modulates the BBB in the healthy brain and that mechanisms downstream of BBB permeability changes play a role in stimulus-evoked plasticity. These findings were further supported with fMRI and BBB permeability measurements performed in healthy human subjects performing a simple sensorimotor task. There is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions and the authors have acknowledged the use of only males as a minor limitation of the study that should be addressed in the future. Future studies should also test whether the upregulation of OAT3 plays a role in cortical plasticity observed following stimulation. Overall, this study provides novel insights into how neurovascular coupling, BBB permeability, and plasticity interact in the healthy brain. 

      Reviewer #2 (Public Review): 

      Summary: 

      This study builds upon previous work that demonstrated that brain injury results in leakage of albumin across the blood brain barrier, resulting in activation of TGF-beta in astrocytes. Consequently, this leads to decreased glutamate uptake, reduced buffering of extracellular potassium and hyperexcitability. This study asks whether such a process can play a physiological role in cortical plasticity. They first show that stimulation of a forelimb for 30 minutes in a rat results in leakage of the blood brain barrier and extravasation of albumin on the contralateral but not ipsilateral cortex. The authors propose that the leakage is dependent upon neuronal excitability and is associated with an enhancement of excitatory transmission. Inhibiting the transport of albumin or the activation of TGF-beta prevents the enhancement of excitatory transmission. In addition, gene expression associated with TGF-beta activation, synaptic plasticity and extracellular matrix are enhanced on the "stimulated" hemisphere. That this may translate to humans is demonstrated by a break down in the blood brain barrier following activation of brain areas through a motor task. 

      Strengths: 

      This study is novel and the results are potentially important as they demonstrate an unexpected break down of the blood brain barrier with physiological activity and this may serve a physiological purpose, affecting synaptic plasticity. 

      The strengths of the study are: 

      (1) The use of an in vivo model with multiple methods to investigate the blood brain barrier response to a forelimb stimulation. 

      (2) The determination of a potential functional role for the observed leakage of the blood brain barrier from both a genetic and electrophysiological view point 

      (3) The demonstration that, by inhibiting different points in the putative pathway from activation of the cortex to transport of albumin and activation of the TGF-beta pathway, the effect on synaptic enhancement could be prevented.

      (4) Preliminary experiments demonstrating a similar observation of activity-dependent breakdown of the blood brain barrier in humans. 

      Weaknesses: 

      The authors adequately addressed most of my points. A few remain: 

      (1) Although the authors have addressed the possible effects of anaesthesia on neuro-vascular coupling, they have not mentioned or addressed the possible effects of ketamine (an NMDA receptor antagonist) on synaptic plasticity. Indeed, the low percentage of SEP increase following potentiation (10-20%) could perhaps be explained by partial block of NMDA receptors by ketamine.

      We agree and apologize for this oversight. This important issue is now addressed in the Discussion.

      “Notably, the antagonistic effect of ketamine on NMDA receptors might attenuate the magnitude of SEP potentiation recorded in our experiments (Anis et al., 1983; Salt et al., 1988).”

      (2) The experimental paradigms remain unclear to me. Now, it appears that drugs are applied for 50 minutes and that the stimulation occurs during the "washout period". The more conventional approach would be to have the drug application during the stimulation period to determine if the drugs occlude or enhance the effects of stimulation and then washout the drugs. The problem is that drugs variably washout at different rates depending upon their lipid solubility.

      We agree that the more conventional approach would have been to continue applying the drug throughout the experiment and that differential rates of washout may add variability to our experiments. However, despite this limitation, within each treatment group we found that the SEP response at 50 minutes (immediately after the drug application window) did not differ from the SEP response at 80 minutes (after 30 minutes of stimulation and washout) [Figure 3H&G]. This suggests that the drug effects were still present despite the termination of drug application and the potentiation-inducing stimulation. Moreover, our analysis showed that animals within each treatment group (except AP5) had similar SEP responses with little intra-group variability.

      (3) It is still not clear to what extent the experimenters and those doing the analysis were blinded to group. If one or both were blind to group, then please put this in the methods.

      Thank you for this comment. We revised the Methods section to clearly confirm that data were collected and analyzed blind to group.  

      Reviewer #3 (Public Review): 

      Summary: 

      This study used prolonged stimulation of a limb to examine possible plasticity in somatosensory evoked potentials induced by the stimulation. They also studied the extent that the blood brain barrier (BBB) was opened by the prolonged stimulation and whether that played a role in the plasticity. They found that there was potentiation of the amplitude and area under the curve of the evoked potential after prolonged stimulation and this was long-lasting (>5 hrs). They also implicated extravasation of serum albumin, caveolae-mediated transcytosis, and TGFb signalling, as well as neuronal activity and upregulation of PSD95. Transcriptomics was done and implicated plasticity related genes in the changes after prolonged stimulation, but not proteins associated with the BBB or inflammation. Next, they address the application to humans using a squeeze ball task. They imaged the brain and suggest that the hand activity led to an increased permeability of the vessels, suggesting modulation of the BBB. 

      Strengths: 

      The strengths of the paper are the novelty of the idea that stimulation of the limb can induce cortical plasticity in a normal condition, and it involves opening of the BBB with albumin entry. In addition, there are many datasets and both rat and human data. 

      Weaknesses: 

      The conclusions are not compelling however because of a lack of explanation of methods.

      In the revised paper, we added a section titled ‘study design’ that presents an overview of the experimental approach.

      The explanation of why prolonged stimulation in the rat was considered relevant to normal conditions should be as clear in the paper as it is in the rebuttal.

      We added a new paragraph to the Discussion section explaining this point as we did in the rebuttal:  

      “Our animal experiments show that a 30 min limb stimulation (at 6Hz and 2mA) increases cross-BBB influx, while a 1 min stimulation (of similar frequency and magnitude) does not. We believe that both types of stimulations fall within the physiological range because our continuous electrophysiological recordings showed no signs of epileptiform or otherwise pathological activity. Moreover, the recorded SEP levels were similar to those reported in previous physiological LTP studies in rats (Eckert & Abraham, 2010; Han et al., 2015; Mégevand et al., 2009) and humans (McGregor et al., 2016). In humans, skill acquisition often involves motor training sessions that last ≥30 minutes (Bengtsson et al., 2005; Classen et al., 1998) and result in physiological plasticity of sensory and motor systems (Classen et al., 1998; Draganski et al., 2004; Sagi et al., 2012). Hence, the experimental task in our human study (30 minutes of repetitive squeezing of an elastic stress-ball) is likely to represent physiological activity, with neuronal activation in primarily motor and sensory areas (Halder et al., 2005). Future human and animal studies are needed to explore the BBB modulating effects of additional stimulation protocols, with varying durations, frequencies, and magnitudes. Such studies may also elucidate the temporal and ultrastructural characteristics that differentiate between physiological and pathological BBB modulation.”

      The authors need to ensure other aspects of the rebuttal are as clear in the paper as in the rebuttal too. 

      Thank you for this comment. This was addressed in the revised paper.

      The only remaining concern that is significant is that it is hard to understand the figures. 

      Thank you for this comment. We revised the figures according to the reviewer’s recommendations. We hope that these changes increase the legibility of the figures. 

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript is improved but there are still suggestions that do not appear to have been addressed. More experiments are not involved in addressing these concerns but one wants the paper to be clarified in terms of what was done. 

      Figures. Please use arrows to point to the effect that the reader should see. Please note what the main point is. 

      Major concerns: 

      Please add explanations, exact p values, and other revisions in the rebuttal to the paper. 

      Rebuttal explanations were added to the paper and p values appear in figure legends.

      Fig 1d shows a seizure-like event, which the authors do not consider a seizure because it lacks a depolarization shift. This explanation is not convincing because an LFP would not necessarily show a depolarization shift. Another argument, or a discussion of the event as a seizure, is warranted. Note that expanding the trace might also show that it is unlike a seizure. Regarding the idea that 6 Hz, 2 mA stimuli for 30 min are physiological, the authors make three arguments that are not clear. First, no epileptiform activity was found, but in Fig. 1 it looks like a seizure occurred. Second, memory and skill acquisition in humans often involve a similar training duration, but what about 6 Hz and 2 mA?

      Rats are known to rhythmically move their whiskers at frequencies ranging between 5 and 15 Hz (Mégevand et al., 2009). We agree that there is no clear way to justify the similarity between the experimental design in humans and rats. However, we believe that both paradigms (paw stimulation in rats and ball squeeze in humans) represent non-pathological input that we found to modulate barrier permeability. This argument was added to the discussion of the paper:

      “We believe that both types of stimulations fall within the physiological range because in rats, activity between 5 and 15 Hz represents physiological rhythmic whisker movement during environment exploration (Mégevand et al., 2009).” 

      Seizures are typically induced in rats via direct tetanic stimulation of the brain (at 50 Hz and 0.3-2.5mA) or maximal electroshock test to the cornea (at 50 Hz and 150 mA) (Swinyard et al., 1952). We, therefore, assert that the activity we observe represents physiological responses and not seizures. This argument is beyond the scope of the current paper. 

      Please note a limitation is that the high level of serum albumin is unlikely to be physiological but may not have been as high in the animal because of the low diffusion rate and degradation (please add the refs in the rebuttal). 

      Thank you, we added the following to the Results section: 

      “The relatively high concentration of albumin was chosen to account for factors that lower its effective tissue concentration such as its low diffusion rate and its likelihood to encounter a degradation site or a cross-BBB efflux transporter (Tao & Nicholson, 1996; Zhang & Pardridge, 2001).”

      Fig. 1. 

      Please consider a box in b to show where the expanded traces in the lower row came from. 

      Thank you for the suggestion. We added lines indicating where the trace excerpts were taken from.

      c. Please use arrows to point to the parts that the authors want the reader to note. In the legend, explain what t is, and delta HbT.

      Thank you. We implemented this suggestion.

      d. It is not clear what the double-sided arrows are meant to show compared to the arrow without two sides. 

      We replaced the two-headed arrow with two single ones.

      e. Please explain what the upward lines at the top signify. What does the red asterisk mean? 

      Thank you. We implemented this suggestion.

      f. Is the reader supposed to note the yellow area? Please make it with an arrow or circle if so. 

      Thank you, we added a white circle to mark the area of tracer accumulation.

      g. Please explain what the permeability index is or reference the part of the paper that does. 

      Further to this suggestion, we added to the legend a reference to the appropriate Methods section.

      h. Please use arrows to point to the area of interest. 

      Thank you. We implemented this suggestion.

      m-n. Please mark areas of interest with arrows.  m. the top right two images are unclear. I suggest making them say ipsi inset and contra inset instead of using asterisks. 

      Thank you. We added the ipsi and contra labels to panels in m. The images in panel n represent a phenomenon with no particular region of interest, but rather peri-vascular tracer accumulation along the entire depicted blood vessel. We clarified that panel n represents a separate experiment from panel m: “n. In an animal injected with both EB and NaFlu post stimulation, fluorescence imaging shows extravascular accumulation of both tracers along a cortical small vessel in the stimulated hemisphere.”

      Figure 2. 

      (2) a. Middle. What are the vertical lines at the top? The rebuttal states that was explained in the revised legends but I don't see it. 

      Our apologies. We now included an explanation that “an excerpt of the stimulation trace is shown above the middle LFP trace”.

      c and d are very different field potentials in shape and therefore hard to compare. The rebuttal addresses this but the explanation is not in the revised text. 

      We agree that there is variability in SEP responses between animals. We now added a statement acknowledging this in the methods section: “To overcome potential variability in SEP morphology between animals (Mégevand et al., 2009), each animal’s plasticity measures (max amplitude and AUC of post stimulation SEP) were compared to the same measures at baseline.” 

      In d, it is not clear there is potentiation because the traces are not aligned. 

      All panels depicting SEP traces represent raw data with no alignment. The shift observed in panel d exemplifies why we compare post-stimulation parameters of max amplitude and area under curve to baseline in each animal. 
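      To make the per-animal comparison concrete, the following is a minimal Python sketch, not the authors' analysis code. It assumes that each SEP trace is a list of voltage samples at a fixed spacing `dt`, that "max amplitude" means the largest absolute deflection, and that the AUC is computed on the rectified trace with the trapezoidal rule; these definitions and all function names are hypothetical illustrations.

```python
def max_amplitude(trace):
    # Assumption: "max amplitude" = largest absolute deflection of the trace.
    return max(abs(v) for v in trace)


def area_under_curve(trace, dt):
    # Assumption: AUC of the rectified trace, trapezoidal rule, sample spacing dt.
    rect = [abs(v) for v in trace]
    return sum((rect[i] + rect[i + 1]) * dt / 2 for i in range(len(rect) - 1))


def relative_to_baseline(baseline, post, dt):
    """Express one animal's post-stimulation SEP measures as ratios of its own baseline."""
    return {
        "max_amplitude": max_amplitude(post) / max_amplitude(baseline),
        "auc": area_under_curve(post, dt) / area_under_curve(baseline, dt),
    }


# Hypothetical traces: a post-stimulation SEP scaled up by 20% relative to baseline.
baseline = [0.0, -1.0, -2.0, -1.0, 0.0]
post = [0.0, -1.2, -2.4, -1.2, 0.0]
print(relative_to_baseline(baseline, post, dt=0.001))
```

      Because each animal serves as its own reference, differences in SEP morphology between animals are largely factored out of the ratio; a ratio of 1.2 corresponds to a 20% increase over baseline.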

      Exact P values are said to have been added in the rebuttal but they were not. 

      Exact P values appear in Figure legends.

      (3) b. Use arrows to mark the area of interest. 

      Thank you. We added a white circle to mark the area of tracer accumulation similar to Figure 1f.

      d. Why is there an oscillation superimposed on all traces except CNQX? 

      We agree that this is an interesting question. Future studies should determine the source of this SEP pattern.   

      (4) What does the line and the number 2 mean? How were data normalized? What was counted? What area of cortex?

      The number 2 labels the scale bar: the length of the scale bar line corresponds to a log fold change of 2. 

      The plot shows the log fold change against the mean count of each gene in the contralateral somatosensory cortex between 1 and 24 hours after stimulation.

      The x axis title was changed to “mean expression” and the legend was modified to:

      “Scatter plot of gene expression from RNA-seq in the contralateral somatosensory cortex 24 vs. 1 h after 30 min stimulation. The y axis represents the log fold change, and the x axis represents the mean expression levels (see methods, RNA Sequencing & Bioinformatics). Blue dots indicate statistically significant differentially expressed genes (DEGs) by Wald Test (n=8 rats per group).”

      How were the pericytes, smooth muscle cells, etc. distinguished? 

      This was explained under Methods->RNA Sequencing & Bioinformatics: “Analysis of cell-specific and vascular zonation genes was performed as described (Vanlandewijck et al., 2018), using the database provided in (http://betsholtzlab.org/VascularSingleCells/database.html).”

      What were the chi square statistics? If there were cells used instead of rats, please justify. 

      Thank you. The legend was expanded to include the following:

      “The contralateral somatosensory cortex was found to have a significantly higher number of DEGs related to synaptic plasticity, than the ipsilateral side (***p<0.001, Chi-square).”     

      (5) b. what do the icons mean? 

      We agree that the icons were confusing. We simplified this panel to just show when participants were asked to squeeze the ball (black icon). This explanation was added to the Figure legend.

      Abbreviations? 

      Abbreviations of MRI protocols were added to the figure legend for clarity.

      In c-e what are the units of measure? Fold-change? 

      The units represent t-statistics values for each voxel. The label ‘t-statistic’ was added to the figure.  

      What are the white lines, + and - signs? 

      The white lines point to voxels of highest activation (t-statistic). This was added to the legend.

      And these are not +/- signs; they are voxels with significant activation that only appear similar.

      f. Please explain f and g for clarity. 

      Thank you. The explanation was modified for added clarity.

      Supplemental Fig. 4. 

      Original question: If ipsilateral and contralateral showed many changes why do the authors think the effects were only contralateral? 

      The authors replied: Our gene analysis was designed to complement our in vivo and histological findings, by assessing the magnitude of change in differentially expressed genes (DEGs). This analysis showed that: (1) the hemisphere contralateral to the stimulus has significantly more DEGs than the ipsilateral hemisphere; and (2) the DEGs were related to synaptic plasticity and TGF-b signaling. These findings strengthen the hypothesis raised by our in vivo and histological experiments. 

      Could the authors clarify the answer to the question in the text? 

      Thank you. This section was added to the Discussion. 

      Papers referenced in this letter:

      Anis, N. A., Berry, S. C., Burton, N. R., & Lodge, D. (1983). The dissociative anaesthetics, ketamine and phencyclidine, selectively reduce excitation of central mammalian neurones by N-methyl-aspartate. British Journal of Pharmacology, 79(2), 565–575. hQps://doi.org/10.1111/j.1476-5381.1983.tb11031.x

      Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. hQps://doi.org/10.1038/nn1516

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology, 79(2), 1117–1123. hQps://doi.org/10.1152/JN.1998.79.2.1117/ASSET/IMAGES/LARGE/JNP.JA47F4.JPEG

      Draganski, B., Gaser, C., Busch, V., Schuierer, G., Bogdahn, U., & May, A. (2004). Changes in grey matter induced by training. Nature, 427(6972), 311–312. hQps://doi.org/10.1038/427311a

      Eckert, M. J., & Abraham, W. C. (2010). Physiological effects of enriched environment exposure and LTP induction in the hippocampus in vivo do not transfer faithfully to in vitro slices. Learning and Memory, 17(10), 480–484. https://doi.org/10.1101/lm.1822610

      Halder, P., Sterr, A., Brem, S., Bucher, K., Kollias, S., & Brandeis, D. (2005). Electrophysiological evidence for cortical plasticity with movement repetition. European Journal of Neuroscience, 21(8), 2271–2277. https://doi.org/10.1111/J.1460-9568.2005.04045.X

      Han, Y., Huang, M. De, Sun, M. L., Duan, S., & Yu, Y. Q. (2015). Long-term synaptic plasticity in rat barrel cortex. Cerebral Cortex, 25(9), 2741–2751. https://doi.org/10.1093/cercor/bhu071

      McGregor, H. R., Cashaback, J. G. A., & Gribble, P. L. (2016). Functional Plasticity in Somatosensory Cortex Supports Motor Learning by Observing. Current Biology, 26(7), 921–927. https://doi.org/10.1016/j.cub.2016.01.064

      Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M., & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. Journal of Neuroscience, 29(16), 5326–5335. https://doi.org/10.1523/JNEUROSCI.5965-08.2009

      Sagi, Y., Tavor, I., Hofstetter, S., Tzur-Moryosef, S., Blumenfeld-Katzir, T., & Assaf, Y. (2012). Learning in the Fast Lane: New Insights into Neuroplasticity. Neuron, 73(6), 1195–1203. https://doi.org/10.1016/j.neuron.2012.01.025

      Salt, T. E., Wilson, D. G., & Prasad, S. K. (1988). Antagonism of N-methylaspartate and synaptic responses of neurones in the rat ventrobasal thalamus by ketamine and MK-801. British Journal of Pharmacology, 94(2), 443–448. https://doi.org/10.1111/j.1476-5381.1988.tb11546.x

      Swinyard, E. A., Brown, W. C., & Goodman, L. S. (1952). Comparative assays of antiepileptic drugs in mice and rats. The Journal of Pharmacology and Experimental Therapeutics, 106(3), 319–330. http://jpet.aspetjournals.org/content/106/3/319.abstract

      Tao, L., & Nicholson, C. (1996). Diffusion of albumins in rat cortical slices and relevance to volume transmission. Neuroscience, 75(3), 839–847. https://doi.org/10.1016/0306-4522(96)00303-X

      Vanlandewijck, M., He, L., Mäe, M. A., Andrae, J., Ando, K., Del Gaudio, F., Nahar, K., Lebouvier, T., Laviña, B., Gouveia, L., Sun, Y., Raschperger, E., Räsänen, M., Zarb, Y., Mochizuki, N., Keller, A., Lendahl, U., & Betsholtz, C. (2018). A molecular atlas of cell types and zonation in the brain vasculature. Nature, 554(7693), 475–480. https://doi.org/10.1038/nature25739

      Zhang, Y., & Pardridge, W. M. (2001). Mediated efflux of IgG molecules from brain to blood across the blood–brain barrier. Journal of Neuroimmunology, 114(1–2), 168–172. https://doi.org/10.1016/S0165-5728(01)00242-9

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Wang, He et al have constructed comprehensive single nucleus atlas for the gills of the deep sea Bathymodioline mussels, which possess intracellular symbionts that provide a key source of carbon and allow them to live in these extreme environments. They provide annotations of the different cell states within the gills, shedding light on how multiple cell types cooperate to give rise to the emergent functions of the composite tissues and the gills as a whole. They pay special attention to characterizing the bacteriocyte cell populations and identifying sets of genes that may play a role in their interaction with the symbiotes. 

      Wang, He et al sample mussels from 3 different environments: animals from their native methane rich environment, animals transplanted to a methane-poor environment to induce starvation and animals that have been starved in the methane-poor environment and then moved back to the methane-rich environment. They demonstrated that starvation had the biggest impact on bacteriocyte transcriptomes. They hypothesize that the up-regulation of genes associated with lysosomal digestion leads to the digestion of the intracellular symbiont during starvation, while the non-starved and reacclimated groups more readily harvest the nutrients from symbiotes without destroying them. Further work exploring the differences in symbiote populations between ecological conditions will further elucidate the dynamic relationship between host and symbiote. This will help disentangle specific changes in transcriptomic state that are due to their changing interactions with the symbiotes from changes associated with other environmental factors. 

      This paper makes available a high quality dataset that is of interest to many disciplines of biology. The unique qualities of this non-model organism and collection of conditions sampled make it of special interest to those studying deep sea adaptation, the impact of environmental perturbation on Bathymodioline mussels populations, and intracellular symbiotes. The authors also use a diverse array of tools to explore and validate their data. 

      Reviewer #2 (Public Review): 

      Wang, He et al. shed insight into the molecular mechanisms of deep-sea chemosymbiosis at the single-cell level. They do so by producing a comprehensive cell atlas of the gill of Gigantidas platifrons, a chemosymbiotic mussel that dominates the deep-sea ecosystem. They uncover novel cell types and find that the gene expression of bacteriocytes, the symbiont-hosting cells, supports two hypotheses of host-symbiont interactions: the "farming" pathway, where symbionts are directly digested, and the "milking" pathway, where nutrients released by the symbionts are used by the host. They perform an in situ transplantation experiment in the deep sea and reveal transitional changes in gene expression that support a model where starvation stress induces bacteriocytes to "farm" their symbionts, while recovery leads to the restoration of the "farming" and "milking" pathways. 

      A major strength of this study includes the successful application of advanced single nucleus techniques to a non-model, deep sea organism that remains challenging to sample. I also applaud the authors for performing an in situ transplantation experiment in a deep sea environment. From gene expression profiles, the authors deftly provide a rich functional description of G. platifrons cell types that is well-contextualized within the unique biology of chemosymbiosis. These findings offer significant insight into the molecular mechanisms of deep-sea host-symbiont ecology, and will serve as a valuable resource for future studies into the striking biology of G. platifrons. 

      The authors' conclusions are generally well-supported by their results. However, I recognize that the difficulty of obtaining deep-sea specimens may have impacted experimental design and no replicates were sampled. 

      It is notable that the Fanmao cells were much more sparsely sampled. It appears that fewer cells were sequenced, resulting in the Starvation and Reconstitution conditions having 2-3x more cells after doublet filtering. These discrepancies also are reflected in the proportion of cells that survived QC, suggesting a distinction in quality or approach. However, the authors provide clear and sufficient evidence via bootstrapping that batch effects between the three samples are negligible. While batch effect does not appear to have affected gene expression profiles, the proportion of cell types may remain sensitive to sampling techniques, and thus interpretation of Fig. S12 must be approached with caution. 

      Reviewer #3 (Public Review): 

      Wang et al. explored the unique biology of the deep-sea mussel Gigantidas platifrons to understand fundamental principles of animal-symbiont relationships. They used single-nucleus RNA sequencing, together with validation and visualization of many of the important cellular and molecular players that allow these organisms to survive in the deep sea. They demonstrate a diversity of cell types that support the structure and function of the gill, including bacteriocytes (specialized epithelial cells that host sulfur-oxidizing or methane-oxidizing symbionts) as well as a suite of other cell types, including supportive, ciliary, and smooth muscle cells. By transplanting mussels from a methane-rich habitat to methane-limited environments, the authors showed that starved mussels may consume their endosymbionts, whereas mussels in methane-rich environments upregulated genes involved in glutamate synthesis. These data add to the growing body of literature showing that organisms control their endosymbionts in response to environmental change. 

      The conclusions of the data are well supported. The authors adapted a technique that would have been technically impossible in their field environment by preserving the tissue and then performing nuclear isolation after the fact. The use of single-nucleus sequencing opens the possibility of new cellular and molecular biology that is not possible to study in the field. Additionally, the in-situ data (both WISH and FISH) are high-quality and easy to interpret. The use of cell-type-specific markers along with a symbiont-specific probe was effective. Finally, the SEM and TEM were used convincingly for specific purposes in the case of showing the cilia that may support water movement. 

      One particular area for future exploration is the concept of a proliferative progenitor population within the gills. The authors recover molecular markers for these putative populations, and future work will reveal whether these are indeed proliferative cells that contribute to symbiont colonization. 

      Overall, the significance of this work lies in identifying the relationship between symbionts and bacteriocytes and how these host bacteriocytes modulate their gene expression in response to environmental change. It will be interesting to see how similar or different these data are across animal phyla. For instance, work on symbiosis in cnidarians may converge on similar principles, or there may be independent ways in which organisms have solved these problems. 

      We extend our sincere gratitude to all the reviewers for their positive comments and kind words. We highly value the substantial efforts they made in helping us improve and enhance our manuscript. Additionally, we thank the reviewers for pointing out the limitations of our current study, which will guide us in improving our future research.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This study system is so interesting and this is a truly unique and exciting dataset. Most of my suggestions are aimed at improving readability and making it more accessible for a broader audience, since I predict many fields will find it interesting. 

      Line 60: which species of mussel? Is this the same one? 

      We appreciate the comments from the reviewer. The reference here is to deep-sea bathymodiolin mussels, which, in most cases, possess enlarged gill filaments that accommodate symbionts.

      Line 237-230: citation of previous findings missing 

      We appreciate the comments from the reviewer. After carefully reviewing these paragraphs, we believe that all the previous findings have now been properly cited.

      Line 256: it might be a good idea to give a brief description of what slingshot analysis is here 

      We appreciate the comments from the reviewer. We have revised the corresponding part of our manuscript to make it clear.

      This part of the manuscript now reads: “We performed Slingshot analysis, which uses a cluster-based minimum spanning tree (MST) and a smoothed principal curve to determine the developmental path of cell clusters. The result shows that the PEBZCs might be the origin of all gill epithelial cells, including the other two proliferation cells (VEPC and DEPC) and bacteriocytes (Supplementary Fig. S6).” Lines 203-207 of the revised manuscript.
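      For readers unfamiliar with Slingshot, the cluster-based MST step mentioned above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: inter-cluster distances are plain Euclidean distances between 2-D cluster centroids (the Slingshot package uses its own cluster distance measure), the smoothed principal-curve fitting is omitted entirely, and the cluster names "A" to "D" and their coordinates are hypothetical toy values, not data from this study. Lineages are read off as paths from a chosen root cluster to each leaf of the tree.

```python
from math import dist


def cluster_mst(centroids):
    """Prim's algorithm over cluster centroids using Euclidean distance.

    `centroids` maps a cluster name to its (x, y) coordinates.
    Returns MST edges as (cluster_already_in_tree, newly_added_cluster).
    """
    names = list(centroids)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None  # (distance, u, v) of the cheapest edge leaving the tree
        for u in in_tree:
            for v in names:
                if v in in_tree:
                    continue
                d = dist(centroids[u], centroids[v])
                if best is None or d < best[0]:
                    best = (d, u, v)
        _, u, v = best
        edges.append((u, v))
        in_tree.add(v)
    return edges


def lineages(edges, root):
    """Paths from the root cluster to every leaf of the tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    paths = []

    def walk(node, parent, path):
        children = [n for n in adj.get(node, []) if n != parent]
        if not children:
            paths.append(path)
        for child in children:
            walk(child, node, path + [child])

    walk(root, None, [root])
    return paths


# Hypothetical toy centroids, for illustration only.
toy = {"A": (0.0, 0.0), "B": (-2.0, 0.5), "C": (2.0, 1.0), "D": (3.0, 3.0)}
print(lineages(cluster_mst(toy), "A"))  # → [['A', 'B'], ['A', 'C', 'D']]
```

      Choosing the root is a modelling decision supplied by the analyst; the tree itself is undirected.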

      Line 289: Wording is a bit confusing- what is meant by morphological analysis?

      We acknowledge that our wording might be a bit confusing here. We are referring to the TEM ultrastructural analysis. Therefore, we have changed “morphological analysis” to “ultrastructural analysis.” Line 231 in the revised manuscript.

      Line 351-354: how did you calculate distances? How many dimensions were used? 

      We calculated the centroid coordinates for each cell type in each state on the 2-dimensional UMAP plot (Fig. 6A). Then, for each cell type, we determined the Euclidean distance between the centroid coordinates of each pair of states. We have revised the manuscript with this more detailed description. Lines 292-295 of the revised manuscript.
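      The centroid-distance calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each cell is given as a (cell_type, state, umap_1, umap_2) tuple, and the cluster name "cluster_1" and all coordinates are hypothetical toy values (only the state names are taken from the study design).

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def centroids(cells):
    """Mean UMAP coordinates per (cell_type, state).

    `cells` is an iterable of (cell_type, state, umap_1, umap_2) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running x-sum, y-sum, count
    for cell_type, state, x, y in cells:
        acc = sums[(cell_type, state)]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {key: (sx / n, sy / n) for key, (sx, sy, n) in sums.items()}


def state_distances(cells):
    """Euclidean distance between state centroids, for every pair of states within each cell type."""
    by_type = defaultdict(dict)
    for (cell_type, state), xy in centroids(cells).items():
        by_type[cell_type][state] = xy
    distances = {}
    for cell_type, states in by_type.items():
        for s1, s2 in combinations(sorted(states), 2):
            distances[(cell_type, s1, s2)] = dist(states[s1], states[s2])
    return distances


# Toy input: hypothetical coordinates for one hypothetical cluster, not study data.
cells = [
    ("cluster_1", "Fanmao", 0.0, 0.0),
    ("cluster_1", "Fanmao", 2.0, 0.0),
    ("cluster_1", "Reconstitution", 1.0, 3.0),
    ("cluster_1", "Starvation", 4.0, 4.0),
]
d = state_distances(cells)
print(d[("cluster_1", "Fanmao", "Starvation")])  # → 5.0
```

      Because distances are measured on a 2-D UMAP embedding, they summarize relative shifts between states rather than distances in the full expression space.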

      Line 462: identify -> identified 

      We apologize for our mistake and appreciate the reviewer’s kind assistance with proofreading. The typo has been corrected in the new version. Line 396 of the revised manuscript.

      Line 509: what does the size of the dot represent? 

      In this context, the color and intensity of each dot represent a specific gene’s expression level in the single-cell cluster. The dot size is uniform across the plot and therefore does not convey a specific meaning.

      Fig 3A: What is the blue cluster highlighted? 

      We apologize for our mistake. The label for the teal box was missed. We have corrected our mistake in the revised manuscript.

      Fig 3K: Wording in key is confusing. 

      We have modified our description of Figure 3K in the figure legends. It now reads: “Schematic of water flow agitated by different ciliary cell types. The color of arrowheads corresponds to water flow potentially influenced by specific types of cilia, as indicated by their color code in Figure 3A.” Lines 462-464 in the revised manuscript.

      Fig 5B: which population of mussels was used to take these images? 

      Mussels from the “Fanmao” (methane-rich) site were used to take these images. We have revised the Materials and Methods to make this clear. Lines 602-603 of the revised manuscript.

      Fig 5E,5G,5H: panels not referenced in text 

      We apologize for our mistake and appreciate the reviewer’s thorough reading. This error has been corrected in the new version of the manuscript. Line 233 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      Minor comments: 

      Fig. 3A - the teal box in the legend lacks a label 

      We apologize for our mistake. The label for the teal box was missed. We have corrected our mistake in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      My enthusiasm for the manuscript remains high and I appreciate the authors’ care in responding to the various reviewer questions and concerns. 

      In regards to the cell proliferation results, I have modified my public review and look forward to your future work in this area. The data for both pHistone H3 and anti PCNA are compelling! 

      One typo I did catch occurs on line 520. I believe you meant to say "outer" not "otter." 

      We apologize for our mistake and appreciate the reviewer’s kind assistance with proofreading. The typo has been corrected in the new version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Dubicka and co-workers on calcification in miliolid foraminifera presents an interesting piece of work. The study uses confocal and electron microscopy to show that the traditional picture of calcification in porcelaneous foraminifera is incorrect.

      Strengths:

      The authors present high-quality images and an original approach to a relatively solid (so I thought) model of calcification.

      Weaknesses:

      There are several major shortcomings. Despite the interesting subject and the wonderful images, the conclusions of this manuscript are simply not supported at all by the results. The fluorescent images may not have any relation to the process of calcification and should therefore not be part of this manuscript. The SEM images, however, do point to an outdated idea of miliolid calcification. I think the manuscript would be much stronger with the focus on the SEM images and with the speculation of the physiological processes greatly reduced.

      We agree that the fluorescence studies presented in the paper are not, by themselves, unequivocal proof of the calcification model used by the studied Miliolida species. However, fluorescence data combined with SEM studies, especially the overlap between elements that autofluoresce upon excitation at 405 nm (emission 420–480 nm) and acidic vesicles marked by the pH-sensitive LysoGlow84, may be a hint indicating ACC-bearing vesicles.

      We will tone down the physiological interpretation based on fluorescence studies in the revised version of the manuscript.

      Nevertheless, we think that our fluorescence live-imaging experiments provide important observations of Miliolida, which are scarce in the existing literature, and are therefore worth presenting, as they might be very helpful for a better understanding of the full calcification model in the future.

      Reviewer #2 (Public Review):

      Summary:

      Dubicka et al., in their paper entitled "Biocalcification in porcelaneous foraminifera", suggest that, in contrast to the traditionally claimed two different modes of test calcification by rotaliid and porcelaneous miliolid foraminifera, both groups produce calcareous tests via intravesicular mineral precursors (Mg-rich amorphous calcium carbonate). These precursors are proposed to be supplied by endocytosed seawater and deposited in situ as mesocrystals formed at the site of new wall formation within the organic matrix. The authors did not observe the calcification of the needles within the transported vesicles, which challenges the previous model of miliolid mineralization. Although the authors argue that these two groups of foraminifera utilize the same calcification mechanism, they also suggest that these calcification pathways evolved independently in the Paleozoic.

      We do not argue that Miliolida and Rotaliida utilize exactly the same calcification mechanism, but rather that both groups use less divergent crystallization pathways, in which mesocrystalline chamber walls are created by accumulating and assembling particles of a pre-formed liquid amorphous mineral phase.

      Strengths:

      The authors document various unknown aspects of calcification of Pseudolachlanella eburnea and elucidate some poorly explained phenomena (e.g., translucent properties of the freshly formed test); however, there are several problematic observations/interpretations which, in my opinion, should be carefully addressed.

      Weaknesses:

      (1) The authors (line 122) suggest that "characteristic autofluorescence indicates the carbonate content of the vesicles (Fig. S2), which are considered to be Mg-ACCs (amorphous MgCaCO3) (Fig. 2, Movies S4 and S5)". Figure S2, which the authors refer to, shows only broken sections of organic sheath at different stages of mineralization. Movie S4 shows that only in a few regions do some vesicles exhibit red autofluorescence interpreted as Mg-ACC (S5 is missing, but probably the authors were referring to S3). In their previous paper (Dubicka et al 2023: Heliyon), the authors used exactly the same methodology to suggest that these are intracellularly formed Mg-rich amorphous calcium carbonate particles that transform into a stable mineral phase in the rotaliid Amphistegina lessonii. However, in Figure 1D (Dubicka et al 2023) the apparently carbonate-loaded vesicles show the same red autofluorescence as the test, whereas in their current paper, no evidence of autofluorescence of Mg-ACC grains accumulated within the "gel-like" organic matrix is given. The S3 and S4 movies show circulation of various fluorescing components, but no initial phase of test formation is observable (numerous mineral grains embedded within the organic matrix, Figures 3A and B, should be clearly observed also as autofluorescence of the whole layer). Thus the crucial argument supporting the calcification model (Figure 5) is missing.

      It is correct that we did not observe the initial phase of test formation in vivo; therefore, this observation is not our crucial argument supporting the novel components of the new calcification model. We suspect that vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (including Mg-ACC formation), preceding exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from SEM imaging of specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      There is no support for the following interpretation (lines 199-203) "The existence of intracellular, vesicular intermediate amorphous phase (Mg-ACC pools), which supply successive doses of carbonate material to shell production, was supported by autofluorescence (excitation at 405 nm; Fig. 2; Movies S3 and S4; see Dubicka et al., 2023) and a high content of Ca and Mg quantified from the area of cytoplasm by SEM-EDS analysis (Fig. S6)."

      We used the 405 nm laser line and multiphoton excitation to detect ACC. These wavelengths partly penetrate the shell and excite ACC autofluorescence. The shell also autofluoresces, but this is not clearly visible in Movie S4 because the fluorescence of ACC is stronger. This may be related to the plane/section of the cell that is shown: the laser passes through only a short distance of shell above the ACC, but to excite the shell CaCO3 surrounding the foraminifer in the same three-dimensional section where ACC is shown, the light must pass through a thick CaCO3 region owing to the three-dimensional structure of the foraminiferal shell, which reduces the laser intensity. In the revised version, a movie/image with a reduced threshold is shown.

      Author response image 1.

      Autofluorescence image of studied Miliolida species (exc. 405 nm) showing algal chlorophyll (blue) and CaCO3 (red), both ACC and calcite shell.

      It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      Lee et al. (2015) show that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not excitable, or only very weakly excitable, at 405 nm, the excitation laser line we used. 

      (2) The authors suggest that "no organic matter was detected between the needles of the porcelain structures (Figures 3E; 3E; S4C, and S5A)". Such a suggestion, which is highly unusual considering that biogenic minerals almost by definition contain various organic components, was made based only on FE-SEM observation. The authors should either provide clear-cut evidence of the lack of organic matter (unlikely) or may suggest that intense calcium carbonate precipitation within the organic matrix gel ultimately results in a decrease in the amount of the organic phase (but not its complete elimination), just as pure calcium carbonate crystals are separated from the remaining liquid with impurities ("mother liquor"). On the other hand, if (249-250) "organic matrix involved in the biomineralization of foraminiferal shells may contain collagen-like networks", such a "laminar" organization of the organic matrix may partly explain the arrangement of carbonate fibers parallel to the surface as observed in Fig. 3E1.

      We agree with the reviewer that biogenic minerals should by definition contain some organic components. We just wrote that "no organic matter was detected between the needles of the porcelain structures” that means that we did not detect any organic structures based only on our FE-SEM observations. We will rephrase this part of the text to avoid further confusion.

      (3) The authors' observations indeed do not show the formation of individual skeletal crystallites within intracellular vesicles; however, they also do not explain what the structure of individual skeletal crystallites is or how they are formed. In particular, what are the structures observed in polarized light (and interpreted as calcite crystallites) by De Nooijer et al. 2009? The authors' explanation of the process (lines 213-216) is not particularly convincing: "we suspect that the OM was removed from the test wall and recycled by the cell itself".

      Thank you for this comment. We will do our best to supplement our explanations. We are aware of the structures observed in polarized light by De Nooijer et al. (2009). However, Goleń et al. (2022, Protist; + 2 other citations) showed that organic polymers may also exhibit light polarization. Additional experimental studies are needed to distinguish these types of polarization. We will try to investigate this issue in our future research.

      (4) The following passage (lines 296-304) which deals with the concept of mesocrystals is not supported by the authors' methodology or observations. The authors state that miliolid needles "assembled with calcite nanoparticles, are unique examples of biogenic mesocrystals (see Cölfen and Antonietti, 2005), forming distinct geometric shapes limited by planar crystalline faces" (later in the same passage the authors say that "mesocrystals are common biogenic components in the skeletons of marine organisms" (are they thus unique or are they common)? It is my suggestion to completely eliminate this concept here until various crystallographic details of the miliolid test formation are well documented.

      Our intention was to convey that mesocrystals are common biogenic components in the skeletons of marine organisms, whereas miliolid needles forming distinct geometric shapes limited by planar crystalline faces are unique.

      Reviewer #1 (Recommendations For The Authors):

      Below, I have summarized my main criticisms.

      (1) The movies S1-S4 do not indicate what is described. The videos show indeed seawater (S1), cell membranes (S2), and autofluorescence and acidic vesicles (S3 and S4). The presence of all these intracellular structures is not surprising: any eukaryotic cell will have those. The authors, however, claim that they participate in the process of calcification, which is simply not shown. One of the main arguments seems the presence of 'carbonate pools', in the caption these are even claimed to be 'Mg-ACC pools', but this is by no means revealed by an excitation of 405nm/ emission between 420 and 490 nm. It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      Lee et al. (2015) show that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not excitable, or only very weakly excitable, at 405 nm, the excitation laser line we used. 

      The fluorescence by this excitation/ emission couple unlikely indicates the vesicles in which these foraminifera calcify. Therefore, most of the interpretation of the authors on what happens with the calcitic needles is not based on results but remains pure speculation.

      Autofluorescence upon excitation at 405 nm (emission 420–480 nm) is typical of CaCO3, both biocalcite and amorphous calcium carbonate, as demonstrated with laboratory-synthesized amorphous calcium carbonate (Dubicka et al., in preparation).

      (2) The results mention 'granules', which are the supposed Mg-ACC-containing vesicles, but the movies simply don't show any granules. Only fluorescence. Again, the results show a lot of vesicles with autofluorescence, but these are not necessarily related to calcification. Proof could be supplied by showing that the same fluorescent vesicles are 'used up' when the specimens under observation are making a new chamber, but until that is done, the fate of all these vesicles remains uncertain and once more, may not be involved in calcification at all.

      We suspect that vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (including Mg-ACC formation), preceding exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from SEM imaging of specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      (3) The Methods are unclear. How long were the foraminifers kept before being placed under the microscope? Were they fed with anything? This is important since the chlorophyll should not be from any food source. I didn't know that this foraminiferal species has photosynthetic symbionts: genera like Quinqueloculina don't. Is there any reference for this? Normally, I wouldn't care that much, but the authors find the presence of (facultative) symbionts important (lines 305-336). I am a bit suspicious about this since the only evidence for the presence of photosynthetic symbionts is because of the autofluorescence. As the authors said, commonly these miliolid species are regarded as symbiont-barren, so additional proof for these symbionts is necessary.

      We agree that additional proof is needed for the presence of photosynthetic symbionts. We rephrased the manuscript accordingly.

      (4) It is also unclear (Methods) at what stage the miliolids were photographed (Figure 3). How did chamber formation proceed, what was the timing of the photographs, etc. These pictures are to me the most interesting finding of this study, but need to be described much better.

      All living foraminifera were fixed during chamber formation. However, each individual presents a complete set of successive steps (substages) of chamber wall calcification fixed at once. Fig. 3A and B present nearly the most proximal (youngest) part of the new chamber, with a thick wall of calcite nanograins within a gel-like organic matrix. Fig. 3C and D present a slightly more distal (intermediate) part of the calcified chamber. Fig. 3E shows the most distal part of the new chamber, which is anchored to the older, underlying, solid calcified chamber (not shown in this figure). All these steps are synchronous but represent gradual, successive stages of calcification. The main text and Figs 4 and 5 explain this phenomenon in detail.

      There are many small issues with the text too. These include:

      Line 28/29: in many other groups, calcification is thought to be polyphyletic (e.g. sponges: Chombard et al., 1997. Biol Bull 193: 359-367).

      Corrected

      Line 29/30: there may be even more 'types of shells'. The first author has shown in earlier papers that nodosarids have a unique shell architecture. Spirillinids also seem to have their own way of calcification. It is unclear what is meant here by 'two contrasting models'.

      So far, only two models of foraminiferal calcification are known. Lagenida biocalcification has not been studied.

      Line 33: 'Both groups'? This paper only shows calcification in miliolids.

      However, we refer here to a previous study.

      Line 42: Perhaps, but there is no data on the pseudopodial network in this manuscript.

      We refer to the studies by Angell (1980).

      Line 43: Likely, but that is not what this manuscript is showing.

      Line 42-44: The authors should make a choice and be clear. The point of this paper is that miliolids and rotalids calcify in ways that are actually not as different as they seemed previously. Still, they are said to have different 'chamber formation modes'. If they are calcifying in a similar way (which I think is not necessarily supported by the results), isn't calcification in these groups like variations on the same theme? How does this relate to the independent origins of calcification within these two groups?

      Our intention is to show that Miliolida and Rotaliida utilize less divergent calcification pathways, in line with recently discovered biomineralization principles.

      Line 49-51: is this a well-established distinction? If so, please add a reference. If not: what is fundamentally different between B and C? Does only the size of the intracellular vesicle matter?

      Rephrased

      Line 60: please include a reference for the intracellular calcification by coccolithophores.

      Added

      Line 67: this is wrong. It is the alignment of the needles at the surface that makes them all reflect light in the same way and gives the shells a porcelaneous appearance. A close-up of the miliolid's shell surface shows this arrangement. Underneath this layer, the orientation of the needles is more random.

      We referred to the papers by Johan Hohenegger.

      Line 114: how else?

      Line 114-116: I don't see the relevance here. If seawater is taken up, the vesicle containing this seawater has to have a membrane around it. By definition. The text here ('These vesicles') suggests that Calcein and FM1-43 were combined (which they easily could have), but the methods describe that they are used successively.

      Yes, we used the two dyes separately.

      Lines 122-130: I think the interpretation of this autofluorescence signal is wrong. Even if it was true, these lines belong to the Discussion.

      This paragraph has been moved to the Discussion.

      Line 138: What are 'mobile clusters'? I don't see a relation between the location of the symbionts and the other vesicles (Figure 2).

      Line 147-148: How can an SEM image show the absence of organic matter?

      We meant the absence of the gel-like organic matter (OM) visible in the earlier stages of chamber formation.

      Line 148: Should be 'Figs. 3E; 3E1; S4C'.

      Corrected

      Lines 143-150: this can be merged with the following paragraph.

      Done

      Lines 151-169: why is there no indication of the time? Figures 3 and 4 link the pictures in time to show the development of the growing chamber wall. However, neither here nor in the methods, is there any recording of the time after the beginning of chamber formation. Now, the images are linked (Figure 4) as if they were taken at regular intervals, but this is not documented.

      Lines 170-184: this should go to the Discussion.

      Done

      Line 193-195: this is likely, but not visible in Figure 1.

      It was visible by optical microscopy and was described by Angell (1980).

      Line 199-201: I don't understand this: the fluorescent vesicles were not observed during chamber formation so any link between the SEM and CLSM scans remains pure speculation.

      Line 203-204: needed for what?

      For better documentation of the ACC-bearing granules of miliolids.

      Line 220: is this shown in any of the images? 

      See Angell (1980).

      Line 230: It sounds nice, but I don't think a 'paradigm shift' is appropriate here. However interesting and important foraminiferal biomineralization is, the authors show that the crystals of miliolids are likely formed differently than previously thought. If this is a 'paradigm shift', then most scientific findings are.

      In our opinion, this is definitely a paradigm shift.

      Line 231: I don't think anyone suggested miliolids and coccolithophores share 'the same' pathway. They are shown (cocco's) and thought (miliolids) to secrete their calcite intracellularly.

      Changed to "similar, intracellular".

      Line 258: References should only be to peer-reviewed studies.

      Line 430: Burgers'

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Please separate clearly the results (observations) from the discussion (interpretations): various interpretational/commentary phrases should be removed from the Results section to Discussion e.g., lines 124-130, 131-135.

      Interpretations have been separated from the results, as suggested by the Reviewer.

      [line 49] " living cells have evolved three major skeleton crystallization pathways". I would rather say "organisms" not "cells" as the coordination of the calcification process in multicellular organisms clearly involves processes that are beyond the individual cell activity.

      Corrected

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Original comment: There is no explanation for how this work could be a breakthrough in simulating gregarious feeding, as is stated in the manuscript.

      Reviewer response: I think I understand where the authors are trying to take this next step. If the authors were to follow up on this study with the proposed implementation of inhalant/exhalant velocity profiles (or more preferably velocity/pressure fields), then that study would be a breakthrough in simulating such gregarious feeding. Based on what has been done within the present study, I think the term "breakthrough" is instead overly emphatic. An additional note on this: the authors are correct that incorporating additional models could be used to simulate a population (as has been successfully done for several Ediacaran taxa despite computational limitations), but it's not the only way. The authors might explore using periodic boundary conditions on the external faces of the flow domain. This could require only a single olivooid model to assess gregarious impacts; see the abundant literature on modeling flow through solar array fields.

      We thank Reviewer 1 for the suggestion. Modeling gregarious feeding via periodic boundary conditions is certainly a practical approach given limited computational resources, and modeling flow through solar array fields is an inspiring case. However, to realistically simulate gregarious feeding behavior on an uneven seabed with an irregular spatial distribution of organisms, periodic boundary conditions alone may not be sufficient (see Author response image 1 for a simple example). We will continue to explore ways of simulating large-scale gregarious feeding.

      Author response image 1.

      An example of modeling gregarious feeding behavior on an uneven seabed.

      Original comment: The claim that olivooid-type feeding was most likely a prerequisite transitional form to jet-propelled swimming needs much more support or needs to be tailored to olivooids. This suggests that such behavior is absent (or must be convergent) before olivooids, which is at odds with the increasing quantities of pelagic life (whose modes of swimming are admittedly unconstrained) documented from Cambrian and Neoproterozoic deposits. Even among just medusozoans, ancestral state reconstruction suggests that they would have been swimming during the Neoproterozoic (Kayal et al., 2018; BMC Evolutionary Biology) with no knowledge of the mechanics due to absent preservation.

      Author response: Thank you for your suggestions. Yes, we agree that ancestral swimming medusae may have appeared before the early Cambrian, possibly as early as the Neoproterozoic. However, discussion of the affinities of Ediacaran cnidarians is severely limited by the lack of information on their soft anatomy, so their mechanics are difficult to infer in the absence of preservation. Olivooids from the basal Cambrian Kuanchuanpu Formation can reasonably be considered cnidarians based on their radial symmetry, external features, and especially their internal anatomy (Bengtson and Yue 1997; Dong et al. 2013; 2016; Han et al. 2013; 2016; Liu et al. 2014; Wang et al. 2017; 2020; 2022). The simulation experiment here was based on the soft tissue preserved in olivooids.

      Reviewer response: This response does not sufficiently address my earlier comment. While the authors are correct that individual Ediacaran affinities are an area of active research and that Olivooids can reasonably be considered cnidarians, this doesn't address the actual critique in my comment. Most (not all) Ediacaran soft-bodied fossils are considered to have been benthic, but pelagic cnidarian life is widely acknowledged to at least be present during later White Sea and Nama assemblages (and earlier depending on molecular clock interpretations). The authors have certainly provided support for the mechanics of this type of feeding being co-opted for eventual jet propulsion swimming in Olivooids. They have not provided sufficient justifications within the manuscript for this to be broadened beyond this group.

      Thank you for your thoughtful comments. We of course agree that swimming cnidarians may have emerged before the lowermost Cambrian Fortunian Stage. See lines 16-129: "Ediacaran fossil assemblages with complex ecosystems consist of exceptionally preserved soft-bodied eukaryotes of enigmatic morphology, whose affinities are mostly unresolved (Tarhan et al., 2018, Integrative and Comparative Biology, 58 (4), 688–702; Evans et al., 2022, PNAS, 11(46), e220747511)." Olivooids undoubtedly belong to the cnidarians, as shown by their external and internal biological structures. Given the limits of the fossil record, we can only speculate on the transition of ancestral cnidarians from a benthic to a swimming lifestyle based on well-preserved fossils such as olivooids. This transition may have required processes such as increasing body size, thickening of the mesoglea, and degeneration of the periderm. These processes may have evolved independently or in combination. Moreover, the ecological behaviors of ancestral cnidarians may have evolved independently at different stages from the Ediacaran to the Cambrian. We therefore cannot provide sufficient justification to extend our conclusion beyond olivooids.

      Original comment: L446: two layers of hexahedral elements is a very low number for meshing boundary layer flow

      Reviewer response: As the authors point out in the main text, these organisms are small (millimeters in scale) and certainly lived within the boundary layer range of the ocean. While the boundary layer is not the main point, it still needs to be accurately resolved as it should certainly affect the flow further towards the far field at this scale. I'm not suggesting the authors need to perfectly resolve the boundary layer or focus on using turbulence models more tailored to boundary layer flows (such as k-ω), but the flow field still needs sufficient realism for a boundary bounded flow. The authors really should consider quantitatively assessing the number of hexahedral elements within their mesh refinement study.

      To address this concern, we ran four additional simulations based on mesh4 within our mesh refinement study to assess the effect of the number of hexahedral element layers (five or eight layers of hexahedral elements, with different boundary layer mesh thicknesses controlled by the thickness adjustment factor). The results have been added to Table supplement 2. They show that the number of layers of hexahedral elements does not appear to significantly influence the results, whereas the thickness of the boundary layer mesh can influence the maximum flow velocity during the contraction phase. Nevertheless, the results of all simulations were generally consistent, as shown in Author response image 2. A description of these results has been added to the section "Mesh sensitivity analysis".

      Author response image 2.

      Results of mesh refinement study of different boundary layer mesh parameters.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The points raised led us to critically rethink our approach, our results, and our conclusions. They also gave us the chance to elaborate on some critical aspects that were mentioned. With the help of the reviewers, we made clarifications in the point-by-point responses and implemented them in the manuscript. In addition, we modified the figures as suggested:

      - The colors in Figure 1C, D, G and H have been adapted as suggested

      - We added Figure 2-figure supplement 1, which strengthens our conclusion in Figure 2

      - As asked by reviewer #1 (weaknesses #3), we added the data about neutrophil numbers in the different organs (Figure 6-figure supplement 3C).

      Reviewer #1 (Public Review):

      Summary:

      - Extracellular ATP represents a danger-associated molecular pattern associated with tissue damage and can also act in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect that possibly conditions the outcome of sepsis, namely the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth-dependent and strain-specific manner. However, whether this bacteria-derived ATP plays a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant E. coli strains were used, first to correlate bacterial ATP release with growth and then with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis induced by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To dissociate the function of bacterial ATP from the systemic response to microorganisms, the authors used bacterial OMVs, either loaded with ATP or not, to investigate the systemic effects in the absence of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      - A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      - As pointed out in the limitations of the study, whether ATP-loaded OMVs provide mechanistic proof of a pathogenetic role for bacteria-derived ATP in sepsis, independently of live microorganisms, is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment, and might activate purinergic receptors, which then lead to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. Also, the role of OMV-derived bacterial ATP should be assessed locally as well as the importance of directly released vs. OMV-mediated bacterial ATP dissected locally. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph will be added to the manuscript, in which we discuss this particular issue.

      - Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the mentioned discrepancy. What we propose is that bacterial ATP exhibits different functions that are dependent on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and impacts on neutrophil function. We agree that, in particular, in the peritoneal cavity, both effects may play a role. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophil numbers are decreased locally by directly released bacterial ATP, which prevents efficient infection control and therefore impairs sepsis survival. In addition, the remaining neutrophils might be dysregulated by the engulfment of bacterial ATP delivered via OMV, leading to upregulated and possibly aberrant degranulation that worsens local and remote tissue damage. We agree that, in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or of neutrophil cell lines, as well as by degranulation assays.

      - Are the neutrophils counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity, where both directly released bacterial ATP and OMV-derived bacterial ATP are present. We did, however, assess such a putative difference in the systemic organs and the blood, where we found no differences in neutrophil numbers.

      Author response image 1.

      - A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that there are several open questions that remain to be elucidated, in particular, to differentiate the local role of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Such future research avenues might be to investigate the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles are relevant to understand the different mechanisms of how bacterial ATP alters sepsis. However, such experiments should be ideally performed in systems where the source and the delivery of ATP can be modulated locally.

      - Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lungs and the respiratory system are among the clinically most important organs affected during sepsis, and their involvement is a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      - In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      - The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper introduces the concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary between individual patients. It would be beneficial if the authors could specify whether they have verified the existence of unculturable species capable of secreting high levels of extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly to either the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus or E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We confirmed this well-documented evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, modulation of ATP generation, or loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically released by many host cells, both actively and passively, in response to any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses of germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequent predominance of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). It has also been shown that purinergic signaling modulates the infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. Having introduced the general principles of bacterial ATP in sepsis with this study, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to use them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well established, although the results have been contradictory (Csóka et al., 2015; Ledderose et al., 2016). These conflicting data were the rationale for testing the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis, when bacteria invade the sterile compartment and before an efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes in immune cells within the peritoneal cavity caused by extracellular ATP released upon bacterial death or by OMVs?

      This is a relevant question that was also asked by Reviewer #1, and we answered it in detail above (weaknesses comments #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that, locally, directly released (extracellular) bacterial ATP predominates over OMV-derived bacterial ATP. Furthermore, the mechanisms of directly released and OMV-derived bacterial ATP (the latter contained within OMV, engulfed, and transported to the endolysosomal compartment) differ, and extracellular ATP in particular has been described to induce apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were used only for the bacterial cultures in Figures 1, 2, and 3; these achieved similar results with very small standard deviations, indicating robustness. In the in vitro experiments in Figure 5, we used a sample size of 6 (three biological replicates measured in technical duplicates), since we observed larger deviations in these measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      Reviewer #2 (Recommendations For The Authors):

      (9) Line 37: 11 million sepsis-related deaths were reported "in" 2017.

      The passage has been corrected as suggested.

      (10) By the way, the similar colors used in Figures 1C and G are too chaotic, making them difficult to distinguish.

      We agree, the colors have been adapted.

      Author response image 2.

      (11) All "in vivo" and "in vitro" should be italicized.

      We italicized all of them.

      (12) The title of Figure 4 is confusing: "Impairs sepsis outcome in vivo?" Could you make it more specific?

      We agree, the title has been rephrased:

      “Bacterial ATP reduces neutrophil counts and reduces survival in a mouse model of abdominal sepsis.”

      (13) Line 314-316: The sentence "Potentially, despite the lack of a transporter, ATP may similarly to eukaryotic cells leak (Yegutkin et al., 2006) across the inner membrane into the periplasmic space that lacks the enzymes for ATP generation." sounds odd.

      This passage was reformulated in the manuscript.

      “Despite the lack of a transporter, ATP may leak across the inner membrane into the periplasmic space. Such leakage may be similar to baseline leakage in eukaryotic cells (Yegutkin et al., 2006).”

      (14) The numerical notation in the paper is odd: sometimes it uses a prime symbol as a superscript (such as line 504), and sometimes it does not (such as line 421). Should it be standardized to "3,200" and "150,000"?

      Thank you for this remark. The numbers have been standardized throughout the manuscript.

      (15) "0.4 mm EP cuvettes" should read "0.4 cm EP cuvettes".

      The specified passage has been corrected as suggested.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), e00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1). https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      Reviewer #1 (Public Review):

      This study excellently complements the previous one by unveiling the properties of NPRL2 in augmenting the effect of immune checkpoint inhibitors such as pembrolizumab in KRAS mutant lung cancer models.

      The following points should be clarified:

      (1) In KRAS mutant cell lines with LKB1 co-mutations or deletions, such as A549 cells, does treatment with NPRL2 not increase the efficacy of immunotherapy? Is this correct? Similarly, does the delivery of NPRL2 only potentiate the effect of immunotherapy in KRAS mutant cell lines without associated LKB1 mutations?

      NPRL2, when used as a single-agent immunotherapy, induces robust antitumor activity in immunotherapy-resistant (aPD1R) KRAS mutant models, such as A549 tumors (KRASmt/LKB1mt/aPD1R) and LLC2 (KRASmt/aPD1R), where immunotherapy is ineffective regardless of LKB1 co-mutation or deletion status. The antitumor effect of NPRL2 combined with aPD1 immunotherapy was not significantly different from NPRL2 alone in immunotherapy-resistant models but was significantly greater than immunotherapy alone. However, a synergistic antitumor effect was observed with NPRL2 and aPD1 immunotherapy in KRAS wild-type and immunotherapy-moderately-responsive models, such as H1299 (KRASwt/aPD1S).

      (2) Did the authors analyze by western blot whether NPRL2 influences or restores STING and LKB1 in the A549 cell line, which lacks LKB1 and STING?

      NPRL2 induces antitumor immunity in KRAS-mutant, aPD1-resistant models regardless of LKB1 co-mutation or deletion. However, it would be interesting to examine the effect of NPRL2 on the STING pathway in this LKB1-deleted A549 cell line.

      (3) Mechanistically, is there any explanation as to why NPRL2 delivery increases the efficacy of immunotherapy? Is there any effect on FUS or MYC?

      NPRL2 is a multifunctional tumor suppressor gene that is downregulated or absent in many cancers. NPRL2 has been shown to induce apoptosis, inhibit cell proliferation, and cause cell cycle arrest in various cancer types. Compelling evidence highlights the critical role of NPRL2 in causing DNA damage and double-strand breaks, which can trigger dendritic cell (DC) activation, antigen presentation, and priming of tumor-specific CD8+ T cells in the tumor microenvironment (TME). Our data indicate that NPRL2 treatment is associated with the induction of DC activation and maturation.

      At the cellular level, NPRL2-mediated antitumor immunity appears to depend on the presence of CD4+ T cells, CD8+ T cells, and macrophages. Interestingly, the expression of FUS1, another tumor suppressor gene, was absent or severely downregulated in most non-small cell lung cancers (NSCLC) and was unaffected by NPRL2 treatment. MYC expression was not assessed in this study, but it remains an area of interest for future research.

      (4) Is there any way to carry out a clinical study of systemically delivering NPRL2 in KRAS lung cancer patients?

      In this preclinical study, a clinical-grade DOTAP-NPRL2 formulation was prepared, utilizing NPRL2 encapsulated within nanovesicles for delivery. Based on the promising preclinical data, a phase I clinical trial will be initiated to evaluate the safety and efficacy of this formulation.

      Reviewer #2 (Public Review):

      Summary:

      In "NPRL2 gene therapy induces effective antitumor immunity in KRAS/STK11 mutant anti-PD1 resistant metastatic non-small cell lung cancer (NSCLC) in a humanized mouse model", Meraz et al. investigated the antitumor immune responses to NPRL2 gene therapy in aPD1R KRAS/STK11-mutant NSCLC in a humanized mouse model and found that NPRL2 gene therapy induces antitumor activity against KRAS/STK11mt/aPD1R tumors through DC-mediated antigen presentation and cytotoxic immune cell activation.

      Strengths:

      The novelty of the study.

      Weaknesses:

      (1) The effect of NPRL2 combined with pembrolizumab is inconsistent. Figure 2I-K shows similar tumor intensity in the NPRL2 group and the combination group. However, NPRL2 combined with pembrolizumab was synergistic in the KRASwt/aPD1S H1299 tumors in Figure 4.

      NPRL2, as a single-agent immunogene therapy, induces robust antitumor activity both in immunotherapy-resistant (aPD1R) KRAS-mutant models, such as A549 (KRASmt/LKB1mt/aPD1R) and LLC2 (KRASmt/aPD1R) tumors, and in immunotherapy-sensitive models such as H1299 (KRASwt/aPD1S), in which immunotherapy was ineffective or of limited efficacy. A synergistic antitumor effect of the NPRL2 and pembrolizumab combination was found only in immunotherapy moderately responsive models, not in immunotherapy-resistant models, in which PD-1/PD-L1 signaling is impaired (Figure 1A).

      (2) The authors stated that NPRL2 combined with pembrolizumab was not synergistic in the KRAS/STK11mt/aPD1R tumors but was synergistic in the KRASwt/aPD1S H1299 tumors. How was the synergistic effect defined in the study? More details need to be provided here.

      Our biostatistician used generalized linear regression models to analyze tumor growth over time. A two-way ANOVA including the interaction between treatment group and time point was performed to compare changes in tumor intensity from baseline between each pair of treatment groups at each time point. The nonparametric Mann-Whitney U test was applied to compare treatment groups. Differences with P < 0.05, P < 0.01, or P < 0.001 were considered statistically significant. When the combined antitumor effect of NPRL2 and pembrolizumab was statistically significant compared with both single-agent effects, synergy was confirmed using the method of Huang et al.

      Huang L, Wang J, Fang B, Meric-Bernstam F, Roth JA, Ha MJ. CombPDX: a unified statistical framework for evaluating drug synergism in patient-derived xenografts. Sci Rep 12(1):12984, 7/2022. e-Pub 7/2022. PMCID: PMC9338066.
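      To illustrate the group comparison described above, the sketch below computes change from baseline and a Mann-Whitney U test between two treatment groups at one time point. It is a minimal, stdlib-only illustration, not the authors' analysis code: it uses a normal approximation without tie or continuity correction, the group values in the usage example are hypothetical, and it does not implement the CombPDX synergy framework of Huang et al. For real small-sample analyses, a statistics package (for example SciPy's scipy.stats.mannwhitneyu, which also offers exact p-values) is preferable.

```python
import math
from itertools import product


def change_from_baseline(series):
    """Return each measurement minus the first (baseline) measurement."""
    baseline = series[0]
    return [x - baseline for x in series]


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U statistic for sample a, approximate two-sided p-value).
    No tie or continuity correction is applied, so p is rough for small n.
    """
    # U counts the pairs (x from a, y from b) in which x exceeds y; ties count 0.5.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x, y in product(a, b))
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_a - mu) / sigma
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p


# Hypothetical changes in tumor intensity from baseline at one time point.
combination = [-5.0, -4.0, -6.0, -3.0, -5.0]
single_agent = [2.0, 3.0, 1.0, 4.0, 2.0]
u, p = mann_whitney_u(combination, single_agent)
```

      Every combination value is below every single-agent value here, so U is 0 and the approximate p-value is well below 0.05.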

      (3) Nearly all of the work was performed preclinically. Validation in the clinical setting would provide stronger evidence for the conclusion.

      In this preclinical study, a clinical-grade DOTAP-NPRL2 formulation was prepared, utilizing NPRL2 encapsulated within nanovesicles for delivery. Based on the promising preclinical data, a phase I clinical trial will be initiated to evaluate the safety and efficacy of this formulation.

      (4) Figure 5 and Figure 6 have the same legend. These two figures could be merged into one.

      Agreed.

      (5) Figure 5B and C: n = 9 in Figure 5B. However, the number of data points shown in Figure 5C is fewer than 9.

      Figure 5C shows n = 7-9 mice per group. We will revise accordingly.

      Reviewer #3 (Public Review):

      Summary:

      NPRL2/TUSC4 is a tumor suppressor gene whose expression is reduced in many cancers, including NSCLC. This study presents a novel finding on NPRL2 gene therapy, which induces antitumor activity against aPD1-resistant tumors. Since KRAS/STK11-mutant tumors have been reported to benefit less from ICIs, this study has potential clinical value.

      Strengths:

      This work uncovers the advantage of NPRL2 gene therapy by using humanized models and multiple cell lines. Moreover, through immune cell depletion studies, the mechanistic analysis of NPRL2 gene therapy focuses on dendritic cells and CD8+ T cells.

      Weaknesses:

      A major concern is the lack of systematic and logical rigor. This work does not present a link between apoptosis and the antigen presentation induced by NPRL2 restoration. There is no evidence that the PI3K/AKT/mTOR signaling pathway is related to antigen presentation, which is the main driver of the NPRL2-induced antitumor response. Therefore, the two parts may not support each other logically.

      Thank you for your review and comments. We agree that future studies are necessary to establish a direct link between apoptosis and the antigen presentation induced by NPRL2 restoration, as well as between NPRL2-mediated downregulation of PI3K/AKT/mTOR signaling and antigen presentation. Nevertheless, NPRL2 restoration directly induced apoptosis in several cell lines (Figure 1C and Figure 8Q), and NPRL2 treatment or restoration significantly increased the number of antigen-presenting DCs in the tumor microenvironment. Similarly, NPRL2 restoration downregulated the PI3K/AKT/mTOR pathway, which was associated with increased antitumor immunity.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Gating of Kv10 channels is unique because it involves coupling between non-domain-swapped voltage-sensing domains, a domain-swapped cytoplasmic ring assembly formed by the N- and C-termini, and the pore domain. Recent structural data suggest that activation of the voltage-sensing domain relieves a steric hindrance to pore opening, but the contribution of the cytoplasmic domain to gating is still not well understood. This aspect is of particular importance because proteins like calmodulin (CaM) interact with the cytoplasmic domain to regulate channel activity. The effects of CaM in WT channels and in mutant channels with disrupted cytoplasmic gating ring assemblies are contradictory, resulting in inhibition or activation, respectively. The underlying mechanism for these discrepancies is not understood. In the present manuscript, Reham Abdelaziz and collaborators use electrophysiology, biochemistry, and mathematical modeling to describe how mutations and deletions that disrupt inter-subunit interactions at the cytoplasmic gating ring assembly affect Kv10.1 channel gating and modulation by CaM. In the revised manuscript, additional information is provided to allow readers to locate, within the Kv10.1 channel structure, E600R, one of the key channel mutants analyzed in this study. However, the mechanistic role of the cytoplasmic domains that this study focuses on, as well as the location of the ΔPASCap deletion and the other perturbations investigated, remain difficult to visualize without additional graphical information. This can make it challenging for readers to connect the findings presented in the study with a structural mechanism of channel function.

      The authors focused mainly on two structural perturbations that disrupt interactions within the cytoplasmic domain: the E600R mutant and the ΔPASCap deletion. By expressing mutants in oocytes and recording currents using two-electrode voltage clamp (TEVC), they find that both ΔPASCap and E600R mutants have biphasic conductance-voltage (G-V) relations and exhibit activation and deactivation kinetics with multiple voltage-dependent components. Importantly, the mutant-specific component in the G-V relations is observed at negative voltages where WT channels remain closed. The authors argue that the biphasic behavior in the G-V relations is unlikely to result from two different populations of channels in the oocytes, because they found that the relative amplitude of the two components in the G-V relations was highly reproducible across individual oocytes that otherwise tend to show highly variable expression levels. Instead, the G-V relations for all mutant channels could be well described by an equation that considers two open states, O1 and O2, and a transition between them; O1 appeared to be unaffected by any of the structural manipulations tested (i.e., E600R, ΔPASCap, and other deletions), whereas the parameters for O2 and for the transition between the two open states differed between constructs. The O1 state is not observed in WT channels and is hypothesized to be associated with voltage sensor activation. O2 represents the open state that is normally observed in WT channels and is speculated to be associated with conformational changes within the cytoplasmic gating ring that follow voltage sensor activation, which could explain why the mutations and deletions disrupting cytoplasmic interactions primarily affect O2.
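      To make the idea of a G-V relation with two open states and a transition between them concrete, the sketch below uses a deliberately simplified sequential scheme. It is an assumption for illustration only, not the equation used in the manuscript: a Boltzmann term p1(V) gives the fraction of channels that have reached O1, a second Boltzmann term p2(V) gives the fraction of those that have moved on from O1 to O2, and each open state carries its own conductance (g1, g2). All parameter names and values are hypothetical.

```python
import math


def boltzmann(v, v_half, slope):
    """Fraction of channels past a single voltage-dependent step (0 to 1)."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / slope))


def two_open_state_g(v, g1, v1, k1, g2, v2, k2):
    """Hypothetical G(V) for a sequential scheme: closed -> O1 -> O2.

    p1: fraction of channels that have reached O1 (first, sensor-driven step).
    p2: fraction of those channels that have transitioned from O1 to O2.
    The total conductance is the occupancy-weighted sum of both open states.
    """
    p1 = boltzmann(v, v1, k1)
    p2 = boltzmann(v, v2, k2)
    occupancy_o1 = p1 * (1.0 - p2)
    occupancy_o2 = p1 * p2
    return g1 * occupancy_o1 + g2 * occupancy_o2


# Hypothetical parameters: O1 is reached at negative voltages, O2 at positive ones.
params = dict(g1=1.0, v1=-90.0, k1=10.0, g2=0.6, v2=0.0, k2=15.0)
curve = [(v, two_open_state_g(v, **params)) for v in range(-160, 121, 20)]
```

      Depending on the relative sizes of g1 and g2, this scheme yields either two rising phases (g2 > g1) or a hump followed by a lower plateau (g2 < g1); either way, the G-V curve cannot be fitted by a single Boltzmann function.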

      Severing the covalent link between the voltage sensor and the pore reduced O1 occupancy in one of the deletion constructs. Although this observation is consistent with the hypothesis that voltage-sensor activation drives entry into O1, the result is not conclusive. Structural as well as functional data have established that coupling between the voltage sensor and the pore does not rely entirely on the covalent S4-S5 linker, and thus the severed construct could still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both O1 and O2 require voltage sensor activation, it is unclear why the severed construct would primarily affect O1, as suggested in the manuscript, rather than decreasing the occupancy of both open states. In line with this argument, the presence of Mg2+ in the extracellular solution affected both O1 and O2. This finding suggests that entry into both O1 and O2 requires voltage-sensor activation, because Mg2+ ions are known to stabilize the voltage sensor in its most deactivated conformations.

      We agree with the reviewer that access to both states requires a conformational change in the voltage sensor. This was stated in our revised article: “In contrast, to enter O2, all subunits must complete both voltage sensor transitions and the collective gating ring transition.” We interpret the two gating steps as sequential; the effective rotation of the intracellular ring would happen only once the sensor is in its fully activated position.

      We also agree that the S4-S5 segment cannot be the only interaction mechanism, as we demonstrated in our earlier work (Lörinczi et al., 2015; Tomczak et al., 2017).  

      Activation towards and closure from O1 is slow, whereas channels close rapidly from O2. A rapid alternating pulse protocol was used to take advantage of the difference in activation and deactivation kinetics between the two open components in the mutants and thus drive an increasing number of channels towards state O1. Currents activated by the alternating protocol reached larger amplitudes than those elicited by a long depolarization to the same voltage. This finding is interpreted as an indication that O1 has a larger macroscopic conductance than O2. In the revised manuscript, the authors performed single-channel recordings to determine why O1 and O2 have different macroscopic conductance. The results show that at voltages where the state O1 predominates, channels exhibited longer open times and overall higher open probability, whereas at more depolarized voltages where occupancy of O2 increases, channels exhibited more flickery gating behavior and decreased open probability. These results are informative but not conclusive because additional details about how experiments were conducted, and group data analysis are missing. Importantly, results showing inhibition of single ΔPASCap channels by a Kv10-specific inhibitor are mentioned but not shown or quantitated - these data are essential to establish that the new O1 conductance indeed represents Kv10 channel activity.

      We observed the activity of a channel compatible with Kv10.1 ΔPASCap (long openings at low to moderate potentials, very short flickery activity at strong depolarizations) in 12 patches from oocytes obtained from different frog operations over a period of two and a half months after the experimental conditions had been established. As stated in the text, we did not generate amplitude histograms because we could not resolve clear single-channel events at strong depolarizations. Astemizole abolished the activity and, remarkably, strongly reduced the noise in traces at strong depolarizations, noise that we interpret as partially caused by flickering openings.

      Author response image 1.

      We include two example recordings of Astemizole application (100µM) on two different patches. Both recordings are performed at -60 mV (to decrease the likelihood that the channel visits O2) with 100 mM internal and 60 mM external K+. In both cases, the traces in Astemizole are presented in red.
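      The open probability and open-time comparisons discussed above rest on simple quantities that can be computed from an idealized single-channel record. The sketch below is a minimal illustration, assuming the record has already been idealized into 0 (closed) and 1 (open) samples at a fixed sampling interval; the trace in the usage example is hypothetical. If a patch contains more than one channel, the first value is NPo rather than Po, and dwells cut off at the start or end of the record are truncated rather than complete.

```python
def open_probability_and_mean_open_time(idealized, dt_ms):
    """Return (fraction of time open, mean open dwell time in ms).

    idealized: sequence of 0 (closed) / 1 (open) samples.
    dt_ms: sampling interval in milliseconds.
    """
    if not idealized:
        raise ValueError("empty record")
    po = sum(idealized) / len(idealized)

    # Collect open dwells as runs of consecutive open samples.
    dwells = []
    run = 0
    for state in idealized:
        if state:
            run += 1
        elif run:
            dwells.append(run * dt_ms)
            run = 0
    if run:  # record ends while the channel is open (truncated dwell)
        dwells.append(run * dt_ms)

    mean_open = sum(dwells) / len(dwells) if dwells else 0.0
    return po, mean_open


# Hypothetical idealized trace sampled every 0.1 ms.
po, mean_open = open_probability_and_mean_open_time([0, 1, 1, 1, 0, 0, 1, 0], 0.1)
```

      Long openings at moderate potentials raise both values, whereas brief flickery openings at strong depolarizations lower the mean open time and, unless openings become much more frequent, the open probability as well.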

      It is shown that conditioning pulses to very negative voltages result in mutant channel currents that are larger and activate more slowly than those elicited at the same voltage but starting from less negative conditioning pulses. In voltage-activated curves, O1 occupancy is shown to be favored by increasingly negative conditioning voltages. This is interpreted as indicating that O1 is primarily accessed from deeply closed states in which voltage sensors are in their most deactivated position. Consistently, a mutation that destabilizes these deactivated states is shown to largely suppress the first component in voltage-activation curves for both ΔPASCap and E600R channels.

      The authors then address the role of the hidden O1 state in channel regulation by calmodulin. Stimulating calcium entry into oocytes with ionomycin and thapsigargin, assumed to enhance CaM-dependent modulation, resulted in preferential potentiation of the first component in ΔPASCap and E600R channels. This potentiation was attenuated by including an additional mutation that disfavors deeply closed states. Together, these results are interpreted as an indication that calcium-CaM preferentially stabilizes deeply closed states from which O1 can be readily accessed in mutant channels, thus favoring current activation. In WT channels lacking a conducting O1 state, CaM stabilizes deeply closed states and is therefore inhibitory. It is found that the potentiation of ΔPASCap and E600R by CaM is more strongly attenuated by mutations in the channel that are assumed to disrupt interaction with the C-terminal lobe of CaM than by mutations assumed to affect interaction with the N-terminal lobe. These results are intriguing but difficult to interpret in mechanistic terms. The strong effect that calcium-CaM had on the occupancy of the O1 state in the mutants raises the possibility that O1 can only be observed in channels that are constitutively associated with CaM. To address this, a biochemical pull-down assay was carried out to establish that only a small fraction of channels are associated with CaM under baseline conditions. These CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach used to activate CaM is indirect and could have additional nonspecific effects on the oocytes that could affect the results.

      Finally, a mathematical model is proposed consisting of two layers involving two activation steps for the voltage sensor, and one conformational change in the cytoplasmic gating ring - completion of both sets of conformational changes is required to access state O2, but accessing state O1 only requires completion of the first voltage-sensor activation step in the four subunits. The model qualitatively reproduces most major findings on the mutants. Although the model used is highly symmetric and appears simple, the mathematical form used for the rate constants in the model adds a layer of complexity to the model that makes mechanistic interpretations difficult. In addition, many transitions that from a mechanistic standpoint should not depend on voltage were assigned a voltage dependence in the model. These limitations diminish the overall usefulness of the model which is prominently presented in the manuscript. The most important mechanistic assumptions in the model are not addressed experimentally, such as the proposition that entry into O1 depends on the opening of the transmembrane pore gate, whereas entry into O2 involves gating ring transitions - it is unclear why O2 would require further gating ring transitions to conduct ions given that the gating ring can already support permeation by O1 without any additional conformational changes.

      In essence, we agree with the reviewer; we have already addressed these points in our revised article:

      Regarding the voltage dependence, we write: “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search.” As this passage shows, we also formulated a strategy to free the model of the potentially spurious voltage dependence and, in its final sentence, explained why we did not follow this route in this study.

      Regarding the need for gating ring transitions after O1, we wrote, “Thus, the underlying gating events can be separated into two steps: The first gating step involves only the voltage sensor without engaging the ring and leads to a pre-open state, which is non-conducting in the WT but conducting in our mutants. The second gating event operates at higher depolarizations, involves a change in the ring, and leads to an open state both in WT and in the mutants. ” 

      We understand your statement to mean that you expect the conducting state to remain available once O1 is reached. However, the experimental evidence argues against the pore remaining available regardless of further gating steps beyond O1. The description of model construction is informative here: “... we could exclude many possible [sites at which O1 connects to closed states] because the attachment site must be sufficiently far away from the conventional open state [O2]. Otherwise, the transition from "O1 preferred" to "O2 preferred" via a few closed intermediate states is very gradual and never produces the biphasic GV curves [that we observed].”

      In other words, voltage-dependent gating steps beyond the state that offers access to O1 appear to close the pore after it has opened. This might occur because only then (in states in which at least one voltage sensor has moved beyond the intermediate position) is the ring fixed in a particular state until all sensors have completed activation. In the WT, closing the pore in deactivated states might rely on an interaction that is absent in the mutant because, at least in HERG: “the interaction between the PAS domain and the C-terminus is more stable in closed than in open KV11.1 (HERG) channels, and a single chain antibody binding to the interface between PAS domain and CNBHD can access its epitope in open but not in closed channels, strongly supporting a change in conformation of the ring during gating”

      Reviewer #3 (Public Review):

      In the present manuscript, Abdelaziz and colleagues interrogate the gating mechanisms of Kv10.1, a voltage-gated K+ channel that is important in cell cycle and cancer physiology. At the molecular level, Kv10.1 is regulated by voltage and Ca-CaM. CryoEM structures of Kv10.1 and of other members of the KCNH family (Kv11 and Kv12) show channels that lack a structured S4-S5 linker, which results in a non-domain-swapped architecture in the transmembrane region. However, the cytoplasmic N- and C-terminal domains interact in a domain-swapped manner, forming a gating ring. The N-terminal domain (PAS domain) of one subunit is located close to the intracellular side of the voltage sensor domain and interacts with the C-terminal domain (CNBHD domain) of the neighboring subunit. Mutations in the intracellular domains have a profound effect on channel gating. The complex network of interactions between the voltage sensor and the intracellular domains makes the PAS domain particularly interesting to study as the domain responsible for coupling the voltage sensor domains to the intracellular gating ring.

      The coupling between the voltage-sensor domain and the gating ring is not fully understood, and the authors aim to shed light on the details of this mechanism. To do so, they use well-established techniques such as site-directed mutagenesis, electrophysiology, biochemistry, and mathematical modeling. In the present work, the authors propose a two-open-state model that arises from functional experiments after introducing a deletion in the PAS domain (ΔPASCap) or a point mutation (E600R) in the CNBHD domain. With these mutations, the authors measure a biphasic G-V curve and assign each phase to a different open state, one of which is not visible in the WT and is only unveiled by the mutations.

      The hypothesis proposed by the authors could change the current paradigm for Kv10.1 gating, and it is quite extraordinary; therefore, it requires extraordinary evidence to support it.

      STRENGTHS: The authors use adequate techniques such as electrophysiology and site-directed mutagenesis to address the gating changes introduced by the molecular manipulations. They also use appropriate mathematical modeling to build a Markov model and identify the mechanism behind the gating changes.

      WEAKNESSES: The results presented by the authors do not fully support their conclusions, since they could have alternative explanations. The authors base their primary hypothesis on the biphasic behavior of a calculated G-V curve that does not match the tail current behavior. The experimental conditions used in the present manuscript introduce uncertainties, weakening the conclusions and complicating the interpretation of the results. Therefore, the experimental conditions need to be revisited.

      We respectfully disagree. We think that your suggestions for alternative explanations are addressed in the current version of the article. We will rebut them once more below, but we feel the need to point out that our arguments are already laid out in the revised article.

      I have some concerns related to the following points:

      (1) Biphasic gating behavior

      The authors use the TEVC technique in oocytes extracted surgically from Xenopus Leavis frogs. The method is well established and is adequate to address ion channel behavior. The experiments are performed in chloride-based solutions which present a handicap when measuring outward rectifying currents at very depolarizing potentials due to the presence of calcium activated chloride channel expressed endogenously in the oocytes; these channels will open and rectify chloride intracellularly adding to the outward rectifying traces during the test pulse. The authors calculate their G-V curves from the test pulse steady-state current instead of using the tail currents. The conductance measurements are normally taken from the 'tail current' because tails are measured at a fix voltage hence maintaining the driving force constant. 

      We respectfully disagree. In contrast to other channels, like HERG, a common practice for Kv10 is not to use tail currents. It is long known that in this channel, tail currents and test-pulse steady-state currents can appear to be at odds because the channels deactivate extremely rapidly, at the border of temporal resolution of the measurements and with intricate waveforms. This complicates the estimation of the instantaneous tail current. Therefore, the outward current is commonly used to estimate conductance (Terlau et al., 1996; Schönherr et al., 1999; Schönherr et al., 2002; Whicher and MacKinnon, 2019), while the latter authors also use the extreme of the tail for some mutants.

      Because our mutants activate at very negative voltages, their reversal potential can be measured directly; we are, therefore, more confident with this approach. Nevertheless, we have determined the initial tail current in some experiments. These values behave very similarly to the average that we present in Figure 1. The biphasic behavior is unequivocally present.
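      For readers less familiar with this convention, the sketch below shows how a normalized G-V curve can be computed from steady-state test-pulse currents, assuming chord conductance G = I/(V - E_rev) with a directly measured reversal potential. The function names, voltages, currents, and reversal potential are hypothetical placeholders, not data or code from this study.

```python
"""Chord conductance from steady-state test-pulse currents (illustrative sketch)."""


def chord_conductance(voltages_mv, currents, e_rev_mv):
    """Return G = I / (V - E_rev) for each test potential.

    The entry is None where V equals E_rev, because the driving force is zero.
    """
    conductances = []
    for v, i in zip(voltages_mv, currents):
        driving_force = v - e_rev_mv
        conductances.append(None if driving_force == 0 else i / driving_force)
    return conductances


def normalize(conductances):
    """Scale conductances to the largest defined value; None entries are kept."""
    g_max = max(g for g in conductances if g is not None)
    return [None if g is None else g / g_max for g in conductances]


# Hypothetical example: E_rev = -90 mV, steady-state currents in uA.
voltages = [-60, -20, 20, 60]
currents = [0.3, 1.4, 2.2, 6.0]
g = chord_conductance(voltages, currents, e_rev_mv=-90)
print(normalize(g))
```

      Because each curve is scaled to its largest value, the apparent relative size of different components depends on where that maximum falls.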

      Author response image 2.

      Calculating the conductance from the traces should not be a problem; however, in the present manuscript, the traces and the tail currents do not agree.

      The referee’s observation is perfectly in line with the long-standing experience of several labs working with KV10: tail current amplitudes in KV10 appear to be out of proportion for the WT open state (O2). Importantly, this is due to rapid closure, which is not present in O1. As a consequence, the initial amplitude of tail currents from O1 is easier to estimate correctly, and these tails are much more obvious in the graphs. Taken together, these differences between O1 and O2 explain the misconception the reviewer describes next.

      The tail traces shown in Fig1E do not show an increasing current amplitude in the voltage range from +50mV to +120mV, they seem to have reached a 'saturation state', suggesting that the traces from the test pulse contain an inward chloride current contamination. 

      As stated in the text and indicated in Author response image 3, the tail currents in Figure 1E increase in amplitude between +50 and +120 mV, as can be seen in the examples below from different experiments (+50 mV in black, +120 mV in red). As stated above, the increase is not as evident as in traces from other mutants, because the predominance of O2 also implies much faster deactivation.

      Author response image 3. 

      We are aware that Ca2+-activated Cl- currents can represent a problem when interpreting electrophysiological data in oocytes. In fact, we show in Supplement 1 to Figure 8 that this can be the case during the Ca2+-CaM experiments, where the increase in Ca2+ would certainly augment Cl- contribution to the outward current. This is why we performed these experiments in Cl--free solutions. As we show in Figure 8, the biphasic behavior was also present in those experiments. 

      Importantly, Cl--free bath solutions would not correct contamination during the tail, since this would correspond to Cl- exiting the oocyte. Yet, if the outward currents were contaminated by Cl-, one would expect the contamination to increase with larger depolarizations, as the typical Ca2+-activated Cl- current in oocytes does. As the reviewer states, this does not seem to be the case.

      In addition, this second component, identified by the authors as a second open state, appears above +50 mV and never seems to saturate. Normalization to the maximum current during the test pulse exaggerates this second component in the calculated G-V curve.

      We agree that this second component continues to increase; the reviewer raised this in the first review, and we have already addressed it in our reply and in the discussion of the revised version: “This flicker block might also offer an explanation for a feature of the mutant channels, that is not explained in the current model version: the continued increase in current amplitude, hundreds of milliseconds into a strong depolarization (Supp. 4 to Fig. 9). If the relative stability of O2 and C2 continued to change throughout depolarization, such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On ↔Cn states, or a non-Markovian modification of the model’s time evolution.” By non-Markovian, we mean a Langevin-type diffusive process.

      It is worth noting that the ΔPASCap experiments in Fig. 5, performed in Mes-based solutions, do not show that second component in the G-V.

      For the readers of this conversation, we would like to clarify that the reviewer likely refers to experiments shown in Fig. 5 of the initial submission, which appear in Fig. 6 of the revised version (“Hyperpolarization promotes access to a large conductance, slowly activating open state.”; Fig. 5 now deals with single channels). We agree that these data look different, but this is because the voltage protocols are completely different (compare Fig. 6A, with a fixed test pulse and varied pre-pulse, with Fig. 2A, with a varied test pulse and fixed pre-pulse). Therefore, no biphasic behavior is expected.

      Because these results are the foundation for the two-open-state hypothesis, I strongly suggest that the authors repeat all their chloride-based experiments in Mes-based solutions to eliminate the undesired chloride contribution to the mutant currents and to clarify the contribution of the mutations to Kv10.1 gating.

      In summary, we respectfully disagree with all concerns raised in point (1). Our detailed arguments rebutting them are given above, but there is a more high-level concern about this entire exchange: the referee casts doubt on observations that are not new. For a group of mutant KCNH channels, several labs have reported non-monotonic voltage dependence of activation (see, e.g., Fig. 6D in Zhao et al., 2017), multiphasic tail currents (see, e.g., Fig. 4A in Whicher and MacKinnon, 2019, in CHO cells, where Cl- contamination is not a concern), and activation by high [Ca2+]i (Lörinczi et al., 2016). Our study replicates those observations and hypothesizes that the existence of an additional conducting state can, on its own, explain all previously unexplained observations. We highlight the potency of this hypothesis with a Markov model that qualitatively reproduces all phenomena. We not only disagree factually with the individual points raised, but we also think that they do not touch on the core of our contribution.

      (2) Two step gating mechanism.

      The authors interpret the results obtained with ΔPASCap and E600R as a two-step gating mechanism containing two open states (O1 and O2) and assign them to voltage sensor movement and gating ring rotation, respectively. It is not clear, however, how the authors assign the two open states.

      The results show that the first component is conserved among mutations; however, the second one is not. The authors attribute the second component, and hence the second open state, to the movement of the gating ring. This scenario seems unlikely, since the clear voltage dependence of the second component suggests the involvement of a voltage-sensing current.

      We do not suggest that the gating ring motion is voltage independent. We would like to point out that voltage dependence can be conveyed by coupling of the voltage sensor to the ring; this is the widely accepted theory of how the ring can be involved. If the reviewer means this in a narrow sense, namely that the model should be constructed such that all voltage-dependent steps occur before, and independently of, ring reconfiguration, and that only then an additional step reflects the (voltage-independent) reconfiguration alone, we would like to point the reviewer to the article, where we write: “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search.” As you can see, we also formulated a strategy to free the model from the potentially spurious voltage dependence and, in the last two sentences of the quotation, explained why we did not follow this route in this study.

      The split channel experiment is interesting but needs more explanation. I assume the authors expressed the two parts of the split channel (1-341 and 342-end); however, Tomczak et al. (2017) showed that the split channel is constitutively active, with inward currents that are not visible here. This point needs clarification.

      As stated in the panel heading, the figure legend, and the main text, we did not use 1-341 and 342-end as done in Tomczak et al. Instead, “we compared the behavior of ∆2-10 and ∆2-10.L341Split”. Evidently, the additional deletion (2-10) causes a shift in activation that explains the difference you point out. However, as we do not compare L341Split and ∆2-10.L341Split but ∆2-10 and ∆2-10.L341Split, our conclusion remains that “As predicted, compared to ∆2-10, ∆2-10.L341Split showed a significant reduction in the first component of the biphasic GV (Fig. 2C, D).” Remarkably, the behavior of ∆3-9 L341Split described in Whicher and MacKinnon, 2019 (Figure 5) matches that of our ∆2-10 L341Split, which we think reinforces our case.

      Moreover, the authors assume that the mutations uncover a new open state; however, the traces presented for the mutants suggest that other explanations are possible. The mutations could introduce other gating mechanisms, such as inactivation from the closed state. The traces presented for ΔPASCap, and especially E600R, show clear 'hooked tails', a direct indicator of a population of inactive channels during the test pulse that recover from inactivation upon repolarization (Tristani-Firouzi M, Sanguinetti MC. J Physiol. 1998).

      There is a possibility that we are debating nomenclature here. In response to the suggestion that all our observations could be explained by inactivation, we attempted a disambiguation of terms in the reply and the article. As the argument is brought up again without reference to our clarification attempts, we will try to be more explicit here:

      If, starting from deeply deactivated states, an open state is reached first and then, following further activation steps, closed states are reached, this might be termed “inactivation”. In such a reading, our model features many inactivated states. The shortest version of such a model is C-O-I. It is, for instance, used by Raman and Bean (2001; DOI: 10.1016/S0006-3495(01)76052-3) to explain NaV gating in Purkinje neurons. If “inactivation” is meant in the sense that a gating transition exists that is orthogonal to an activation/deactivation axis, and that after this orthogonal transition an open state can no longer be reached, then all of the upper floor in our model is inactivated with respect to the open state O1. Finally, the state C2 is an inactivated state relative to O2. In this view, “inactivation” explains the observed phenomena.

      However, we must disagree if the referee means that a parsimonious explanation exists in which a single conducting state is the only source for all observed currents.   

      There is a high-level reason: we found a single assumption that explains three different phenomena, while the inactivation hypothesis with one conducting state cannot explain one of them (the increase of the first component under raised CaM). But there is also a low-level reason: the tails in Tristani-Firouzi and Sanguinetti (1998) are fundamentally different from what we report here, in that they lack a third component. Thus, those tails are consistent with recovery from inactivation through a single open state, while a three-component tail is not. In the framework of a Markov model, the time constants of transitions from and to a given state (say, O2) cannot change unless the voltage changes. During the tail current, the voltage does not change, yet we observe:

      i) a rapid decrease with a time constant of at most a few milliseconds (Fig. 9 S2, 1 → 2); ii) a slow increase in current, peaking after approximately 25 milliseconds; and iii) a relaxation to zero current with a time constant of >50 ms.

      According to the reviewer’s suggestion, these processes on three timescales should all be explained by depopulating and repopulating the same open state while all rates are constant. A complicated multi-level state diagram with a single open state in different variants (such as open and open-inactivated) might well produce triphasic tails with these properties if the system had not reached a steady-state distribution at the end of the test pulse. It cannot, however, do so from an equilibrated system, and it certainly cannot at the same time produce “biphasic activation” and “activation by CaM”.
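      The point that rates are fixed at constant voltage can be made concrete. At a fixed voltage, the occupancies of any Markov scheme relax as a sum of exponentials whose time constants are the negative reciprocals of the non-zero eigenvalues of the rate matrix; an n-state scheme therefore has at most n - 1 time constants, and none of them depends on the starting occupancy. The sketch below illustrates this for the minimal C-O-I scheme cited earlier in this reply, with hypothetical rates (a: C→O, b: O→C, c: O→I, d: I→O, in 1/ms) and the closed-form characteristic polynomial λ(λ² + Sλ + P), S = a + b + c + d, P = ac + ad + bd. It is an illustration only, not the model used in the article.

```python
import math


def coi_time_constants(a, b, c, d):
    """Relaxation time constants (ms) of the linear scheme C <-> O <-> I.

    a: C->O, b: O->C, c: O->I, d: I->O, all constant (fixed voltage).
    The non-zero eigenvalues solve lam**2 + S*lam + P = 0.
    """
    s = a + b + c + d
    p = a * c + a * d + b * d
    disc = max(s * s - 4.0 * p, 0.0)  # real roots for this reversible scheme
    lam_fast = (-s - math.sqrt(disc)) / 2.0
    lam_slow = (-s + math.sqrt(disc)) / 2.0
    return -1.0 / lam_fast, -1.0 / lam_slow


def coi_steady_state_open(a, b, c, d):
    """Equilibrium open probability; p_C : p_O : p_I = (b/a) : 1 : (c/d)."""
    return 1.0 / (b / a + 1.0 + c / d)


def simulate_open(a, b, c, d, p0, t_end, dt=1e-3):
    """Forward-Euler integration of occupancies (pC, pO, pI); returns final pO."""
    pc, po, pi = p0
    for _ in range(int(round(t_end / dt))):
        d_pc = -a * pc + b * po
        d_pi = c * po - d * pi
        d_po = -(d_pc + d_pi)  # total probability is conserved
        pc, po, pi = pc + dt * d_pc, po + dt * d_po, pi + dt * d_pi
    return po


# Hypothetical rates (1/ms): every starting occupancy relaxes with the same two taus.
rates = (0.5, 2.0, 1.0, 0.05)
print("time constants (ms):", coi_time_constants(*rates))
print("steady-state P_open:", coi_steady_state_open(*rates))
```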

      The results presented by the authors can alternatively be explained by a change in the equilibrium between closed-to-inactivated transitions and recovery from inactivation into the open state.

      Again, we disagree. The model construction explains in detail that the transition from the first to the second phase is not gradual. Shifting equilibria cannot reproduce this. We have extensively tested that idea and can exclude this possibility.

      Finally, the authors state that they do not detect "cumulative inactivation after repeated depolarization", but this considers inactivation only from the open state and ignores the possibility of closed-state inactivation or that, as in hERG, the channel inactivates faster than it activates (Smith PL, Yellen G. J Gen Physiol. 2002).

      We respectfully disagree. We explicitly model an open state that inactivates (O2→C2) faster than it activates. Once more, this is stated in the revised article, to which we refer for details. Again, this alternative mechanism cannot explain all three effects. As with the chloride contamination concerns discussed above, this inactivation hypothesis was raised in the first review round and, therefore, addressed in our reply and the revised article. We also explained that “inactivation” has no specific meaning in Markov models: in the absence of O1, all transitions towards the lower layer are effectively “inactivation from closed states”, because they make access to the only remaining open state less likely. But this is semantics. What is relevant is that no network of states around a single open state can reproduce the three effects more parsimoniously than the assumption of a second open state does.

      (3) Single channel conductance.

      The single-channel experiments are a great way to assess the different conductances of single-channel openings; unfortunately, the authors cannot accurately measure different conductances for the two proposed open states. The Markov model built by the authors disagrees with their interpretation of the experimental results, assigning exactly the same conductance to the two modeled open states. To interpret the mutant data, WT data should be added for comparison, together with recordings in the presence of specific blockers.

      We respectfully disagree. As previously shown, the conductance of the flickering wild-type open state is very difficult to resolve. Our recordings do not show that the two states have different single-channel conductances, and therefore the model assumes identical single-channel conductance.

      The important point is that the single-channel recordings clearly show two different gating modes associated with the voltage ranges in which we predict the two open states. One has a smaller macroscopic current due to rapid flickering (aka “inactivation”). Because the two gating modes occur, these recordings provide further proof of the existence of two open states. For comparison, wild-type data can be found in Bauer and Schwarz (2001, doi:10.1007/s00232-001-0031-3) and Pardo et al. (1998, doi:10.1083/jcb.143.3.767).

      We appreciate the effort editors and reviewers invested in assessing the revised manuscript. Yet, we think that the demanded revision of experimental conditions and quantification methods contradicts commonly accepted practice for KV10 channels. Some of the reviewer comments are skeptical about the biphasic behavior, which is an established finding, replicated for many mutants and by many researchers. The alternative explanations for these disbelieved findings are either “semantics” or cannot quantitatively explain the measurements. Therefore, only the demand for more explanations and unprecedented resolution in single-channel recordings remains. We share these sentiments.

      ———— The following is the authors’ response to the original reviews.

      (1) The authors must show that the second open state is not just an artifact of endogenous activity but represents the activity of the same EAG channels. I suggest that the authors repeat these experiments in Mes-based solutions. 

      (2) Along the same lines, it is necessary to show that these currents can be blocked using known EAG channel blockers such as astemizole. Ultimately, it will be important to demonstrate using single-channel analysis that these do represent two distinct open states separated by a closed state. 

      We have addressed these concerns using several approaches. The most substantial change is the addition of single-channel recordings on ΔPASCap. In those experiments, we could provide evidence of the two types of events in the same patch, and of the presence of an outward current at -60 mV, 50 mV below the equilibrium potential for chloride. The channels were never detected in uninjected oocytes, and astemizole silenced the activity in patches containing multiple channels. These observations, together with the persistence in methanesulfonate-based solutions of the biphasic behavior that we interpret as evidence of the presence of O1, strongly suggest that both O1 and O2 result from the expression of KV10.1 mutants.

      (3) Currents should be measured by increasing the pulse lengths as needed in order to obtain the true steady-state G-V curves. 

      We agree that the endpoint of activation is ill-defined in the cases where a steady state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data, therefore, support the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      (4) A more clear and thorough description should be provided for how the observations with the mutant channels apply to the behavior of WT channels. How exactly does state O1 relate to WT behavior, and how exactly do the parameters of the mathematical model differ between WT and mutants? How can this be interpreted at a structural level? What could be the structural mechanism through which ΔPASCap and E600R enable conduction through O1? It seems contradictory that O1 would be associated exclusively with voltage-sensor activation and not gating ring transitions, and yet the mutations that enable cation access through O1 localize at the gating ring - this needs to be better clarified. 

      We have undertaken a thorough rewriting of all sections to clarify the structural correlates that may explain the behavior of the mutants. In brief, we propose that when all four voltage sensors move towards the extracellular side, the intracellular ring maintains the permeation path closed until it rotates. If the ring is altered, this “lock” is incompetent, and permeation can be detected (page 34). By fixing the position of the ring, calmodulin would preclude permeation in the WT and promote the population of O1 in the mutants.

      (5) Rather than the t80% rise time, exponential fits should be performed to assess the kinetics of activation.

      We agree that assessing kinetics by t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). We had planned to perform exponential fits in this revised version, but because the activation process is not exponential, the time constants would not be accurate, and the result would remain as qualitative as it is now. In the experiments where we did perform the fits (Fig. 3), the values obtained support the statement made.
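      To illustrate the trade-off between t80% and exponential time constants, the sketch below reads a t80% rise time from a sampled trace and compares it with the single-exponential expectation. For a pure rise 1 - exp(-t/τ), t80% = τ·ln 5 ≈ 1.61τ; for a sigmoidal rise the two no longer map onto each other. Both traces are synthetic, and the sigmoidal waveform (an exponential rise raised to the fourth power, Hodgkin-Huxley style) is a hypothetical choice for illustration, not the measured activation time course.

```python
import math


def t_fraction(times, values, fraction=0.8):
    """First time a rising trace reaches `fraction` of its final value.

    Uses linear interpolation between samples and treats the last sample
    as the plateau. Returns None if the level is never reached.
    """
    target = fraction * values[-1]
    for k in range(1, len(values)):
        if values[k] >= target:
            t0, t1 = times[k - 1], times[k]
            v0, v1 = values[k - 1], values[k]
            return t0 + (target - v0) * (t1 - t0) / (v1 - v0)
    return None


tau = 10.0  # ms, synthetic
times = [0.01 * k for k in range(20001)]  # 0 to 200 ms in 10 us steps
mono = [1.0 - math.exp(-t / tau) for t in times]
sigmoid = [(1.0 - math.exp(-t / tau)) ** 4 for t in times]  # delayed, sigmoidal rise

print("single exponential t80:", t_fraction(times, mono), "tau*ln5:", tau * math.log(5))
print("sigmoidal t80:", t_fraction(times, sigmoid))
```

      In the sigmoidal example, t80% is roughly 2.9τ rather than 1.61τ, so a single fitted τ would misrepresent the kinetics unless the waveform shape is known.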

      (6) It is argued, based on the G-V relations in Figure 2A, that none of the mutations or deletions introduced has a major effect on the properties of state O1, but that they rather affect state O2. However, the occupancy of state O2 is undetermined because the activation curves do not reach saturation. It would be interesting to explore the fitting parameters in Fig. 2B further, to test whether the data in Fig. 2A can indeed only be described by fits in which the parameters for O1 remain unchanged between constructs.

      We agree that the absolute occupancy of O2 cannot be properly determined if a steady state is not reached. This is, however, a feature of the channel. During very long depolarizations in WT, the current visually appears to reach a plateau, but a closer look reveals that the current keeps increasing after very long depolarizations (up to 10 seconds; see, e.g., Fig. 1B in Garg et al., 2013, Mol Pharmacol 83, 805-813. DOI: 10.1124/mol.112.084384). Interestingly, although the model presented here does not account for this behavior, we propose changes in the model that could. “If the relative stability of O2 and C2 continued to change throughout the depolarization such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On↔Cn states or a non-Markovian modification of the model’s evolution.” Page 34.

      (7) The authors interpret the results obtained with the mutants ΔPASCap and E600R (tested before by Lörinczi et al. 2016 to disrupt the interactions between the PASCap and cNBHD domains) as a two-step gating mechanism with two open states. All the results obtained with the E600R mutant and ΔPASCap could also be explained by inactivation/recovery from inactivation and a change in the equilibrium between closed/inactivated states and open states. Moreover, the small tails between +90 and +120 mV suggest that channels accumulate in an inactive state (Fig. 1E). It is not convincing that the two-open-state model is the mechanism underlying the mutants' behavior.

      We respectfully disagree with the notion that a single open state can provide a plausible explanation for "All the results obtained with the E600R mutant and ΔPASCap". We think that our new single-channel results settle the question, but even without this direct evidence, a quantitative assessment of the triphasic tail currents all but excludes the possibility of a single open state. We agree that it is, in principle, possible to obtain some form of multiphasic tail with a single open state using the scheme suggested in this comment: at the end of the test pulse, a large fraction of the channels must have accumulated in inactive states, with a few in the open state. The hyperpolarization to -100 mV then induces a rapid depopulation of the open state, followed by slower replenishment from the inactive state. Exactly this process occurs in our model when C2 empties through O2 (Supp. 5 to Fig. 9, E600R model variant). However, this alone is highly unlikely to quantitatively explain the measured tail currents, because of the drastically different time scales of the initial current decay (lifetime of submillisecond to at most a few milliseconds), the much slower transient increase in current (several tens of milliseconds), and the final decay with time constants of >100 ms (see, for instance, the data in Fig. 1E for E600R, +50 to +120 mV test pulses). Sustaining the substantial magnitude of slowly decaying current by slow replenishment of an open state with a lifetime of 1 ms requires vast numbers of inactivated channels. A rough estimate based on the current integral of the initial decay and the current integral of the slowly decaying current suggests that, at the end of the test pulse, the ratio of inactivated to open channels would have to be 500 to 1500 for this mechanism to quantitatively explain the observed tail currents.
      To put this in perspective: this would suggest that, without inactivation, all the channels expressed in an oocyte would carry 6 mA of current during the +100 mV test pulse. While theoretically possible, we consider this a less likely explanation than a second open state.
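      The rough estimate above can be read as a charge-balance argument. If N_O channels are open at the start of the tail, each with mean open dwell τ_O and unitary current i, the fast decay carries Q_fast ≈ N_O·i·τ_O. If, as a simplifying assumption, every inactivated channel recovers through the open state exactly once with the same mean dwell, the slow component carries Q_slow ≈ N_I·i·τ_O, so N_I/N_O ≈ Q_slow/Q_fast. The sketch approximates each component as a decaying exponential (charge = amplitude × time constant); the amplitudes and time constants are hypothetical, not the recorded values, and this is not the authors' actual calculation.

```python
def exp_charge(amplitude, tau):
    """Integral of amplitude * exp(-t / tau) from t = 0 to infinity."""
    return amplitude * tau


def inactivated_to_open_ratio(fast_amp, fast_tau, slow_amp, slow_tau):
    """Estimate N_I / N_O as Q_slow / Q_fast.

    Assumes (hypothetically) that each inactivated channel passes through the
    open state exactly once, with the same mean open dwell as the channels
    that were already open at the start of the tail.
    """
    return exp_charge(slow_amp, slow_tau) / exp_charge(fast_amp, fast_tau)


# Hypothetical components: fast decay 2 uA (tau 1 ms), slow decay 5 uA (tau 150 ms).
ratio = inactivated_to_open_ratio(2.0, 1.0, 5.0, 150.0)
print(f"required inactivated/open ratio: {ratio:.0f}")
```

      The estimate scales inversely with the number of open-state visits assumed per recovering channel.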

      (8) Different models should be evaluated to establish whether the results in Figure 4 can also be explained by a model in which states O1 and O2 have the same conductance. It would be desirable if the conductance of both states were experimentally determined - noise analysis could be applied to estimate the conductance of both states. 

      In the modified model, O1 and O2 have the same single-channel conductance. The small conductance combined with the fast flickering did not allow an accurate determination, but we can state that there is no evidence that the single-channel conductance of the states is different.
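      Because noise analysis was suggested as a way to estimate unitary conductance, a sketch of the standard non-stationary fluctuation analysis may help frame that suggestion: for N identical, independent channels with unitary current i, the current variance follows σ² = i·I - I²/N, so i and N follow from a least-squares fit of variance against mean current. The data below are synthetic and noise-free, background variance is assumed to be already subtracted, and the function name is hypothetical; this is not an analysis performed in the study.

```python
def fit_parabola(mean_currents, variances):
    """Least-squares fit of var = i*I - I**2/N (no intercept); returns (i, N)."""
    # The model is linear in c1 = i and c2 = -1/N: var = c1*I + c2*I**2.
    s11 = sum(x * x for x in mean_currents)
    s12 = sum(x ** 3 for x in mean_currents)
    s22 = sum(x ** 4 for x in mean_currents)
    b1 = sum(x * y for x, y in zip(mean_currents, variances))
    b2 = sum(x * x * y for x, y in zip(mean_currents, variances))
    det = s11 * s22 - s12 * s12
    c1 = (b1 * s22 - b2 * s12) / det  # Cramer's rule on the normal equations
    c2 = (s11 * b2 - s12 * b1) / det
    return c1, -1.0 / c2


# Synthetic example: 1000 channels with a unitary current of 0.5 pA.
i_true, n_true = 0.5, 1000
means = [i_true * n_true * p for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
variances = [i_true * m - m * m / n_true for m in means]
i_est, n_est = fit_parabola(means, variances)
print(f"i = {i_est:.3f} pA, N = {n_est:.0f}")
```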

      (9) Although not included, it looks like the model predicts some "conventional inactivation". This can be appreciated in Fig. 8 and in the traces at -60 mV. Interestingly, the traces obtained in the absence of Cl- also undergo slow inactivation, or 'conventional inactivation' as the authors refer to it. Please revise the following statement: "Conventional inactivation was never detected in any mutants after repeated or prolonged depolarization. In the absence of inactivation, the pre-pulse dependent current increase at +40 mV could be related to changes in the relative occupancy of the open states".

      We have carefully edited the manuscript to address this concern. The use of the term inactivation admittedly represents a challenge. We agree that the state that results from the flickering block (C2) could be defined as “inactivated” because it is preceded by an open state. Yet, in that case, the intermediate states through which the channel travels between O1 and O2 would also be, sensu stricto, “inactivated”, but only in the mutants. We have made this clear on page 17.

      Recommendations for improving the writing and presentation.

      (1) Methods section: Please state the reversal potential calculated for the solution used. It looks like the authors used an Instantaneous I-V curve method to calculate the reversal potential; if that's correct, please show the I-V and the traces together with the protocol used. 

      We have provided the calculated reversal potentials for excised patches. We cannot predict the reversal potential in whole oocytes because we have no control over the intracellular solution. The reversal potential was determined in the mutants through the current at the end of the stimulus because the mutants produced measurable inward currents. The differences in reversal potential were not significant among mutants.

      Pulse protocols have been added to the figures.
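      For defined solutions, such as those in excised patches, a calculated reversal potential follows from the Nernst equation, E = (RT/zF)·ln([ion]out/[ion]in). The sketch below is illustrative only; the concentrations and the 20 °C temperature are hypothetical and are not the solutions used in the study.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
F = 96485.33212  # Faraday constant, C mol^-1


def nernst_mv(conc_out, conc_in, z, temp_c=20.0):
    """Equilibrium potential in mV for an ion of valence z."""
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k) / (z * F) * math.log(conc_out / conc_in)


# Hypothetical concentrations (mM).
print("E_K  (4 mM out, 140 mM in):", round(nernst_mv(4.0, 140.0, z=+1), 1), "mV")
print("E_Cl (104 mM out, 40 mM in):", round(nernst_mv(104.0, 40.0, z=-1), 1), "mV")
```

      In whole oocytes the intracellular concentrations are not controlled, so this calculation does not apply there, which is why the reversal potential of the mutants was measured instead.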

      (2) Figure 1 suggestion: Combine the two panels in panel D and move the F panel up so the figure gets aligned in the lower end.

      Thank you, this has been done.

      (3) Please clarify the rationale for using the E600R-specific mutant. I assume it is based on the effect described by Lörinczi et al. 2016 and its similarity to the ΔPASCap phenotype, or is it due to the impact of this mutation on the interactions between the N-terminus and the cNBHD?

      We have explained the rationale for the use of E600R explicitly on page 6.

      (4) Fig S1A is not present in the current version of the manuscript. Include a cartoon as well as a structural figure clearly depicting the perturbations introduced by E600R, ΔPASCap, and the other deletions that are tested. Additional structural information supporting the discussion would also be helpful to establish clearer mechanistic links between the experimental observations described here and the observed conformational changes between states in Kv10 channel structures. 

      We have corrected this omission, thank you for pointing it out.

      (5) It would be informative to see the traces corresponding to the I-V curves shown in Fig. 7A and B at the indicated time points (0, 60, 150, and 300 s). Did the authors monitor the rise in Ca2+ after the I&T treatment to see whether it coincides with the peak at 60 s?

      In Figure 7 (now Figure 8) we used voltage ramps instead of discrete I-V protocols because of the long time required for recording the latter. This is stated on page 19. Ca2+ was monitored through Cl- current after ionomycin/thapsigargin. The duration of the Ca2+ increase was reproducible among oocytes and in good agreement with the changes observed in the biphasic behavior of the mutants (Supplement 1 to Figure 8).

      (6) Fig. 4. Please state in the legend what the different colored traces correspond to for E600R and ΔPASCap. Is there a reason to change the interpulse potential for ΔPASCap to -20 mV, which does not allow this mutant to close? Please state it. How did the authors choose the 10 ms interval for the experiments in Fig. 2?

      Thank you for pointing this out; we have added the description. We have explained why we use a different protocol for ΔPASCap and the reason for using a 10 ms interval (we believe the referee means Figure 4) on page 12.

      (7) Fig. 5. The pre-pulse is supposed to be 5 s, but the time scale does not correspond to a 5 s pre-pulse before the test pulse to +40 mV. Has the pre-pulse been trimmed for representation purposes? If so, please state this.

      The pre-pulse was 5s, but as the reviewer correctly supposed, the trace is trimmed to keep the +40 mV stimulus visible. This has now been clearly stated in the legend.

      (8) The mutant L322H is located within the S4 helix according to the Kv10.1 structure (PDB 5K7L), not in the 'S3-S4 linker'; please correct. 

      This has been done, thank you.

      The introduction of this mutation should also shift the voltage dependence toward more hyperpolarized potentials (by around 30 mV, according to Schoenherr et al. 1999). It looks like that shift is present within the first component of the G-V. Still, since the maximum amplitude of the second component could be contaminated by endogenous Cl- currents, this effect is minimized. Repeating these experiments in Cl--free solutions will help clarify this point and reveal the effect of ΔPASCap and E600R in the background of a mutation that accelerates the transitions between closed states (see Major comment 1). Did the authors record L322H alone for control purposes?

      We have decided not to measure L322H alone or repeat the measurements in Cl--free solutions because we do not see a way to use the quantitative assessment of the voltage dependence of L322H and the L322H-variants of the eag domain mutants. Like in our answer to main point 3, we base our arguments not on the precise voltage dependence of the second component but on the shape of the G-V curves instead, specifically the consistent appearance of the first component and the local conductance minimum between the first and second components. After the introduction of L322H the first component is essentially absent.

      We think that the measurements of the L322H mutants cannot be interpreted as a hyperpolarizing shift in the first component. The peak of the first conductance component occurs around -20 mV in ΔPASCap and E600R (Fig. 7 C, D). After a -30 mV shift, in L322H+ΔPASCap and L322H+E600R, this first peak would still be detected within the voltage range of our experiments, but it is not. Contamination of the second component would have little impact on this observation, which is why we refrain from the suggested measurements.

      (9) The authors differentiate between an O1 vs. O2 state with different conductances, and maybe I missed it, but there's no quantitative distinction between the components; how are they different?

      Please see the response to main comments 1 and 2. This has been addressed in single-channel recordings.

      (10) Please state the voltage protocols, holding voltages, and solutions (K+ concentration and presence/absence of Cl⁻) used for the experiments in the figure legends, so that the experiments presented are easier to interpret.

      Thank you, this has been done.

      (11) The authors state on page 7 that "with further depolarizations, the conductance initially declined to rise again in response to strong depolarizations. This finding matches the changes in amplitude of the tail currents, which, therefore, probably reflect a true change in conductance" However, the tails in the strong voltage range (+50 to +120 mV) for the E600R mutant argue against this result. Please review.

      The increase in the amplitude of the tail current is also present in E600R, but the relative increase is smaller. We have decided against rescaling these traces because the Figure is already rather complex. We indicated this fact with a smaller arrow and clarified it in the text (page 8).

      (12) The authors mention that the threshold of activation for the WT is around -20 mV; however, the foot of the G-V is closer to -30 or -40 mV. Please revise.

      Thank you. We have done this. 

      (13) The authors state on page 9 that the 'second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions'. However, the E600R mutant, which keeps the N-terminus intact, shows a shift as pronounced as ΔPASCap and larger than Δ2-10. How do the authors interpret this result?

      We have corrected this statement on page 10: “…the second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions and when the structure of the ring is altered through disruption of the interaction between N- and C-termini (E600R)”.

      (14) The equation defined to fit the G-Vs can also be used to describe the WT currents. If O1 is conserved and present in the WT, this equation should also fit the WT data properly. The (1-W) component shown could also be interpreted as an inactivating component that, in the WT, shifts the voltage dependence of activation towards depolarized potentials and is not visible. Still, the mutants do show it, as if the transition from closed-inactivated states were controlled by interactions in the gating ring, and disturbing them affects the transitions to the open state.

      Out of the two open states in the mutant, O2 is the one that shares properties with the WT (e.g. it is inaccessible during Ca2+-CaM binding) while O1 is the open state with the voltage dependence that is conserved across the mutants. We, therefore, believe that this question is based on a mix-up of the two open states. We appreciate the core of the question: does the pattern in the mutants’ G-V curves find a continuation in the WT channel? 

      Firstly, the component that is conserved among mutants does not lead to current in the WT because the corresponding open state (O1) is not observed in WT. However, the gating event represented by this component should also occur in WT and, given its apparent insensitivity to eag domain mutations, this gating step should occur in WT with the same voltage dependence as in all the mutants. This means that this first component sets a hard boundary for the most hyperpolarized G-V curve we can expect in the WT, based on our mutant measurements. Secondly, the second component shows a regular progression across mutants: the more intact the eag domain, the more hyperpolarized the Vhalf values of the transition term (1-W) and of O2 activation. In Δ2-10, the transition term already almost coincides with O1 activation (estimated Vhalf values of -33.57 and -33.47 mV). A further shift of (1-W) in the WT is implausible because, if O1 activation is coupled to the earliest VSD displacement, the transition should not occur before O1 activation. Still, the second component might shift to more hyperpolarized values in the WT, depending on the impact of amino acids 2 to 10 on the second VSD transition.

      In summary, in WT the G-V should not be more hyperpolarized than the first component of the mutants, and the (1-W) component probably corresponds to the Δ2-10 (1-W) component. In WT, the second component should be no more depolarized than the second component of Δ2-10. The WT G-V (Fig. 1B) meets all these predictions derived from the pattern in the mutant G-Vs: when we use Eq. 4 to fit the WT G-V with A1 = 0 (O1 is not present in WT) and the parameters of the transition term (1-W) fixed to the values obtained in Δ2-10, we obtain a fit for the O2 component with Vhalf = +21 mV. This value falls neatly into the succession of Vhalf values for Δeag, ΔPASCap, and Δ2-10 (+103 mV, +80 mV, +52 mV) and, at the same time, it is not more hyperpolarized than the conserved first component (Vhalf -34 mV). Our measurements therefore support that the O2 component in the mutants corresponds to the single open state in the WT.

      (15) On page 15, the authors state that 'The changes in amplitude and kinetics in response to rising intracellular Ca2+ support our hypothesis that Ca-CaM stabilized O1, possibly by driving the channels to deep closed states (Fig 5 and 6)'. This statement seems contradictory; I cannot quite follow the rationale, since Ca2+ potentiates the current (Fig 7), and the addition of the L322H mutant in Fig 7 makes the shift of the first component to negative potentials visible.

      Please check the rationale for this section. 

      We have explained this more explicitly in the discussion (page 32). “Because access to O1 occurs from deep closed states, this could be explained by an increased occupancy of such deactivated states in response to CaM binding. This appears to be the case since CaM induces a biphasic behavior in the mutant channels that show reduced access to deep closed states; thus, L322H mutants behave like the parental variants in the presence of Ca2+-CaM. This implies a mechanistic explanation for the effect of Ca2+-CaM on WT since favoring entry into deep closed states would result in a decrease in current amplitude in the absence of (a permeable) O1”.

      Also, Figs 5 and 6 seem miscited here. 

      Thank you, we have corrected this.

      (16) For Figure 5, it would be helpful if each of the current traces corresponding to a particular voltage had a different color. That way, it will be easier to see how the initial holding voltage modulates current. 

      We have considered this suggestion, and we agree that it would make it easier to follow. Yet, since we have identified the mutants with different colors, it would be inconsistent if we used another color palette for this Figure. Supplement 3 to Figure 9 shows the differences in a clearer way.

      (17) Add zero-current levels to all current traces.

      We have done this.

      (18) The mathematical model should be described better. In particular, the states from which O1 can be accessed should be described more clearly, as should whether the model considers any direct connectivity between states O1 and O2. The origin of the voltage dependence of transitions that do not involve voltage-sensor movements should be discussed. The separation of kappa into kappa-l and kappa-r should also be described.

      We have extensively rewritten the description of the mathematical model to address these concerns.

      (19) Page 4, "reveals a pre-open state in which the transmembrane regions of the channel are compatible with ion permeation, but is still a nonconducting state". Also, page 27, "renders a hydrophobic constriction wider than 8 Å, enough to allow K+ flow, but still corresponds to a non-conducting state". These sentences are confusing - how can the regions be compatible with ion permeation, and still not be conducting? Is cation conductance precluded by a change in the filter, or elsewhere? How is it established that it represents a non-conducting state? 

      We have rephrased these passages to clarify this apparent inconsistency. Page 4: “(…) in which the transmembrane regions of the channel are compatible with ion permeation (the permeation path is dilated, like in open states) but the intracellular gate is still in the same conformation as in closed states (Zhang et al., 2023).” Page 31: “The presence of an intact intracellular ring would preclude ionic flow in the WT, and its alteration would explain the permeability of this state in the mutants.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      fMRI was used to address an important aspect of human cognition - the capacity for structured representations and symbolic processing - in a cross-species comparison with non-human primates (macaques); the experimental design probed implicit symbolic processing through reversal of learned stimulus pairs. The authors present solid evidence in humans that helps elucidate the role of brain networks in symbolic processing; however, the evidence from macaques was incomplete (e.g., sample size constraints, potential and hard-to-quantify differences in attention allocation, motivation, and lived experience between species).

      Thank you very much for your assessment. We would like to address the potential issues that you raise point-by-point below.

      We agree that for macaque monkey physiology, sample size is always a constraint, for both financial and ethical reasons. We addressed this concern by combining the results from two different labs, which allowed us to test 4 animals in total, twice as many as is common practice in the field of primate physiology. (We now discuss this on lines 473-478.)

      Interspecies differences in motivation, attention allocation, task strategies etc. could also be limiting factors. Note that we did address the potential lack of attention allocation directly in Experiment 2 using implicit reward association, which was successful as evidenced by the activation of attentional control areas in the prefrontal cortex. We cannot guarantee that the strategies that the two species deploy are identical, but we tentatively suggest that this might be a less important factor in the present study than in other interspecies comparisons that use explicit behavioral reports. In the current study, we directly measured surprise responses in the brain in the absence of any explicit instructions in either species, which allowed us to measure the spontaneous reversal of learned associations, which is a very basic element of symbolic representation. Our reasoning is that such spontaneous responses should be less dependent on attention allocation and task strategies. (We discuss this now in more detail on lines 478-485.)

      Finally, lived experience could be a major factor. Indeed, obvious differences include a lifetime of open-field experiences and education in our human adult subjects, which were not available to the monkey subjects and include a strong bias towards explicit learning of symbolic systems (e.g. words, letters, digits, etc). However, we have previously shown with EEG that 5-month-old human infants spontaneously generalize learning to the reversed pairs after a short learning period in the lab (Kabdebon et al, PNAS, 2019). This indicates that even with very limited experience, humans spontaneously reverse learned associations. (We discuss this now in more detail on lines 478-485.) It could be very interesting to investigate whether spontaneous reversal could be present in infant macaque monkeys, as there might be a critical period for this effect. Although neurophysiology in awake infant monkeys is highly challenging, it would be very relevant for future work. (We discuss this in more detail on lines 493-498.)

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Kerkoerle and colleagues present a very interesting comparative fMRI study in humans and monkeys, assessing neural responses to surprise reactions at the reversal of a previously learned association. The implicit nature of this task, assessing how this information is represented without requiring explicit decision-making, is an elegant design. The paper reports that both humans and monkeys show neural responses across a range of areas when presented with incongruous stimulus pairs. Monkeys also show a surprise response when the stimuli are presented in a reversed direction. However, humans show no such surprise response based on this reversal, suggesting that they encode the relationship reversibly and bidirectionally, unlike the monkeys. This has been suggested as a hallmark of symbolic representation, that might be absent in nonhuman animals. 

      I find this experiment and the results quite compelling, and the data do support the hypothesis that humans are somewhat unique in their tendency to form reversible, symbolic associations. I think that an important strength of the results is that the critical finding is the presence of an interaction between congruity and canonicity in macaques, which does not appear in humans. These results go a long way to allay concerns I have about the comparison of many human participants to a very small number of macaques. 

      We thank the reviewer for the positive assessment. We also very much appreciate the point about the interaction effect in macaque monkeys – indeed, we do not report just a negative finding. 

      I understand the impossibility of testing 30+ macaques in an fMRI experiment. However, I think it is important to note that differences necessarily arise in the analysis of such datasets. The authors report that they use '...identical training, stimuli, and whole-brain fMRI measures'. However, the monkeys (in experiment 1) actually required 10 times more training. 

      We agree that this description was imprecise. We have changed it to “identical training stimuli” (line 151); indeed, the movies used for training were strictly identical. Furthermore, please note that we do report the fMRI results after the same training duration. In experiment 1, after 3 days of training, the monkeys did not show any significant results, even in the canonical direction. However, in experiment 2, with increased attention and motivation, a significant effect was observed on the first day of scanning after training, as was found in human subjects (see Figure 4 and Table 3).

      More importantly, while the fMRI measures are the same, group analysis over 30+ individuals is inherently different from comparing only 2 macaques (including smoothing and averaging away individual differences that might be more present in the monkeys, due to the much smaller sample size). 

      Thank you for understanding that a limited sampling size is intrinsic to macaque monkey physiology. We also agree that data analysis in humans and monkeys is necessarily different. As suggested by the reviewer, we added an analysis to address this, see the corresponding reply to the ‘Recommendations for the authors’ section below.

      Despite this, the results do appear to show that macaques show the predicted interaction effect (even despite the sample size), while humans do not. I think this is quite convincing, although had the results turned out differently (for example an effect in humans that was absent in macaques), I think this difference in sample size would be considerably more concerning. 

      Thank you for noting this. Indeed, the interaction effect is crucial, and the task design was explicitly made to test this precise prediction, described in our manuscript as the “reversibility hypothesis”. The congruity effect in the learned direction served as a control for learning, while the corresponding congruity effect in the reversed direction tested for spontaneous reversal. The reversibility hypothesis stipulates that in humans there should be no difference between the learned and the reversed direction, whereas in monkeys there should be. We already described this in the results section of the original manuscript and now also describe it more explicitly in the introduction and at the beginning of the results section.
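      The 2 × 2 logic of this design can be made concrete with a short Python sketch. It is illustrative only: the data layout and the names `congruity_contrasts` and `one_sample_t` are hypothetical, the toy betas are invented for demonstration and are not data from the study, and the sketch is not the authors' analysis pipeline. Under the reversibility hypothesis, the per-subject interaction (canonical congruity effect minus reversed congruity effect) should be near zero in humans and positive in monkeys.

```python
import numpy as np


def congruity_contrasts(betas):
    """Per-subject congruity effects and the congruity x canonicity interaction.

    betas: dict keyed by (direction, congruity), e.g. ("canonical", "congruent"),
    each value an array with one beta per subject (hypothetical layout).
    """
    # Congruity effect = incongruent minus congruent, computed per direction
    canonical = betas[("canonical", "incongruent")] - betas[("canonical", "congruent")]
    reversed_ = betas[("reversed", "incongruent")] - betas[("reversed", "congruent")]
    # Reversible learning: canonical ~ reversed, so the interaction is ~0;
    # one-directional learning: canonical > reversed, so the interaction is > 0
    interaction = canonical - reversed_
    return canonical, reversed_, interaction


def one_sample_t(x):
    """t statistic of a one-sample t-test of x against zero (sample SD, ddof=1)."""
    x = np.asarray(x, dtype=float)
    return x.mean() / (x.std(ddof=1) / np.sqrt(x.size))


# Toy values for two hypothetical subjects showing a one-directional pattern
toy = {
    ("canonical", "congruent"): np.array([1.0, 1.0]),
    ("canonical", "incongruent"): np.array([3.0, 2.0]),
    ("reversed", "congruent"): np.array([1.0, 1.0]),
    ("reversed", "incongruent"): np.array([1.0, 1.0]),
}
canon, rev, inter = congruity_contrasts(toy)
t_interaction = one_sample_t(inter)
```

      Testing the interaction rather than the reversed congruity effect alone is what makes a null result in the reversed direction interpretable: it compares the two directions within the same subjects.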

      I would also note that while I agree with the authors' conclusions, it is notable to me that the congruity effect observed in humans (red vs blue lines in Fig. 2B) appears to be far more pronounced than any effect observed in the macaques (Fig. 3C-3). Again, this does not challenge the core finding of this paper but does suggest methodological or possibly motivational/attentional differences between the humans and the monkeys (or, for example, that the monkeys had learned the associations less strongly and clearly than the humans). 

      As also explained in response to the eLife assessment above, we expanded the “limitations” section of the discussion, with a deeper description of the possible methodological differences between the two species (see lines 478-485).

      With the same concern in mind, we did increase the attention and motivation of the monkeys in experiment 2, and indeed obtained greater activation to the canonical pairs and their violation, notably in the prefrontal cortex, but crucially still without reversibility.

      In the end, we believe that the striking interspecies difference in size and extent of the violation effect, even for purely canonical stimuli, is an important part of our findings and points to a more efficient species-specific learning system, that our experiment tentatively relates to a symbolic competence.

      This is a strong paper with elegant methods and makes a worthwhile contribution to our understanding of the neural systems supporting symbolic representations in humans, as opposed to other animals. 

      We again thank the reviewer for the positive review.

      Reviewer #2 (Public Review): 

      In their article titled "Brain mechanisms of reversible symbolic reference: a potential singularity of the human brain", van Kerkoerle et al. address the timely question of whether non-human primates (rhesus macaques) possess the ability for reverse symbolic inference observed in humans. Through an fMRI experiment in both humans and monkeys, they analyzed the BOLD signal in both species while they observed audio-visual and visual-visual stimulus pairs that had previously been learned in a particular direction. Remarkably, the findings in humans revealed that a broad brain network exhibited increased activity in response to surprises occurring in both the learned and reverse directions. Conversely, in monkeys, the study uncovered that brain activity within sensory areas only responded to the learned direction and failed to exhibit any discernible response to the reverse direction. These compelling results indicate that the capacity for reversible symbolic inference may be unique to humans.

      In general, the manuscript is skillfully crafted and highly accessible to readers. The experimental design exhibits originality, and the analyses are tailored to effectively address the central question at hand.

      Although the first experiment raised a number of methodological inquiries, the subsequent second experiment thoroughly addresses these concerns and effectively replicates the initial findings, thereby significantly strengthening the overall study. Overall, this article is already of high quality and brings new insight into human cognition. 

      We sincerely thank the reviewer for the positive comments. 

      I identified three weaknesses in the manuscript: 

      - One major issue in the study is the absence of significant results in monkeys. Indeed, authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). 

      First, we disagree with the statement about “absence of significant results in monkeys”. We do report a significant interaction which, as noted by the referee, is a crucial positive finding.

      Second, we performed the suggested analysis for experiment 2, using the bilateral ROIs of the putative monkey MDN from previous literature (Mitchell, et al. 2016), which are based on the human study by Fedorenko et al. (PNAS, 2013). 

      Author response table 1.

      Congruity effect for monkeys in Experiment 2 within the ROIs of the MDN (n=3). Significance was assessed with one-sided one-sample t-tests.

      As can be seen, none of the regions within the monkey MDN showed an FDR-corrected significant difference or interaction. Although the absence of a canonical congruity effect makes it difficult to draw strong conclusions, it did approach significance at an uncorrected level in the lateral frontal posterior region, similar to the large prefrontal effect we report in Figures 4 and 5. Furthermore, for the reversed congruity effect there was never even a trend at the uncorrected level, and the crucial interaction of canonicity and congruity again approached significance in the lateral prefrontal cortex.
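      The response reports FDR-corrected significance across ROIs without naming the correction procedure. As a minimal sketch, assuming the standard Benjamini-Hochberg step-up procedure (an assumption, not confirmed by the text), the correction works as follows; the function name and the example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of hypotheses rejected at false discovery rate q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # ranks, smallest p first
    thresholds = q * np.arange(1, m + 1) / m   # rank i is compared with (i / m) * q
    passes = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()        # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # step-up: reject every rank up to k
    return reject


# Hypothetical uncorrected p-values for four ROIs
example_mask = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

      Here only the smallest p-value survives: 0.03 and 0.04 are below 0.05 uncorrected but exceed their rank-specific thresholds (0.025 and 0.0375), which is the sense in which an effect can approach significance at the uncorrected level without surviving FDR correction.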

      We also performed an ANOVA in the human participants of the VV experiment on the average betas across the 7 fronto-parietal ROIs that Mitchell et al. used to define their monkey equivalents (Fig 1a, right in Mitchell et al. 2016), with congruity, canonicity and hemisphere (except for the anterior cingulate, which is a bilateral ROI) as within-subject factors. We confirmed the results presented in the manuscript (Figure 4C), notably with no significant interaction between congruity and canonicity in any of these ROIs (all F-values <1, except in the insula). A significant main effect of congruity was observed in the posterior middle frontal gyrus (MFG) and inferior precentral sulcus at the FDR-corrected level. Analyses restricted to the canonical trials found a congruity effect in these two regions plus the anterior insula and anterior cingulate/presupplementary motor area, whereas no ROIs were significant at the FDR-corrected level for reversed trials. There was a trend in the middle MFG and inferior precentral region for reversed trials. Crucially, there was not even a trend for the interaction between congruity and canonicity at the uncorrected level. The difference in effect size between the canonical and reversed directions can therefore be explained by the larger statistical power due to the larger number of congruent trials (70%, versus 10% for each of the other trial conditions), not by a significant difference between the canonical and the reversed directions.
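      A rough worked check of this power argument, as a sketch under simplifying assumptions that are ours rather than the authors' GLM: equal per-trial variance, independent trials, and the trial proportions scaled to 100 test trials. The standard error of a difference between two condition means then scales with sqrt(1/n_a + 1/n_b), so a contrast that includes the 70% condition is estimated more precisely than one between two 10% conditions.

```python
import math


def relative_se(n_a, n_b):
    """Standard error of a difference between two condition means, in units of
    the per-trial standard deviation (equal variance, independent trials assumed)."""
    return math.sqrt(1.0 / n_a + 1.0 / n_b)


# Trial proportions from the text, expressed per 100 test trials
se_canonical = relative_se(70, 10)  # canonical congruent (70%) vs canonical incongruent (10%)
se_reversed = relative_se(10, 10)   # reversed congruent (10%) vs reversed incongruent (10%)
ratio = se_canonical / se_reversed  # sqrt(4/7), i.e. about 0.76
```

      Under these assumptions, the canonical congruity contrast has a standard error roughly 24% smaller than the reversed one, so equal true effects would yield a larger t-value in the canonical direction.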

      Author response table 2.

      Congruity effect for humans in Experiment 2 within the ROIs of the MDN (n=23).

      These results support our contention that the type of learning of the stimulus pairs was very different in the two species. We thank the reviewer for suggesting these relevant additional analyses.

      - While the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. 

      We agree that this is an interesting question, although it is also very open-ended. For instance, we could report each subject’s individual whole-brain results, but this would take too much space (and interested readers will be able to do so from the data that we make available as part of this publication). As a step in this direction, we provide below a figure showing the individual congruity effects, separately for each experiment and for each ROI of Table 5, for each of the 52 participants for whom an fMRI localizer was available:

      Author response image 1.

      Difference in mean betas between congruent and incongruent conditions in a priori linguistic and mathematical ROIs (see definition and analyses in Table 5) in both experiments (experiment 1 = AV, left panel; experiment 2 = VV, right panel). Dots correspond to participants (red: canonical trials; green: reversed trials). The boxplot notch is located at the median and the lower and upper box hinges at the 25th and 75th centiles. Whiskers extend to 1.5 inter-quartile ranges on either side of the hinges. ROIs are ranked by the median of the Incongruent-Congruent difference across canonical and reversed order, within a given experiment. For purposes of comparison between the two experiments, we have underlined with colors the top five ROIs common to the two experiments. N.s.: non-significant congruity effect (p>0.05).
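      The box-and-whisker convention described in this caption can be sketched in a few lines. This assumes the common convention (used, for example, by ggplot2's geom_boxplot) in which each whisker stops at the most extreme observation within 1.5 IQR of its hinge, and numpy's default linear quantile interpolation; the plotting software and quantile method actually used for the figure are not stated, and the input values are hypothetical.

```python
import numpy as np


def box_stats(values):
    """Median, hinges, whiskers, and outliers for a box-and-whisker plot."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])  # numpy default: linear interpolation
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = v[(v >= lower_fence) & (v <= upper_fence)]
    return {
        "median": median,
        "hinges": (q1, q3),
        # Whiskers stop at the most extreme observations within 1.5 IQR of the hinges
        "whiskers": (inside.min(), inside.max()),
        "outliers": v[(v < lower_fence) | (v > upper_fence)],
    }


# Hypothetical per-participant differences for one ROI
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```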

      Several regions show a rather consistent difference across subjects (see, for instance, the posterior STS in experiment 1, left panel). Overall, only 3 of the 52 participants showed no beta difference greater than 2, for either canonical or reversed trials, in any ROI. The consistency is quite striking, given the limited number of test trials (only 16 incongruent trials per direction per participant in total), and the fact that these ROIs were selected for their responses to spoken or written sentences, as part of a subsidiary task quite different from the main task.

      - Some details are missing in the methods.  

      Thank you for these comments, we reply to them point-by-point below.

      Reviewer #3 (Public Review): 

      This study investigates the hypothesis that humans (but not non-human primates) spontaneously learn reversible temporal associations (i.e., learning a B-A association after only being exposed to A-B sequences), which the authors consider to be a foundational property of symbolic cognition. To do so, they expose humans and macaques to 2-item sequences (in a visual-auditory experiment, pairs of images and spoken nonwords, and in a visual-visual experiment, pairs of images and abstract geometric shapes) in a fixed temporal order, then measure the brain response during a test phase to congruent vs. incongruent pairs (relative to the trained associations) in canonical vs. reversed order (relative to the presentation order used in training). The advantage of neuroimaging for this question is that it removes the need for a behavioral test, which non-human primates can fail for reasons unrelated to the cognitive construct being investigated. In humans, the researchers find statistically indistinguishable incongruity effects in both directions (supporting a spontaneous reversible association), whereas in monkeys they only find incongruity effects in the canonical direction (supporting an association but a lack of spontaneous reversal). Although the precise pattern of activation varies by experiment type (visual-auditory vs. visual-visual) in both species, the authors point out that some of the regions involved are also those that are most anatomically different between humans and other primates. The authors interpret their finding to support the hypothesis that reversible associations, and by extension symbolic cognition, is uniquely human. 

      This study is a valuable complement to prior behavioral work on this question. However, I have some concerns about methods and framing. 

      We thank the reviewer for the careful summary of the manuscript, and the positive comments.

      Methods - Design issues: 

      (1) The authors originally planned to use the same training/testing protocol for both species, but the monkeys did not learn anything, so they dramatically increased the amount of training and evaluation. By my calculation from the methods section, humans were trained on 96 trials and tested on 176, whereas the monkeys got an additional 3,840 training trials and 1,408 testing trials. The authors are explicit that they continued training the monkeys until they got a congruity effect. On the one hand, it is commendable that they are honest about this in their write-up, given that this detail could easily be framed as deliberate after the fact. On the other hand, it is still a form of p-hacking, given that it's critical for their result that the monkeys learn the canonical association (otherwise, the critical comparison to the non-canonical association is meaningless).

      Thank you for this comment. 

      Indeed, for experiment 1, the amount of training and testing was not equal for the humans and monkeys, as also mentioned by reviewer 2. We now describe in more detail how many training and imaging days we used for each experiment and each species, as well as the number of blocks per day and the number of trials per block (see lines 572-577). We also added the amount of training received to all of the Table legends.

      We are sorry for giving the impression that we trained until the monkeys learned this. This was not the case. Based on previous literature, we actually anticipated that the short training would not be sufficient, and therefore planned additional training in advance. Specifically, Meyer & Olson (2011) had observed pair learning in the inferior temporal cortex of macaque monkeys after 816 exposures per pair. This is similar to the additional training we gave, about 80 blocks with 12 trials per pair per block. This is now explained in more detail (lines 577-580).
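      The comparison with Meyer & Olson (2011) rests on a simple count, taking the approximate figure of 80 blocks stated above at face value:

```latex
\underbrace{80}_{\text{blocks}} \times \underbrace{12}_{\text{trials per pair per block}}
  = 960 \ \text{exposures per pair} \;\approx\; 816 \ \text{exposures per pair (Meyer \& Olson, 2011)}
```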

      Furthermore, we strongly disagree with the pejorative term p-hacking. The aim of the experiment was not to show a congruency effect in the canonical direction in monkeys, but to track and compare their behavior in the same paradigm as that of humans for the reverse direction. It would have been unwise to stop after human-identical training and only show that humans learn better, which is a given. Instead, we looked at brain activations at both times, at the end of human-identical training and when the monkeys had learned the pairs in the canonical direction. 

      Finally, in experiment 2, monkeys were tested after the same 3 days of training as humans. We wrote: “Using this design, we obtained significant canonical congruity effects in monkeys on the first imaging day after the initial training (24 trials per pair), indicating that the animals had learned the associations” (lines 252-253).

      (2) Between-species comparisons are challenging. In addition to having differences in their DNA, human participants have spent many years living in a very different culture than that of NHPs, including years of formal education. As a result, attributing the observed differences to biology is challenging. One approach that has been adopted in some past studies is to examine either young children or adults from cultures that don't have formal educational structures. This is not the approach the authors take. This major confound needs to minimally be explicitly acknowledged up front. 

      Thank you for raising this important point. We already had a section on “limitations” in the manuscript, which we have now extended (lines 478-485). Indeed, this study follows a previous study in 5-month-old infants using EEG, in which we showed that after learning associations between labels and categories, infants spontaneously generalize learning to the reversed pairs after a short learning period in the lab (Kabdebon et al, PNAS, 2019). We also cited preliminary results of the same paradigm as used in the current study but using EEG in 4-month-old infants (Ekramnia and Dehaene-Lambertz, 2019), where we replicated the results obtained by Kabdebon et al. 2019 showing that preverbal infants spontaneously generalize learning to the reversed pairs.

      Functional MRI in awake infants remains challenging at this age (but see our own work, Dehaene-Lambertz et al., Science, 2002), especially because the experimental design yields only a few trials in the conditions of interest (10%) and therefore requires a long experimental session that exceeds infants' capacity to remain quiet and attentive in the noisy MRI environment. (We discuss this on lines 493-496.)

      (3) Humans have big advantages in processing and discriminating spoken stimuli and associating them with visual stimuli (after all, this is what words are in spoken human languages). Experiment 2 ameliorates these concerns to some degree, but still, it is difficult to attribute the failure of NHPs to show reversible associations in Experiment 1 to cognitive differences rather than the relative importance of sound string to meaning associations in the human vs. NHP experiences. 

      As the reviewer wrote, we deliberately performed Experiment 2 with visual shapes to control for various factors that might have explained the monkeys' failure in Experiment 1. 

      (4) More minor: The localizer task (math sentences vs. other sentences) makes sense for math but seems to make less sense for language: why would a language region respond more to sentences that don't describe math vs. ones that do? 

      The referee is correct: our use of the word “reciprocally” was improper (although see Amalric and Dehaene, 2016 for significant differences in both directions when non-mathematical sentences concern specific knowledge). We changed the formulation to clarify this as follows: “In these ROIs, we recovered the subject-specific coordinates of each participant’s 10% best voxels in the following comparisons: sentences vs rest for the 6 language ROIs; reading vs listening for the VWFA; and numerical vs non-numerical sentences for the 8 mathematical ROIs.” (lines 678-680).

      Methods - Analysis issues: 

      (5) The analyses appear to "double dip" by using the same data to define the clusters and to statistically test the average cluster activation (Kriegeskorte et al., 2009). The resulting effect sizes are therefore likely inflated, and the p-values are anticonservative. 

      It is not clear to us which result the reviewer is referring to. In Tables 1-4, we report the values that we found significant in the whole-brain analysis; we do not report additional statistical tests on these data. For Table 5, the subject-specific voxels were identified through a separate localizer experiment, which was designed to pinpoint each subject's precise activation areas for oral and written language processing and for math. We then compared the activation at these voxel locations across the different conditions of the main experiment. Thus, the two datasets were distinct, and there was no double dipping. Under either interpretation of the comment, we therefore disagree with the reviewer.

      Framing: 

      (6) The framing ("Brain mechanisms of reversible symbolic reference: A potential singularity of the human brain") is bigger than the finding (monkeys don't spontaneously reverse a temporal association but humans do). The title and discussion are full of buzzy terms ("brain mechanisms", "symbolic", and "singularity") that are only connected to the experiments by a debatable chain of assumptions. 

      First, this study shows relatively little about brain "mechanisms" of reversible symbolic associations, which implies insights into how these associations are learned, recognized, and represented. But we're only given standard fMRI analyses that are quite inconsistent across similar experimental paradigms, with purely suggestive connections between these spatial patterns and prior work on comparative brain anatomy. 

      We agree with the referee that the term “mechanism” is ambiguous and, for systems neuroscientists, may suggest more than we are able to do here with functional MRI. We changed the title to “Brain areas for reversible symbolic reference, a potential singularity of the human brain”. This title better describes our specific contribution: mapping out the areas involved in reversibility in humans, and showing that they do not seem to respond similarly in macaque monkeys.

      Second, it's not clear what the relationship is between symbolic cognition and a propensity to spontaneously reverse a temporal association. Certainly, if there are inter-species differences in learning preferences this is important to know about, but why is this construed as a difference in the presence or absence of symbols? Because the associations aren't used in any downstream computation, there is not even any way for participants to know which is the sign and which is the signified: these are merely labels imposed by the researchers on a sequential task. 

      As explained in the introduction, the reversibility test addressed a very minimal core property of symbolic reference. There cannot be a symbol if its attachment doesn’t operate in both directions. Thus, this property is necessary, but we agree that it is not sufficient. Indeed, more tests are needed to establish whether and how the learned symbols are used in further downstream compositional tasks (as discussed in our recent TICS paper, Dehaene et al. 2022). We added a sentence in the introduction to acknowledge this fact:

      “Such reversibility is a core and necessary property of symbols, although we readily acknowledge that it is not sufficient, since genuine symbols present additional referential and compositional properties that will not be tested in the present work.” (lines 89-92).

      Third, the word "singularity" is both problematically ambiguous and not well supported by the results. "Singularity" is a highly loaded word that the authors are simply using to mean "that which is uniquely human". Rather than picking a term with diverse technical meanings across fields and then trying to restrict the definition, it would be better to use a different term. Furthermore, even under the stated definition, this study performed a single pairwise comparison between humans and one other species (macaques), so it is a stretch to then conclude (or insinuate) that the "singularity" has been found (see also pt. 2 above). 

      We have published an extensive review including a description of our use of the term “singularity” (Dehaene et al., TICS 2022). Here is a short excerpt: “Humans are different even in domains such as drawing and geometry that do not involve communicative language. We refer to this observation using the term “human cognitive singularity”, the word singularity being used here in its standard meaning (the condition of being singular) as well as its mathematical sense (a point of sudden change). Hominization was certainly a singularity in biological evolution, so much so that it opened up a new geological age (the Anthropocene). Even if evolution works by small continuous change (and sometimes it doesn’t [4]), it led to a drastic cognitive change in humans.”

      We find the referee’s use of the pejorative term “insinuate” quite inappropriate. From the title on, we are quite nuanced and refer only to a “potential singularity”. Furthermore, as noted above, we explicitly mention in the discussion the limitations of our study, and in particular the fact that only a single non-human species was tested (see lines 486-493). We are working hard to get chimpanzee data, but this is remarkably difficult for us, and we hope that our paper will incite other groups to collect more evidence on this point.

      (7) Related to pt. 6, there is circularity in the framing whereby the authors say they are setting out to find out what is uniquely human, hypothesizing that the uniquely human thing is symbols, and then selecting a defining trait of symbols (spontaneous reversible association) *because* it seems to be uniquely human (see e.g., "Several studies previously found behavioral evidence for a uniquely human ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was therefore proposed as a defining feature of symbol representation reference (Deacon, 1998; Kabdebon and Dehaene-Lambertz, 2019; Nieder, 2009).", line 335). They can't have it both ways. Either "symbol" is an independently motivated construct whose presence can be independently tested in humans and other species, or it is by fiat synonymous with the "singularity". This circularity can be broken by a more modest framing that focuses on the core research question (e.g., "What is uniquely human? One possibility is spontaneous reversal of temporal associations.") and then connects (speculatively) to the bigger conceptual landscape in the discussion ("Spontaneous reversal of temporal associations may be a core ability underlying the acquisition of mental symbols").

      We fail to understand the putative circularity that the referee sees in our introduction. We urge him/her to re-read it, and hope that, with the changes that we introduced, it does boil down to his/her summary, i.e. “What is uniquely human? One possibility is spontaneous reversal of temporal associations."

      Reviewer #1 (Recommendations For The Authors): 

      In general, the manuscript was very clear, easy to read, and compelling. I would recommend the authors carefully check the text for consistency and minor typos. For example: 

      The sample size for the monkeys kept changing throughout the paper. E.g., Experiment 1: n = 2 (line 149); n = 3 (line 205).  

      Thank you for catching this error; we have corrected it. There were indeed 2 animals in experiment 1 and 3 in experiment 2. (Animals JD and YS participated in experiment 1, and JD, JC and DN in experiment 2, so only JD participated in both experiments.)

      Similarly, the number of stimulus pairs is reported inconsistently (4 on line 149, 5 pairs later in the paper). 

      We’re sorry that this was unclear. We used 5 sets of 4 audio-visual pairs each. We now clarify this, on line 157 and on lines 514-516.

      At least one case of p>0.0001, rather than p < 0.0001 (I assume). 

      Thank you once again; we have corrected this.

      Reviewer #2 (Recommendations For The Authors): 

      One major issue in the study is the absence of significant results in monkeys. Indeed, the authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). In other words: what are the statistics for the MDN regarding congruity, canonicity, and interaction in both species? Since the authors have already performed this type of analysis for language and Math ROIs (table 5), it should be relatively easy for them to extend it to the MDN. Demonstrating that results in monkeys are far from significant could further convince the reader. 

      Furthermore, while the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. Specifically, it would be valuable to describe the proportion of human participants in which the effects of congruency, canonicity, and their interaction are significant. Additionally, stating the variability of the F-values for each effect would provide reassurance to the reader regarding the distinctiveness of humans in comparison to monkeys. Low variability in the results would serve to mitigate concerns that the observed disparity is merely a consequence of testing a unique subset of monkeys, which may differ from the general population. Indeed, this would be a greater support to the notion that the dissimilarity stems from a genuine distinction between the two species. 

      We responded to both of these points above.

      In terms of methods, details are missing: 

      - How many trials of each condition are there exactly? (10% of 44 trials is 4.4)

      We wrote: “In both humans and monkeys, each block started with 4 trials in the learned direction (congruent canonical trials), one trial for each of the 4 pairs (2 O-L and 2 L-O pairs). The rest of the block consisted of 40 trials in which 70% of trials were identical to the training; 10% were incongruent pairs but the direction (O-L or L-O) was correct (incongruent canonical trials), thus testing whether the association was learned; 10% were congruent pairs but the direction within the pairs was reversed relative to the learned pairs (congruent reversed trials) and 10% were incongruent pairs in reverse (incongruent reversed trials).”(See lines 596-600.)

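      Thus, each block comprised 4 initial trials, 28 canonical congruent trials, 4 canonical incongruent, 4 reversed congruent and 4 reversed incongruent trials, i.e. 4 + 28 + 3 × 4 = 44 trials in total. The 10% proportions apply to the 40 trials that follow the 4 initial trials, which is why each of these conditions contains exactly 4 trials.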
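      The block arithmetic can be checked with a few lines of Python. This is an illustrative sketch, not the authors' stimulus code; the constant and condition names are hypothetical, and only the counts and proportions come from the design quoted above. It shows that applying the 70/10/10/10 split to the 40 test trials yields whole numbers, and that adding the 4 initial trials gives 44.

```python
# Hypothetical names; counts and proportions follow the design quoted above.
INITIAL_TRIALS = 4   # one congruent canonical trial per pair at block start
TEST_TRIALS = 40     # trials to which the 70/10/10/10 split applies

PROPORTIONS = {
    "canonical_congruent": 0.70,
    "canonical_incongruent": 0.10,
    "reversed_congruent": 0.10,
    "reversed_incongruent": 0.10,
}

def block_composition(n_test=TEST_TRIALS, proportions=PROPORTIONS):
    """Convert condition proportions into whole trial counts for one block."""
    counts = {cond: round(p * n_test) for cond, p in proportions.items()}
    # The rounded counts must still add up to the number of test trials.
    assert sum(counts.values()) == n_test
    return counts

counts = block_composition()
total = INITIAL_TRIALS + sum(counts.values())
print(counts)
print(total)  # 44
```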

      - How long is one trial? 

      As written in the Methods section: “In each trial, the first stimulus (label or object) was presented during 700ms, followed by an inter-stimulus-interval of 100ms then the second stimulus during 700ms. The pairs were separated by a variable inter-trial-interval of 3-5 seconds”, i.e. each pair lasted 700 + 100 + 700 = 1500 ms and was followed by 3 to 4.75 seconds of blank before the next trial (see lines 531-533).
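      The trial timing can be made explicit in a short Python sketch. It assumes, for illustration, that the inter-trial interval runs from the offset of the second stimulus to the onset of the next trial, and it uses the 3 to 4.75 s range given in this reply rather than the rounded 3-5 s of the quoted Methods sentence. The variable names are hypothetical.

```python
# Durations in milliseconds, taken from the Methods text quoted above.
FIRST_STIMULUS_MS = 700
ISI_MS = 100
SECOND_STIMULUS_MS = 700
# Assumed: blank interval measured from second-stimulus offset to next trial onset.
ITI_RANGE_MS = (3000, 4750)

pair_ms = FIRST_STIMULUS_MS + ISI_MS + SECOND_STIMULUS_MS
# Shortest and longest onset-to-onset spacing between consecutive trials.
onset_to_onset_ms = tuple(pair_ms + iti for iti in ITI_RANGE_MS)

print(pair_ms)            # 1500
print(onset_to_onset_ms)  # (4500, 6250)
```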

      - How are the stimulus presentations jittered? 

      See: “The pairs were separated by a variable inter-trial-interval randomly chosen among eight different durations between 3 and 4.75 seconds (step=250 ms). The series of 8 intervals was randomized again each time it was completed.” (lines 533-535).
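      A minimal Python sketch of the jitter scheme described above: the eight intervals (3.00 to 4.75 s in 250 ms steps) are shuffled, each is used once, and the series is reshuffled whenever it is exhausted. The function name, the use of Python's `random` module, and the choice to generate one continuous sequence for a 44-trial block are assumptions for illustration, not the authors' presentation code.

```python
import random

# Eight inter-trial intervals: 3000, 3250, ..., 4750 ms.
ITI_VALUES_MS = [3000 + 250 * k for k in range(8)]

def iti_sequence(n_trials, rng=None):
    """Return n_trials ITIs drawn by exhausting shuffled copies of ITI_VALUES_MS."""
    if rng is None:
        rng = random.Random()
    sequence = []
    while len(sequence) < n_trials:
        series = list(ITI_VALUES_MS)
        rng.shuffle(series)  # new random order each time the series restarts
        sequence.extend(series)
    return sequence[:n_trials]

itis = iti_sequence(44, random.Random(0))
# Each aligned group of 8 ITIs (trials 1-8, 9-16, ...) contains every interval once.
```

Drawing without replacement in this way keeps the mean ITI of every completed series fixed at 3.875 s, whereas independent random draws would let it drift from block to block.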

      - What is the statistical power achieved for humans? And for monkeys? 

      We know of no standard way to define power for fMRI experiments. Power will depend on so many parameters, including the fMRI signal-to-noise ratio, the attention of the subject, the areas being considered, the type of analysis (whole-brain versus ROIs), etc.

      - Videos are mentioned in the methods, is it the image and sound? It is not clear. 

      We’re sorry that this was unclear. Videos were used only for the training of the human subjects. We have now corrected this in the Methods section (lines 552-554).

      Reviewer #3 (Recommendations For The Authors): 

      The main recommendations are to adjust the framing (making it less bold and more connected to the empirical evidence) and to ensure independence in the statistical analyses of the fMRI data. 

      See our replies to the reviewer’s comments on “Framing” above. In particular, we changed the title of the paper from “Brain mechanisms of reversible symbolic reference” to “Brain areas for reversible symbolic reference”.

      References cited in this response

      Dehaene, S., Al Roumi, F., Lakretz, Y., Planton, S., & Sablé-Meyer, M. (2022). Symbols and mental programs: A hypothesis about human singularity. Trends in Cognitive Sciences, 26(9), 751‑766. https://doi.org/10.1016/j.tics.2022.06.010

      Dehaene-Lambertz, Ghislaine, Stanislas Dehaene, and Lucie Hertz-Pannier. Functional Neuroimaging of Speech Perception in Infants. Science 298, no 5600 (2002): 2013-15. https://doi.org/10.1126/science.1077066

      Ekramnia M, Dehaene-Lambertz G. 2019. Investigating bidirectionality of associations in young infants as an approach to the symbolic system. Presented at the CogSci. p. 3449.

      Fedorenko E, Duncan J, Kanwisher N (2013) Broad domain generality in focal regions of frontal and parietal cortex. Proc Natl Acad Sci U S A 110:16616-16621.

      Kabdebon, Claire, and Ghislaine Dehaene-Lambertz. Symbolic Labeling in 5-Month-Old Human Infants. Proceedings of the National Academy of Sciences 116, no 12 (2019): 5805-10. https://doi.org/10.1073/pnas.1809144116

      Mitchell, D. J., Bell, A. H., Buckley, M. J., Mitchell, A. S., Sallet, J., & Duncan, J. (2016). A Putative Multiple-Demand System in the Macaque Brain. Journal of Neuroscience, 36(33), 8574‑8585. https://doi.org/10.1523/JNEUROSCI.0810-16.2016